Wavelet Packet Entropy in Speaker-Independent Emotional State Detection from Speech Signal
Subject Areas: Signal and Systems Processing
Mina Kadkhodaei Elyaderani 1, Hamid Mahmoodian 2, Ghazaal Sheikhi 3
1 - Bonyan Institute of Higher Education, Shahinshahr, Isfahan, Iran
2 - Electrical Engineering Faculty, Najafabad Branch, Islamic Azad University, Najafabad, Iran
3 - PhD Student, Computer Engineering Department, Eastern Mediterranean University, Turkey
Keywords: Support vector machine, Wavelet packet, Speech emotion recognition, Shannon entropy coefficients
Abstract:
In this paper, wavelet packet entropy is proposed for speaker-independent emotion detection from speech. After pre-processing, a level-4 wavelet packet decomposition with the db3 wavelet is computed, and the Shannon entropy of each node is used as a feature. In addition, prosodic features, namely the first four formants, jitter (pitch deviation amplitude), and shimmer (energy variation amplitude), together with MFCC features, complete the feature vector. A Support Vector Machine (SVM) then classifies the vectors in a multi-class (all emotions) or two-class (each emotion versus the normal state) setting. 46 different utterances of a single sentence are selected from the Berlin Emotional Speech Database, uttered by 10 speakers in sadness, happiness, fear, boredom, anger, and the normal emotional state. Experimental results show that the proposed features improve emotional state detection accuracy in the multi-class setting. Furthermore, adding the wavelet entropy coefficients to the other features increases the accuracy of two-class detection for anger, fear, and happiness.
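For concreteness, the following Python sketch outlines the entropy feature extraction and classification described above, assuming PyWavelets for the level-4 db3 wavelet packet decomposition and scikit-learn for the SVM. The coefficient-energy normalization used for the Shannon entropy, and all function names and parameters, are assumptions for illustration rather than the authors' implementation.

```python
# A minimal sketch, assuming Python with PyWavelets and scikit-learn; the
# normalization convention and all names below are illustrative, not the
# authors' implementation.
import numpy as np
import pywt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def wavelet_packet_entropy(signal, wavelet="db3", level=4):
    """Shannon entropy of each terminal node of a wavelet packet
    decomposition (16 nodes, hence 16 features, at level 4)."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    feats = []
    for node in wp.get_level(level, order="natural"):
        c2 = node.data ** 2
        total = c2.sum()
        # Assumed convention: squared coefficients normalized to a
        # probability distribution before taking Shannon entropy.
        p = c2 / total if total > 0 else np.full(c2.size, 1.0 / c2.size)
        p = p[p > 0]
        feats.append(-(p * np.log2(p)).sum())
    return np.asarray(feats)

# Example with synthetic stand-ins for pre-processed utterances. In the
# paper, these entropy vectors would be concatenated with the formant,
# jitter, shimmer, and MFCC features before classification.
rng = np.random.default_rng(0)
X = np.stack([wavelet_packet_entropy(rng.standard_normal(16000))
              for _ in range(60)])
y = rng.integers(0, 6, size=60)  # six emotional-state labels
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(X[:5]))
```

The same pipeline supports both evaluation settings in the abstract: the multi-class case uses all six labels directly, while each two-class case would relabel the data as one emotion versus the normal state.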