Manifolds Based Emotion Recognition in Speech

This paper presents an emotional speech recognition system based on the analysis of speech manifolds. Working with large volumes of high-dimensional acoustic features, researchers face the problem of dimensionality reduction. In contrast to classical techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), the paper proposes a new approach, Enhanced Lipschitz Embedding (ELE), to discover the nonlinear degrees of freedom that underlie the emotional speech corpus. ELE uses geodesic distance to preserve the intrinsic geometry of the speech corpus at all scales. Based on geodesic distance estimation, ELE embeds the 64-dimensional acoustic features into a six-dimensional space in which speech data with the same emotional state generally cluster around one plane, a distribution that benefits emotion classification. The compressed test data are classified into six emotional states (neutral, anger, fear, happiness, sadness and surprise) by a trained linear Support Vector Machine (SVM). Considering the perceptual constancy of humans, ELE is also evaluated on its ability to detect the intrinsic geometry of emotional speech corrupted by noise. The new approach is compared with feature selection by Sequential Forward Selection (SFS) and with PCA, LDA, Isomap and Locally Linear Embedding (LLE). Experimental results show that, compared with these methods, the proposed system gives a 9%-26% relative improvement in speaker-independent emotion recognition and a 5%-20% improvement in speaker-dependent recognition. The proposed system is also robust, improving emotion recognition accuracy by approximately 10% when speech is corrupted by increasing noise.
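The abstract does not spell out how ELE is computed, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions: geodesic distance is approximated by shortest paths over a k-nearest-neighbour graph (the estimate used by Isomap), and each output coordinate is a classical Lipschitz coordinate, that is, the distance from a point to its nearest member of a reference set. The function names, the choice of reference sets, and the value of `k` are hypothetical and are not taken from the paper.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph weighted by Euclidean distance."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_from(graph, source):
    """Dijkstra shortest-path lengths from `source` (graph-based geodesic estimate)."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def lipschitz_embed(points, reference_sets, k=2):
    """Map each point to its geodesic distance to every reference set.

    `reference_sets` is a list of lists of point indices (an assumption of
    this sketch); the output has one coordinate per reference set.
    """
    graph = knn_graph(points, k)
    geo = [geodesic_from(graph, i) for i in range(len(points))]
    return [
        [min(geo[i][r] for r in refs) for refs in reference_sets]
        for i in range(len(points))
    ]


# Toy data: seven points along a C-shaped curve. The two end points are
# 2 apart in a straight line but 6 apart along the curve.
points = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
embedding = lipschitz_embed(points, reference_sets=[[0], [6]], k=1)
for point, coords in zip(points, embedding):
    print(point, coords)
```

With one reference set per emotional state, the same construction would yield one coordinate per state. Whether ELE chooses its reference sets this way is an assumption of this sketch, not something the abstract states.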

Keywords

Enhanced Lipschitz Embedding (ELE); Dimensionality Reduction; Emotional Speech Analysis; Emotion Recognition
