
連續語音之情緒轉折分析與偵測

Analysis and Detection of Emotion Change in Continuous Speech

Advisor: 包蒼龍

Abstract


Verbal communication plays a very important role in human interaction. Human speech carries not only the meaning a person intends to express but also the speaker's emotional state at that moment. In this thesis we take the feature parameters commonly used in speech recognition and examine them in depth; these parameters include the fundamental frequency, jitter, Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Mel-Frequency Cepstral Coefficients (MFCC), Log Frequency Power Coefficients (LFPC), and Perceptual Linear Prediction (PLP) parameters, and we hope to extract useful information from their values. The analysis methods we use are sequential forward selection (SFS) and sequential backward selection (SBS), combined with feature weighting in the K-nearest-neighbor (KNN) classifier, to find a better combination of feature parameters. Under this classifier, a combination of 32 features with the best recognition performance was identified; using these parameters and this classifier, a recognition rate of 84% was obtained on our database. Finally, we compare the recognition rates of SVM, feature-weighted KNN, and WDKNN, and apply this set of 32 feature parameters in a continuous-speech emotion recognition system.

Keywords

Emotion recognition; continuous speech

Parallel Abstract (English)


Speech communication plays an important role for human beings. Human speech conveys not only the linguistic content but also the speaker's feeling at the moment. In this thesis we use 11 kinds of speech features, including formant, shimmer, jitter, Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Mel-Frequency Cepstral Coefficients (MFCC), the first derivative of MFCC (D-MFCC), the second derivative of MFCC (DD-MFCC), Log Frequency Power Coefficients (LFPC), Perceptual Linear Prediction (PLP), and RelAtive SpecTrAl PLP (RastaPLP), as the features for emotion classification. These features are usually used in speech recognition. We try to find the relation between emotion and these features. The methods we use to analyze the features are sequential forward selection (SFS) and sequential backward selection (SBS). Under the KNN classifier, 32 features were chosen, and we obtained a recognition rate of 84% using our emotion corpus database. We also use the weighted KNN and WDKNN classification methods to classify the emotion in the speech, and compare the performance of SVM with that of weighted KNN and WDKNN. These 32 features are the most appropriate features for emotion recognition and are used in the continuous-speech emotion recognition system.
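The selection procedure described above, SFS wrapped around a KNN classifier, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual implementation: the synthetic two-feature data, the leave-one-out evaluation, and the names `knn_predict`, `loo_accuracy`, and `sfs` are all invented for this sketch, and no feature weighting is applied.

```python
import random

def knn_predict(train_X, train_y, x, feats, k=3):
    """Classify x by majority vote of its k nearest training samples,
    using Euclidean distance over only the feature indices in `feats`."""
    dists = sorted(
        (sum((xi[f] - x[f]) ** 2 for f in feats), yi)
        for xi, yi in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

def loo_accuracy(X, y, feats, k=3):
    """Leave-one-out accuracy of the KNN classifier on feature subset `feats`."""
    correct = 0
    for i in range(len(X)):
        tr_X = X[:i] + X[i + 1:]
        tr_y = y[:i] + y[i + 1:]
        correct += knn_predict(tr_X, tr_y, X[i], feats, k) == y[i]
    return correct / len(X)

def sfs(X, y, n_select, k=3):
    """Sequential forward selection: greedily add the feature that most
    improves KNN accuracy until n_select features are chosen."""
    remaining = list(range(len(X[0])))
    chosen = []
    while len(chosen) < n_select and remaining:
        best = max(remaining, key=lambda f: loo_accuracy(X, y, chosen + [f], k))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data: feature 0 separates the two classes, feature 1 is pure noise.
random.seed(0)
X = [[i % 2 * 4.0 + random.gauss(0, 0.3), random.gauss(0, 1)] for i in range(40)]
y = [i % 2 for i in range(40)]
print(sfs(X, y, 1))  # the informative feature, index 0, is picked first
```

SBS works the same way in reverse: start from the full feature set and greedily drop the feature whose removal hurts accuracy least. The thesis additionally weights features inside the KNN distance, which would replace the plain squared differences in `knn_predict` with weighted ones.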

