  • Thesis

語音情緒辨識最佳參數之研究

A Study on Identifying the Most Effective Speech Features for Speech Emotion Recognition

Advisor: Tsang-Long Pao (包蒼龍)

Abstract


Humans express emotion in many ways, conveying personal feelings through speech, gesture, or writing. In interpersonal communication, the emotion carried in speech can also change the meaning a sentence is intended to convey. Emotion therefore plays an important role in human speech, and speech emotion recognition has been receiving growing attention. In a speech emotion recognition system, both the classifier and the speech features used affect the recognition rate. This thesis studies the most effective parameters for speech emotion recognition along three lines: 1) the classifier used; 2) the combination of emotion classes to be distinguished; and 3) the analysis of the feature parameters themselves. The goal is to perform feature selection under each of these conditions, identify the feature sets best suited to each setting, and thereby raise the recognition rate while shortening classification time. The features examined include formants, shimmer (amplitude perturbation), jitter (frequency perturbation), Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Mel-Frequency Cepstral Coefficients (MFCC), Log Frequency Power Coefficients (LFPC), Perceptual Linear Prediction (PLP), RelAtive SpecTrAl PLP (RastaPLP), log-energy, and zero crossing rate (ZCR), together with the mean, standard deviation, minimum, maximum, and range of each, for a total of 78 feature parameters. The feature selection method used is sequential forward selection, which identifies the most influential feature combinations. Experimental results show that, for five emotions, after feature selection on our corpus, the weighted discrete K-nearest neighbor (WD-KNN) classifier reaches a recognition rate of up to 90%. Moreover, the most important feature among all those extracted is LPC: it appears in almost every most-effective feature set.
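The sequential forward selection procedure described in the abstract — start from an empty set and greedily add whichever feature most improves the score — can be sketched as follows. This is a minimal illustration, not the thesis's implementation; `score_fn`, the array layout, and the stopping rule are assumptions for the sketch:

```python
import numpy as np

def sequential_forward_selection(X, y, score_fn, max_features=None):
    """Greedy SFS: starting from an empty subset, repeatedly add the
    single feature that most improves score_fn(X_subset, y), and stop
    when no remaining feature improves the score."""
    n_features = X.shape[1]
    selected = []
    remaining = set(range(n_features))
    best_score = -np.inf
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every candidate subset that adds one more feature.
        scores = {f: score_fn(X[:, selected + [f]], y) for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:
            break  # no candidate improves on the current subset
        best_score = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_score
```

Any subset-evaluation criterion can be plugged in as `score_fn` (in the thesis this would be classifier recognition rate); the sketch only fixes the greedy search order.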

Parallel Abstract


There are many ways for humans to express their emotions, for instance through speech, gesture, or writing. Human speech conveys not only the syntax but also the feeling at the moment of speaking. Thus, emotions play an important role in speech communication, and recognizing human emotion in speech signals has attracted considerable attention. In emotion recognition, the classifiers and features used in the system influence the recognition rate. The purpose of this study is to acquire the most effective feature set for a specific classifier used in speech emotion recognition. There are three main focuses: the classifiers, the emotion corpus combinations, and the features to be analyzed. In this thesis we extract 78 speech features, including formants, shimmer, jitter, Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Mel-Frequency Cepstral Coefficients (MFCC), the first derivative of MFCC (D-MFCC), the second derivative of MFCC (DD-MFCC), Log Frequency Power Coefficients (LFPC), Perceptual Linear Prediction (PLP), RelAtive SpecTrAl PLP (RastaPLP), log-energy, and zero crossing rate (ZCR), as well as their mean, standard deviation, minimum, maximum, and range. The method we use to analyze the effects of the features is sequential forward selection (SFS). Experimental results indicate that the most effective feature set for five emotions using WD-KNN achieves the highest recognition accuracy of 90% with 13 features. From the results, we can see that the most effective feature among all extracted features for emotion recognition is LPC: it appears in the most effective feature sets for all the classifiers tested.
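The WD-KNN classifier named above weights the votes of the k nearest neighbours by discrete rank rather than counting them equally. The following sketch uses a linearly decreasing weight `k - rank` as an illustrative choice; the thesis's exact weighting scheme is not given here, so treat the weights as an assumption:

```python
import numpy as np

def wd_knn_predict(X_train, y_train, x, k=5):
    """Classify x by its k nearest training points, where the j-th
    nearest neighbour (j = 0 is the closest) votes with discrete
    weight k - j, so nearer neighbours dominate the vote."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
    order = np.argsort(dists)[:k]                # k nearest indices, closest first
    votes = {}
    for rank, idx in enumerate(order):
        label = y_train[idx]
        votes[label] = votes.get(label, 0) + (k - rank)
    return max(votes, key=votes.get)             # class with the largest weighted vote
```

With all weights set to 1 this reduces to ordinary KNN; the rank-based weights let a single very close neighbour outvote several distant ones.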

Parallel Keywords

WD-KNN; feature selection; SFS

