
中文語音情緒轉折點之辨識與分析

Recognition and Analysis of Emotion Transition in Mandarin Speech Signal

Advisor: 包蒼龍

Abstract

Language is the bridge of communication between people, and speech plays an important role in both personal relationships and the workplace. Human emotion usually accompanies speech, so an utterance conveys not only the meaning the speaker intends but also the speaker's emotional state at that moment. In this thesis, we compare the emotion recognition rates obtained with different segmentation approaches for continuous Mandarin speech, and we attempt to detect emotion transition points within continuous speech. In the experiments, we apply three segmentation methods to continuous Mandarin emotional speech: uniform segmentation, endpoint detection, and whole-sentence segmentation. The emotional speech corpus covers five emotions: anger, happiness, sadness, boredom, and neutral. In the recognition phase, we employ two classifiers, the weighted discrete K-nearest neighbor (WD-KNN) and the conventional K-nearest neighbor (KNN), each combined with an optimized feature set. The experimental results show that WD-KNN yields better recognition results than KNN, with an average recognition accuracy of 73% on the testing sentences.
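The abstract names endpoint detection as one of the three segmentation strategies without detailing it. As an illustration only, a common short-time-energy endpoint detector can be sketched as follows; the frame length, hop size, and threshold ratio here are assumed values for the sketch, not parameters taken from the thesis:

```python
import numpy as np

def detect_endpoints(signal, frame_len=256, hop=128, threshold_ratio=0.1):
    """Return (start, end) sample indices of the detected speech region,
    using a simple short-time-energy threshold (illustrative only)."""
    # Short-time energy of each analysis frame
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energy = np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])
    # Frames whose energy exceeds a fraction of the peak are "voiced"
    thresh = threshold_ratio * energy.max()
    voiced = np.where(energy > thresh)[0]
    if len(voiced) == 0:
        return 0, len(signal)  # no speech found; return the whole signal
    start = voiced[0] * hop
    end = min(len(signal), voiced[-1] * hop + frame_len)
    return start, end
```

A real system would typically add a zero-crossing-rate criterion and hangover smoothing, but the energy threshold alone conveys the idea of trimming silence before feature extraction.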
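The two classifiers compared in the abstract can be contrasted in a minimal sketch. The rank-based weight sequence below (the i-th nearest neighbor contributes weight k - i) is one plausible choice for illustration; the thesis body defines the actual WD-KNN weighting:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=5):
    """Conventional KNN: majority vote among the k nearest training samples."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def wd_knn_predict(train_X, train_y, x, k=5):
    """Rank-weighted KNN: the neighbor at rank i (0 = closest) contributes
    weight (k - i), so closer neighbors count more in the class vote."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = {}
    for rank, idx in enumerate(nearest):
        scores[train_y[idx]] = scores.get(train_y[idx], 0) + (k - rank)
    return max(scores, key=scores.get)
```

In both cases the feature vectors would be acoustic features (e.g. pitch- and energy-derived statistics) extracted from each speech segment, with one of the five emotion labels as the class.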

