A Preliminary Study of Multiple Speech Features for Emotion Recognition

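In this thesis, we give a preliminary introduction to the background of emotion recognition research and to the system architecture of automatic emotion recognition, and we discuss how various speech features perform in emotion recognition. We find that Mel-frequency cepstral coefficients (MFCC), the features conventionally used in speech recognition, yield a lower emotion recognition rate than logarithmic frequency power coefficients (LFPC). However, when we apply a discrete cosine transform to the LFPC to obtain logarithmic frequency cepstral coefficients (LFCC), the LFCC outperform the LFPC. These results were obtained on Emotional Prosody Speech and Transcripts, an internationally recognized and widely used emotional speech database, which lends them credibility. They are consistent with a common view in speech recognition: cepstral features are more discriminative than log-spectral features, both for recognizing the linguistic content of speech and for recognizing its emotion.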
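As a rough illustration of what LFPC-style features are, the following Python/NumPy sketch computes a short-time power spectrum, sums it over rectangular frequency bands whose widths grow geometrically, and converts each band energy to decibels. Unlike MFCC, no DCT is applied at this stage, so the output is a log spectrum rather than a cepstrum. The function name `lfpc` and every parameter (16 kHz sampling, 25 ms Hamming frames with a 10 ms hop, 12 bands starting at 200 Hz with a first bandwidth of 54 Hz and a growth factor of 1.4) are assumptions chosen for illustration, not the configuration used in the thesis.

```python
import numpy as np


def lfpc(signal, sr=16000, frame_len=400, hop=160, n_bands=12,
         f_start=200.0, first_bw=54.0, alpha=1.4):
    """Hypothetical LFPC-style features: log (dB) energy in log-spaced bands.

    All defaults are illustrative assumptions. Band k covers
    [edges[k], edges[k + 1]) and each bandwidth is `alpha` times the previous one.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:  # pad very short inputs to one full frame
        signal = np.pad(signal, (0, frame_len - len(signal)))

    # Split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Short-time power spectrum (zero-padded to the next power of two)
    n_fft = int(2 ** np.ceil(np.log2(frame_len)))
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

    # Band edges whose widths grow geometrically from f_start
    edges = [f_start]
    bw = first_bw
    for _ in range(n_bands):
        edges.append(edges[-1] + bw)
        bw *= alpha

    # Sum the power inside each rectangular band, then convert to dB
    band_energy = np.empty((n_frames, n_bands))
    for k in range(n_bands):
        in_band = (freqs >= edges[k]) & (freqs < edges[k + 1])
        band_energy[:, k] = power[:, in_band].sum(axis=1)
    return 10.0 * np.log10(band_energy + 1e-12)  # small floor avoids log(0)


# Usage: one second of a 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
feats = lfpc(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(feats.shape)  # (98, 12)
```

With these defaults the top band edge lies at roughly 7.7 kHz, just under the 8 kHz Nyquist frequency of 16 kHz audio; other sampling rates would need a different band layout.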

Keywords

speech emotion recognition; logarithmic spectrum; cepstrum

Parallel Abstract

In this thesis, we briefly introduce several aspects of emotion recognition, including its background, system architecture, and several feature representations. Among these representations, logarithmic frequency power coefficients (LFPC) perform better than the Mel-frequency cepstral coefficients (MFCC) widely used in speech recognition. This thesis proposes further processing the LFPC features with a discrete cosine transform (DCT) to reduce the mutual dependence among LFPC features and to emphasize the vocal tract information in the speech signal. The resulting features are named logarithmic frequency cepstral coefficients (LFCC). Experiments on the well-known emotional speech database Emotional Prosody Speech and Transcripts show that the proposed LFCC outperform both LFPC and MFCC in emotion recognition.
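The DCT step that turns LFPC into LFCC can be sketched as an orthonormal DCT-II applied to each frame's vector of log band energies. The sketch below also illustrates the decorrelation argument on synthetic data. The helper names `dct2_ortho_matrix`, `lfcc_from_lfpc`, and `mean_abs_offdiag_corr` are hypothetical, the orthonormal scaling and optional truncation (`n_ceps`) are illustrative choices since the abstract does not state which DCT variant or how many coefficients the thesis keeps, and the AR(1) input is an assumed stand-in for correlated neighboring band energies, not data from Emotional Prosody Speech and Transcripts.

```python
import numpy as np


def dct2_ortho_matrix(n):
    """Orthonormal DCT-II matrix: C[k, m] = s_k * cos(pi * k * (2m + 1) / (2n))."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)   # s_0 = sqrt(1/n)
    c[1:, :] *= np.sqrt(2.0 / n)  # s_k = sqrt(2/n) for k >= 1
    return c


def lfcc_from_lfpc(lfpc_frames, n_ceps=None):
    """Hypothetical LFCC sketch: DCT-II of each frame's LFPC vector.

    lfpc_frames has shape (n_frames, n_bands); n_ceps optionally keeps
    only the lowest-order coefficients.
    """
    x = np.asarray(lfpc_frames, dtype=float)
    ceps = x @ dct2_ortho_matrix(x.shape[-1]).T
    return ceps if n_ceps is None else ceps[..., :n_ceps]


def mean_abs_offdiag_corr(feats):
    """Average |correlation| between distinct feature dimensions."""
    r = np.corrcoef(feats, rowvar=False)
    return np.abs(r[~np.eye(r.shape[0], dtype=bool)]).mean()


# Synthetic stand-in for LFPC (assumption): neighboring bands follow an
# AR(1) process with correlation rho, so adjacent dimensions are highly dependent.
rng = np.random.default_rng(0)
n_frames, n_bands, rho = 5000, 12, 0.9
fake_lfpc = np.empty((n_frames, n_bands))
fake_lfpc[:, 0] = rng.standard_normal(n_frames)
for m in range(1, n_bands):
    fake_lfpc[:, m] = (rho * fake_lfpc[:, m - 1]
                       + np.sqrt(1 - rho ** 2) * rng.standard_normal(n_frames))

lfcc = lfcc_from_lfpc(fake_lfpc)
print("LFPC mean |off-diagonal corr|:", mean_abs_offdiag_corr(fake_lfpc))
print("LFCC mean |off-diagonal corr|:", mean_abs_offdiag_corr(lfcc))
```

Because the transform is orthonormal, keeping all coefficients loses no information; the reduced dependence comes from rotating the feature axes. Truncating to the first few coefficients with `n_ceps` would additionally keep only a smoothed version of the log-spectral envelope.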

Parallel Keywords

speech emotion recognition; logarithmic spectrum; cepstrum
