In this thesis, we present a system architecture that improves the accuracy of speech emotion recognition. Using maximum a posteriori (MAP) adaptation with speaker-dependent or speaker-independent adaptation utterances, speaker and emotion information is incorporated into the original Gaussian mixture model (GMM) and universal background model (UBM), yielding emotion models that represent the different emotional states more precisely and thus achieve higher recognition accuracy. Two common speech features, Mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive cepstral coefficients (PLPCC), are used to validate the efficacy of the proposed adaptive structure. Experiments conducted on the widely used emotion corpus Emotional Prosody Speech and Transcripts show that the recognition accuracy of the MAP-adapted emotion models is clearly higher than that of the original GMMs without adaptation, confirming the improvement provided by the proposed architecture.
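The abstract does not give the adaptation formulas, but the standard mean-only MAP adaptation used in GMM-UBM systems can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function name, the diagonal-covariance assumption, and the relevance factor of 16 are all assumptions chosen for the example.

```python
import numpy as np

def map_adapt_means(means, covs, weights, X, relevance=16.0):
    """MAP-adapt the means of a diagonal-covariance GMM (UBM) toward data X.

    means: (K, D) component means; covs: (K, D) diagonal covariances;
    weights: (K,) mixture weights; X: (N, D) adaptation frames
    (e.g. MFCC or PLPCC vectors). Returns the adapted (K, D) means.
    """
    # Log-likelihood of each frame under each diagonal Gaussian.
    diff = X[:, None, :] - means[None, :, :]                  # (N, K, D)
    log_prob = -0.5 * (np.sum(diff ** 2 / covs, axis=2)
                       + np.sum(np.log(2 * np.pi * covs), axis=1))
    log_post = np.log(weights) + log_prob                     # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)           # numerical safety
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                   # responsibilities

    n_k = post.sum(axis=0)                                    # soft counts (K,)
    E_k = post.T @ X / np.maximum(n_k, 1e-10)[:, None]        # first-order stats
    alpha = n_k / (n_k + relevance)                           # data-dependent weight
    # Components with much adaptation data move toward E_k;
    # components with little data stay close to the UBM means.
    return alpha[:, None] * E_k + (1 - alpha[:, None]) * means
```

The relevance factor controls how strongly the prior (UBM) resists the adaptation data: components that see many frames are pulled toward the data mean, while unobserved components remain essentially unchanged, which is what makes MAP adaptation robust with small emotion- or speaker-specific corpora.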