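Emotion recognition is widely applied in many fields, and recognizing emotions accurately has always been the goal of this line of research. We hold that speech features at different timing levels carry different information about the speech signal, and that combining them can raise the recognition rate. Our experimental results show that combining features from different levels effectively compensates for the information missing at each individual level. We propose several ways of combining features across levels and confirm experimentally that doing so improves the recognition rate. The experiments use the German emotional speech corpus and the English eNTERFACE emotional corpus; the former has seven emotion categories and the latter has six. The speech features we extract fall into two groups: the first is extracted at the frame level and comprises energy, pitch, and Mel-frequency cepstral coefficients; the second is extracted at the segment level and the utterance level and consists of low-level descriptors (LLDs). The experiments show that, compared with features from a single level, combining multi-level features effectively improves the recognition rate.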
Emotion recognition has been successfully applied in many fields. It is believed that features extracted at each timing level provide different information about the emotional speech signal and can therefore compensate for one another. To achieve promising recognition accuracy, this thesis proposes several methods for combining features extracted at different timing levels: likelihood combination, weighted likelihood combination, raw feature combination, and partial raw feature combination. We extracted spectral features and prosodic features as frame-level features, and low-level descriptors (LLDs) as segment-level and utterance-level features. The Berlin Emotion Database and the eNTERFACE emotional database are used in the experiments. Compared with conventional features from one or two timing levels, the combination of features from three timing levels yields a higher recognition rate.
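
The weighted likelihood combination named above can be illustrated with a minimal sketch. It assumes that each timing level has its own classifier producing one log-likelihood per emotion class, and that fusion is a weighted sum of those log-likelihoods. The function names, weights, and scores below are hypothetical and are not taken from the thesis; the actual combination rule may differ.

```python
def combine_log_likelihoods(level_scores, weights=None):
    """Fuse per-class log-likelihoods from several timing levels.

    level_scores: dict mapping a level name to a list of per-class
                  log-likelihoods (same class order for every level).
    weights:      dict mapping a level name to its weight; when None,
                  every level gets weight 1.0 (plain, unweighted sum).
    Returns a list with one fused score per class.
    """
    levels = list(level_scores)
    if weights is None:
        weights = {level: 1.0 for level in levels}
    n_classes = len(level_scores[levels[0]])
    fused = [0.0] * n_classes
    for level in levels:
        scores = level_scores[level]
        if len(scores) != n_classes:
            raise ValueError("all levels must score the same classes")
        for c, score in enumerate(scores):
            fused[c] += weights[level] * score
    return fused


def predict(level_scores, labels, weights=None):
    """Return the label whose fused score is highest."""
    fused = combine_log_likelihoods(level_scores, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical scores for three classes at three timing levels.
labels = ["anger", "happiness", "sadness"]
scores = {
    "frame": [-10.0, -12.5, -15.0],
    "segment": [-8.0, -7.0, -9.0],
    "utterance": [-5.0, -4.0, -6.0],
}
# Hypothetical weights that trust the frame level less.
weights = {"frame": 0.2, "segment": 0.4, "utterance": 0.4}

print(predict(scores, labels))           # unweighted combination
print(predict(scores, labels, weights))  # weighted combination
```

With these made-up numbers the unweighted sum favors the frame-level evidence, while down-weighting the frame level changes the decision, which is the kind of trade-off a weighted scheme is meant to expose. In practice the weights would presumably be tuned on held-out data.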