Spoken language plays a very important role in human communication: speech carries not only the meaning a person intends to express but also the speaker's emotional state at that moment. In this thesis we take a close look at feature parameters commonly used in speech recognition, including fundamental frequency, jitter, linear predictive coefficients, linear prediction cepstral coefficients, Mel-frequency cepstral coefficients, log frequency power coefficients, and perceptual linear prediction coefficients, in the hope of extracting emotion-related information from their values. Our analysis methods are sequential forward selection and sequential backward selection, together with feature weighting applied to a K-nearest-neighbor (KNN) classifier, used to search for a good combination of feature parameters. Under this classifier, a combination of 32 features with the best recognition performance was found; using these features and this classifier, we obtain a recognition rate of 84% on our corpus. Finally, we compare the recognition rates of SVM, weighted KNN, and WDKNN, and apply this 32-feature combination to a continuous speech emotion recognition system.
Speech communication plays an important role for human beings: speech conveys not only the linguistic content but also the speaker's emotion at the moment of utterance. In this thesis we use 11 kinds of speech features for emotion classification: formant, shimmer, jitter, Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Mel-Frequency Cepstral Coefficients (MFCC), the first derivative of MFCC (D-MFCC), the second derivative of MFCC (DD-MFCC), Log Frequency Power Coefficients (LFPC), Perceptual Linear Prediction (PLP), and RelAtive SpecTrAl PLP (RastaPLP). These features are commonly used in speech recognition, and we try to find the relation between them and emotion. We analyze the features with sequential forward selection (SFS) and sequential backward selection (SBS). Under the KNN classifier, 32 features were chosen, yielding a recognition rate of 84% on our emotion corpus. We also use the weighted KNN and WDKNN classification methods to recognize emotion in speech, and compare the performance of SVM with that of weighted KNN and WDKNN. These 32 features are the most appropriate features for emotion recognition and are used in the continuous speech emotion recognition system.
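The feature-selection procedure described above can be illustrated with a minimal sketch of sequential forward selection wrapped around a nearest-neighbor classifier. This is not the thesis's implementation: the helper names, the Euclidean distance metric, the 1-nearest-neighbor setting, and the synthetic data in the usage example are all assumptions made for illustration.

```python
# Minimal sketch of sequential forward selection (SFS) with a KNN
# classifier. All names and parameters here are illustrative, not the
# thesis's actual implementation.
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test, k=1):
    """Classify each test sample by majority vote among its k nearest
    training samples (Euclidean distance) and return the accuracy."""
    correct = 0
    for x, y in zip(X_test, y_test):
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest_labels = y_train[np.argsort(dists)[:k]]
        pred = np.bincount(nearest_labels).argmax()
        correct += int(pred == y)
    return correct / len(y_test)

def sequential_forward_selection(X_train, y_train, X_val, y_val, n_select):
    """Greedily grow a feature subset: at each step, add the single
    remaining feature that most improves validation accuracy."""
    selected = []
    remaining = list(range(X_train.shape[1]))
    for _ in range(n_select):
        best_feat, best_acc = None, -1.0
        for f in remaining:
            cols = selected + [f]
            acc = knn_accuracy(X_train[:, cols], y_train,
                               X_val[:, cols], y_val)
            if acc > best_acc:
                best_feat, best_acc = f, acc
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected
```

Sequential backward selection works the same way in reverse: it starts from the full feature set and repeatedly drops the feature whose removal hurts validation accuracy least, until the desired subset size remains.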