A speaker recognition system can be divided into three main parts: speech pre-processing, feature extraction, and classification. For feature extraction, we use the traditional Mel-frequency cepstral coefficients (MFCC) together with wavelet features; for classification, we exploit the statistical properties of the Gaussian mixture model (GMM), scoring each test utterance against the trained mixture models to obtain a similarity. We feed the two feature sets into their separately trained models and combine the resulting similarity scores, and find that the combined system achieves a better recognition rate than traditional MFCC alone. We further apply the F-ratio from different angles to select the more discriminative features, reducing the dimensionality of the feature vectors and identifying the wavelet-domain feature parameters with genuine reference value. The experiments use the AURORA 2.0 speech database, which contains 52 male and 57 female speakers, each with 77 digit strings of varying length. In the speaker verification experiments, with 32 Gaussian mixtures, 12 MFCC dimensions, and 15 wavelet feature dimensions, reducing the wavelet feature dimensionality by feature selection lowers the error rate from 1.74% to 1.53%. Regarding feature dimensionality, the system performs best when the MFCC and wavelet feature dimensions are close to each other.
Speaker recognition can be divided into three parts: pre-processing, feature extraction, and pattern recognition. For feature extraction, we use the traditional Mel-frequency cepstral coefficients (MFCC) together with wavelet coefficients. Using a Gaussian mixture model (GMM) to calculate the similarity, we combine the output scores of the MFCC and wavelet features. According to the experimental results, the combined system performs better than using the traditional MFCC features alone. In this paper, we focus on feature selection in the wavelet domain: we compute the traditional F-ratio in different ways in order to find the more discriminative features in the wavelet domain. Our speaker recognition experiments are performed on the AURORA 2.0 database, which contains 52 male and 57 female speakers, each with 77 clean digit strings of varying length. In the speaker verification experiments, with 32 Gaussian mixtures, 12 MFCC dimensions, and 15 wavelet coefficient dimensions, we reduce the dimensionality of the wavelet coefficients by feature selection; the equal error rate decreases from 1.73% to 1.53%. Regarding feature dimensionality, system performance is better when the number of wavelet coefficient dimensions is close to the number of MFCC dimensions.
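The F-ratio-based feature selection described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: it assumes feature vectors have already been extracted and grouped by speaker, and the function names (`f_ratio`, `select_top_k`) are hypothetical. The standard per-dimension F-ratio is the variance of the class (speaker) means divided by the average within-class variance; dimensions with a high F-ratio are the more discriminative ones to keep.

```python
import numpy as np

def f_ratio(features_by_class):
    """Per-dimension F-ratio for a list of (n_i, d) feature arrays,
    one array per speaker: between-class variance of the class means
    divided by the average within-class variance."""
    means = np.array([c.mean(axis=0) for c in features_by_class])      # (C, d)
    within = np.array([c.var(axis=0) for c in features_by_class])      # (C, d)
    return means.var(axis=0) / within.mean(axis=0)

def select_top_k(features_by_class, k):
    """Indices of the k feature dimensions with the highest F-ratio,
    used to reduce the dimensionality of the wavelet feature vector."""
    scores = f_ratio(features_by_class)
    return np.argsort(scores)[::-1][:k]
```

In this sketch, dropping the low-F-ratio wavelet dimensions before GMM training plays the role of the feature selection step that reduced the equal error rate in the experiments; the thesis additionally explores computing the F-ratio in alternative ways, which this standard definition does not capture.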