

Speaker Recognition Based on Feature Selection in Wavelet Domain

Advisor: 陳文雄

Abstract


A speaker recognition system can be divided into three main parts: speech pre-processing, feature extraction, and classification. For feature extraction, we use the traditional Mel-frequency cepstral coefficients (MFCC) together with wavelet features. For classification, we exploit the statistical properties of the Gaussian mixture model (GMM): each test utterance is fed into the mixture models to compute a similarity score. Feeding the two feature types into their separately trained models and combining the resulting similarity scores, we find that the fused system achieves a better recognition rate than traditional MFCC alone. We also apply the F-ratio, computed from different perspectives, to select the more discriminative features, reducing the dimensionality of the feature vectors and identifying the wavelet-domain feature parameters that carry useful information.

The experiments use the AURORA 2.0 speech database, which contains 52 male and 57 female speakers, each with 77 digit strings of varying length. In the speaker verification experiments, with 32 Gaussian mixtures, 12 MFCC dimensions, and 15 wavelet feature dimensions, reducing the dimensionality of the wavelet features by feature selection lowers the error rate from 1.74% to 1.53%. Regarding feature dimensionality, the system performs best when the MFCC and wavelet feature dimensions are close to each other.
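The F-ratio mentioned above ranks each feature dimension by the ratio of between-speaker variance to average within-speaker variance; dimensions with a high ratio separate speakers well and are kept, while the rest are discarded. Below is a minimal NumPy sketch of this selection step (an illustration only; the function names and the simple top-k rule are our assumptions, not necessarily the thesis's exact procedure):

```python
import numpy as np

def f_ratio(features_per_speaker):
    """features_per_speaker: list of (n_frames_i, dim) arrays, one per speaker."""
    speaker_means = np.stack([f.mean(axis=0) for f in features_per_speaker])  # (speakers, dim)
    global_mean = speaker_means.mean(axis=0)
    between = ((speaker_means - global_mean) ** 2).mean(axis=0)   # between-speaker variance
    within = np.stack([f.var(axis=0) for f in features_per_speaker]).mean(axis=0)  # avg within-speaker variance
    return between / within                                       # higher = more discriminative

def select_top_k(features_per_speaker, k):
    """Keep only the k dimensions with the highest F-ratio."""
    keep = np.argsort(f_ratio(features_per_speaker))[::-1][:k]
    return keep, [f[:, keep] for f in features_per_speaker]
```

For the setting in the abstract, k would be chosen below the original 15 wavelet dimensions to shrink the wavelet feature vector before GMM training.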

Parallel Abstract


Speaker recognition can be divided into three parts: pre-processing, feature extraction, and pattern recognition. In feature extraction, we use the traditional features, Mel-frequency cepstral coefficients (MFCC), together with wavelet coefficients. Using a Gaussian mixture model (GMM) to calculate similarity, we combine the output scores of the MFCC and wavelet features. According to the experimental results, the combined system performs better than using only the traditional MFCC features. In this paper, we focus on feature selection in the wavelet domain: we compute the traditional F-ratio in different ways in order to find the better features in the wavelet domain. Our speaker recognition experiments are performed on the AURORA 2.0 database, which contains 52 male and 57 female speakers, each contributing 77 clean digit strings of different lengths. In the speaker verification experiments, under the conditions of a 32-component GMM, 12 MFCC dimensions, and 15 wavelet-coefficient dimensions, we reduce the dimensionality of the wavelet coefficients by feature selection and find that the equal error rate decreases from 1.73% to 1.53%. Regarding feature dimensionality, the system performs better when the number of wavelet-coefficient dimensions is close to the number of MFCC dimensions.
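As a sketch of the score-level fusion described in both abstracts, the snippet below trains one GMM per speaker per feature stream and combines the two average log-likelihoods with a weight w. It assumes scikit-learn's GaussianMixture; the diagonal covariance and the value of w are our assumptions, since the abstract does not specify them:

```python
# Hypothetical fusion sketch: one GMM per speaker per feature stream,
# scores combined at the log-likelihood level (w is an assumed fusion weight).
from sklearn.mixture import GaussianMixture

def train_speaker_gmm(train_frames, n_components=32):
    """Fit a 32-component GMM on one speaker's training frames (n_frames, dim)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag", max_iter=200)
    return gmm.fit(train_frames)

def fused_score(gmm_mfcc, gmm_wavelet, mfcc_frames, wavelet_frames, w=0.5):
    """Weighted sum of the average per-frame log-likelihoods from the two streams."""
    score_mfcc = gmm_mfcc.score(mfcc_frames)          # average log-likelihood per frame
    score_wavelet = gmm_wavelet.score(wavelet_frames)
    return w * score_mfcc + (1.0 - w) * score_wavelet
```

In verification, the fused score from a claimed speaker's models would be compared against a threshold; sweeping that threshold yields the equal error rate reported above.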

Keywords

F-Ratio, MFCC, Wavelet, GMM

