
整合高斯混合與具性能指標支撐向量機模型之語者確認研究

A Hybrid Model of GMM and SVM with Representative Labels for Speaker Verification

Advisor: 莊堯棠

Abstract


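This thesis proposes a new recognition flow for speaker verification systems that improves system performance. The architecture integrates a Gaussian mixture model (GMM) with a support vector machine (SVM) with performance indices.

The SVM with performance indices adds the defined performance indices to the original feature vectors, which raises the vector dimension and makes the whole system more discriminative. In the proposed architecture, a test utterance is scored against all enrolled models to determine its class label. The Top1 score minus the Top2 score is then compared with a threshold. If the difference is greater than or equal to the threshold, the Top1 class label is used to augment the feature vectors of the test utterance, and a distance value is computed against the SVM that includes class labels. Otherwise, the utterance goes to the original conventional speaker verification system.

Experimental results show that, with a 128-mixture GMM and a threshold of 0.3, the proposed architecture achieves its best equal error rate (EER) and decision cost function (DCF), 14.43% and 0.1743. Compared with the SVM speaker verification system at 17.86% and 0.2175, this is an improvement of 3.43 percentage points and 0.0414. Compared with the conventional speaker verification system at 15.87% and 0.1912, it is an improvement of 1.44 percentage points and 0.0169.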
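The routing step in the abstract (compare the Top1 minus Top2 score margin with a threshold, then choose a back end) can be sketched as follows. This is a minimal illustration and not the thesis implementation: the function name `route_by_margin` and its return convention are hypothetical, and the scores are assumed to be one number per enrolled speaker model.

```python
import numpy as np

def route_by_margin(scores, threshold=0.3):
    """Pick the back end for one test utterance (hypothetical helper).

    scores: 1-D sequence with one score per enrolled speaker model.
    Returns (top1_index, use_svm). use_svm is True when the Top1 - Top2
    margin is greater than or equal to the threshold, and False when the
    utterance should fall back to the conventional verification system.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 1 or scores.size < 2:
        raise ValueError("need a 1-D score list with at least two enrolled models")
    order = np.argsort(scores)[::-1]          # indices by descending score
    top1, top2 = order[0], order[1]
    margin = scores[top1] - scores[top2]      # Top1 minus Top2
    return int(top1), bool(margin >= threshold)
```

The default of 0.3 mirrors the threshold reported in the abstract. Whether that value transfers to another score scale is an open assumption.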

Parallel Abstract


This thesis proposes a new recognition flow that improves the performance of speaker verification. The proposed system combines a Gaussian mixture model (GMM) with a support vector machine (SVM) with representative labels. The SVM with representative labels is built by appending the defined class labels to the original feature vectors, which raises their dimension and makes the system more discriminative. In the proposed system, each test utterance is scored against all enrolled models by log-likelihood ratio to determine its class label. If the difference between the Top1 and Top2 scores is greater than or equal to a chosen threshold, the class label of the Top1 speaker is appended to the original feature vectors as extra features, and the augmented feature vectors are passed to the SVM classifier. Otherwise, the speaker is verified with the GMM-UBM baseline system. Experimental results show that, with a 128-mixture GMM and a threshold of 0.3, the proposed system improves the equal error rate (EER) by 3.43 percentage points and the decision cost function (DCF) by 0.0414 over the SVM speaker verification system, and by 1.44 percentage points and 0.0169 over the baseline system.
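The feature augmentation described above, appending the Top1 class label to each feature vector before SVM scoring, might look like the sketch below. The abstract does not say how the label is encoded, so the one-hot encoding here is an assumption, and `augment_with_label` is a hypothetical helper name.

```python
import numpy as np

def augment_with_label(features, label, num_speakers):
    """Append a one-hot speaker label to every feature vector.

    features: array of shape (num_frames, dim), or a single vector.
    label: index of the Top1 enrolled speaker.
    num_speakers: number of enrolled speakers (length of the one-hot code).
    Returns an array of shape (num_frames, dim + num_speakers).

    Assumption: one-hot encoding. The source only states that class
    labels are added as extra features.
    """
    features = np.atleast_2d(np.asarray(features, dtype=float))
    if not 0 <= label < num_speakers:
        raise ValueError("label out of range")
    one_hot = np.zeros((features.shape[0], num_speakers))
    one_hot[:, label] = 1.0                   # same label on every frame
    return np.hstack([features, one_hot])
```

For example, two 3-dimensional frames with four enrolled speakers become two 7-dimensional frames, which would then be passed to an SVM trained on vectors augmented the same way.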


Cited by


吳晨瑋 (2012). 生活聲響之自動辨認 [Master's thesis, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2012.00614
