
應用於語者確認之支撐向量機參數最佳化研究

A Study on SVM Parameter Optimization for Speaker Verification

Advisor: 丁英智

Abstract


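This thesis proposes a multi-model hybrid speaker verification technique applied to support vector machine (SVM) parameter optimization. The main research direction is to combine the Gaussian mixture model (GMM), dynamic time warping (DTW), the SVM model, and the fuzzy model to further improve the recognition performance of conventional speaker verification systems that rely on a single speech model.

Before studying multi-model SVM parameter optimization, the thesis first investigates a multi-model hybrid recognition system. In this part, it designs a multi-model hybrid recognition system that performs both speech recognition and speaker recognition. The front end handles speaker verification: it adopts a parallel architecture that fuses the GMM and the SVM model, and it makes the verification decision with a voting SVMGMM algorithm. The back end handles speech recognition and uses DTW. Experiments on three types of speech databases confirm that the multi-model hybrid method is effective. Front-end speaker verification outperforms a conventional single GMM or single SVM model and reaches a recognition rate of 73.37%. In the back end, because front-end speaker verification has already rejected unsuitable data, DTW recognition also reaches a high recognition rate of 73.70%.

For multi-model SVM parameter optimization, the thesis proposes three ways of tuning SVM parameters: using GMM speaker recognition to adjust the SVM parameter γ, using DTW speech recognition to adjust the SVM parameter γ, and using DTW speech recognition to adjust the SVM parameter C. All three methods use fuzzy modeling to tune the parameters, and they are named Fuzzy GMM-regulated γ, Fuzzy DTW-regulated γ, and Fuzzy DTW-regulated C. Fuzzy GMM-regulated γ uses a fuzzy control mechanism to adjust γ according to the difference between the mean vectors of the two GMMs for valid and invalid speakers, which controls the boundary size of the SVM hyperplane and improves the recognition accuracy of the SVM classifier. Experimental results show that the SVM classifier tuned by Fuzzy GMM-regulated γ achieves an excellent recognition rate of 89.20%. Fuzzy DTW-regulated γ uses a fuzzy control mechanism to adjust γ according to the difference between the DTW distance values of valid and invalid speakers, which corrects the boundary size of the SVM hyperplane and raises the recognition accuracy of the SVM classifier. The SVM classifier tuned by Fuzzy DTW-regulated γ achieves a recognition rate of 88.89%. Fuzzy DTW-regulated C uses a fuzzy control mechanism to adjust the value of the SVM parameter C according to the same difference in DTW distance values, so that a suitable boundary size for the SVM hyperplane can be estimated and the recognition accuracy of the SVM classifier improves. The SVM classifier tuned by Fuzzy DTW-regulated C achieves a recognition rate of 84.27%. The three proposed methods for speaker verification, Fuzzy GMM-regulated γ, Fuzzy DTW-regulated γ, and Fuzzy DTW-regulated C, achieve better recognition accuracy than conventional SVM speaker verification with an arbitrarily chosen γ or C.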
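The DTW distance used by the back end and by the two DTW-regulated methods can be illustrated with the textbook dynamic-programming recurrence. The sketch below is only an illustration and is not taken from the thesis: the function name `dtw_distance`, the Euclidean local cost, and the unconstrained step pattern are all assumptions.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumptions (not from the thesis): Euclidean local cost, the basic
    three-way step pattern, and no global path constraint.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances, seq_b frame repeats
                cost[i][j - 1],      # seq_b advances, seq_a frame repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Two utterances that differ only in how long a frame is held align at zero cost, which is the property that makes DTW suitable for comparing speech of varying tempo.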

Parallel Abstract


This thesis presents a speaker verification framework built around support vector machine (SVM) parameter optimization. The framework combines the Gaussian mixture model (GMM), dynamic time warping (DTW), the SVM model, and the fuzzy model to improve on conventional speaker verification with a single SVM model.

We first study multi-model combination for speaker verification systems and propose a multi-model combination system that performs both speech recognition and speaker verification. The front end performs speaker verification in a parallel mode that combines the GMM and the SVM model, and a voting SVMGMM algorithm makes the verification decision. The back end is a speech recognition system based on DTW. Experiments confirm that the multi-model combination framework is effective. Front-end speaker verification outperforms both the traditional single GMM and the single SVM model, with a recognition rate of 73.37%. Because front-end speaker verification has already removed inappropriate test data, back-end DTW speech recognition achieves an accuracy of 73.70%.

In the study of multi-model combination for SVM parameter optimization, we propose three methods: using GMM speaker verification to optimize the SVM parameter γ, using DTW speech recognition to optimize the SVM parameter γ, and using DTW speech recognition to optimize the SVM parameter C. Fuzzy modeling techniques are applied in all three methods. First, Fuzzy GMM-regulated γ feeds the difference between mean vectors into a fuzzy controller, which outputs the SVM parameter γ. The difference is computed between the GMM of valid speakers and the GMM of invalid speakers. By regulating γ, the method controls the boundary size of the SVM hyperplane and improves the verification accuracy of the SVM classifier. In the experiment, Fuzzy GMM-regulated γ reaches 84.26% accuracy. Second, Fuzzy DTW-regulated γ feeds the DTW distance into a fuzzy controller, which outputs the SVM parameter γ. The DTW distances are computed for valid speakers and invalid speakers with the DTW algorithm. This method likewise controls the boundary size of the SVM hyperplane to raise the verification accuracy of the SVM classifier, and it reaches 82.56% accuracy in the experiment. Third, Fuzzy DTW-regulated C feeds the DTW distance into a fuzzy controller, which outputs the SVM parameter C. The DTW distances are again computed for valid speakers and invalid speakers. The method finds a suitable boundary size for the SVM hyperplane to raise the verification accuracy of the SVM classifier, and it reaches an accuracy of 79.93%. Experimental results on speaker verification confirm that all three SVM parameter optimization methods achieve higher verification accuracy than the traditional single SVM classifier.
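A fuzzy controller of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the controller from the thesis: the triangular membership functions, the three rule outputs in `RULE_GAMMA`, the direction of the rules (smaller separation gives larger γ), the `scale` normalization, and the weighted-average (zero-order Sugeno style) defuzzification are all hypothetical choices, as are the names `separation` and `fuzzy_gamma`.

```python
import math

# Hypothetical rule consequents: one crisp gamma per fuzzy set.
# The values and the rule direction are illustrative only.
RULE_GAMMA = {"small": 2.0, "medium": 0.5, "large": 0.125}

# Hypothetical triangular fuzzy sets (left, peak, right) over a score in [0, 1].
FUZZY_SETS = {
    "small": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "large": (0.5, 1.0, 1.5),
}


def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def separation(mean_valid, mean_invalid, scale):
    """Euclidean distance between two mean vectors, scaled and clipped to [0, 1]."""
    return min(1.0, math.dist(mean_valid, mean_invalid) / scale)


def fuzzy_gamma(score):
    """Map a separation score to gamma by a weighted average of rule outputs."""
    score = min(1.0, max(0.0, score))
    weights = {name: triangular(score, *abc) for name, abc in FUZZY_SETS.items()}
    total = sum(weights.values())  # always 1.0 for these overlapping sets
    return sum(weights[name] * RULE_GAMMA[name] for name in weights) / total


gamma = fuzzy_gamma(separation([0.0, 0.0], [3.0, 4.0], scale=10.0))
```

The resulting value could then be passed to an SVM trainer, for example as the `gamma` argument of an RBF-kernel classifier such as scikit-learn's `SVC`. The same structure would apply to the DTW-regulated variants by feeding a normalized DTW distance difference in place of the mean-vector separation, and to C by changing the rule outputs.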


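Cited by


施家逸 (2015). A study on speech pattern recognition combined with speaker localization using a KINECT microphone array [Master's thesis, National Formosa University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0028-1008201522590800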
