  • Thesis

Robust Speech Recognition Based on Interpolation of Noise-Environment and Speaker-Characteristic Reference Models

Reference Eigen-Environment and Speaker Weighting for Robust Speech Recognition

Advisor: 廖元甫

Abstract


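This thesis investigates how a speech recognition system can use prior knowledge to compensate its models and normalize its parameters for noise environment and speaker characteristics when the training and test data are mismatched. Combining the respective strengths of RMW and EMLLR, we propose a robust speech recognition method based on interpolation of noise-environment and speaker-characteristic reference models. For several interfering factors known a priori, eigenspaces of eigen maximum likelihood linear regression (EMLLR) transformation matrices are collected during training from multiple known noise environments and speaker characteristics; each is analyzed with PCA to obtain eigen-matrices, which are concatenated into supervectors. Depending on the conditions of each group, the first M are taken as bases (M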

Parallel Abstract


In this study, a reference eigen-environment and speaker weighting (RESW) method is proposed for online HMM adaptation. RESW establishes multiple eigen-MLLR subspaces as a set of a priori knowledge, organized by affecting factors such as noise type, SNR, and speaker gender. It then projects an input test utterance into all of the eigen-subspaces simultaneously and optimally synthesizes a set of suitable HMMs. The proposed RESW was evaluated on the Aurora 2 multi-condition training task, where it achieved an average word error rate (WER) of 6.12%. RESW outperformed the multi-condition training baseline (Multi-Con., 13.72%), the blind ETSI advanced DSR front-end (ETSI-Adv., 8.65%) and histogram equalization (HEQ, 8.66%) approaches, and the non-blind reference model weighting (RMW, 7.29%) and Eigen-MLLR (6.14%) approaches.
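
The subspace idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names `build_eigen_subspace` and `project`, the matrix shapes, and the random data are all assumptions made for illustration. In particular, the weights here come from a plain least-squares projection, whereas eigen-MLLR normally estimates them by maximum likelihood against the HMMs, and RESW additionally combines several such subspaces.

```python
import numpy as np


def build_eigen_subspace(transforms, num_bases):
    """Flatten each transform matrix into a supervector and run PCA.

    transforms: array of shape (K, rows, cols), one hypothetical MLLR
    matrix per known training condition.
    Returns the mean supervector and the first `num_bases` principal
    directions (orthonormal rows).
    """
    supervectors = transforms.reshape(len(transforms), -1)
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are the principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_bases]


def project(supervector, mean, bases):
    """Least-squares weights in the subspace and the reconstruction.

    Assumption: this stands in for the maximum-likelihood weight
    estimation that an eigen-MLLR system would actually perform.
    """
    weights = bases @ (supervector - mean)
    return mean + weights @ bases, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Six made-up 3x4 transforms, e.g. one per noise condition.
    transforms = rng.normal(size=(6, 3, 4))
    mean, bases = build_eigen_subspace(transforms, num_bases=2)
    test_vector = rng.normal(size=12)
    reconstruction, weights = project(test_vector, mean, bases)
    print("weights:", weights)
    print("reconstructed transform shape:", reconstruction.reshape(3, 4).shape)
```

Keeping only the first M directions constrains the adapted transform to a few weights, which is what makes estimation from a single short test utterance feasible.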

