
基於雜訊環境參考模型內插與子空間雜訊變異量消去法之強健性語音辨認

Reference Model Weighting and Noise Variability Subspace Projection for Robust Speech Recognition

Advisor: 廖元甫

Abstract


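This thesis investigates how a speech recognition system can use prior knowledge to compensate its acoustic models and normalize its parameters when the noise environment is mismatched. We propose two methods: reference model weighting over noisy environments and noise variability subspace projection. In the first method, MLLR transformation matrices for multiple known noise environments are collected during training to represent the space of possible noise environments. At test time, the optimal interpolation weights are estimated with best first, a posteriori, or ML criteria, in addition to an EMLLR approach, and a recognition model suited to the test environment is synthesized from them. In the second method, the parameters of the recognition units are accumulated during training to form supervectors; principal component analysis is applied to the supervectors to construct an eigenspace, the interfering components of the noise environment are removed in that space, and detectors are built with multilayer perceptron training. At test time, supervectors are obtained from the word graph of the test utterance, the noise interference components are removed, and the detector scores are combined by weighting with the scores from the conventional acoustic models; the best recognition path is then found according to the adjusted scores. Experiments use the Aurora2 corpus under multi-condition training, with comparisons against HEQ, the ETSI Adv. frontend, and MVA. Reference model weighting raises the overall average recognition rate to 93.20%, and noise variability subspace projection raises it to 92.51%.

Parallel Abstract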
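
The interpolation step of the first method can be illustrated with a minimal sketch. It assumes each reference environment is summarized by one MLLR-style affine transform stored as rows of `[b | A]`, that the a posteriori weights are a softmax over per-environment log-likelihoods of the test utterance with a uniform prior, and that the synthesized transform is a plain weighted sum. The function names and the toy numbers are hypothetical and are not taken from the thesis.

```python
import math


def posterior_weights(log_likelihoods):
    """Turn per-environment log-likelihoods into weights that sum to 1.

    Assumption: uniform prior over reference environments, so the
    posterior is a softmax of the log-likelihoods.
    """
    peak = max(log_likelihoods)  # subtract the max for numerical stability
    exps = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate_transforms(transforms, weights):
    """Weighted sum of same-shaped matrices given as lists of rows."""
    n_rows = len(transforms[0])
    n_cols = len(transforms[0][0])
    return [
        [sum(w * t[r][c] for t, w in zip(transforms, weights))
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


def adapt_mean(transform, mean):
    """Apply an affine transform [b | A] to the extended mean [1, mu]."""
    extended = [1.0] + list(mean)
    return [sum(a * x for a, x in zip(row, extended)) for row in transform]


# Two toy reference environments for 2-dimensional features:
# the first leaves the mean unchanged, the second shifts it by +1.
identity = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]

weights = posterior_weights([-10.0, -10.0])  # equally likely environments
blended = interpolate_transforms([identity, shifted], weights)
adapted = adapt_mean(blended, [2.0, 3.0])
```

With equal log-likelihoods the two transforms receive equal weight, so the adapted mean lies halfway between the unshifted and shifted means. A best-first variant would keep only the highest-scoring environments before normalizing the weights.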


In this study we propose two methods to compensate for noisy-environment mismatch: (a) reference model weighting and (b) noise variability subspace projection. The first method uses collected noisy-environment characteristics and only one input test utterance to estimate the optimal weight sequence, and then synthesizes the characteristics of the unknown test environment by interpolation. The second method subtracts the noise variability in the eigenspace and builds word-based detectors for rescoring in automatic speech recognition. The proposed methods were evaluated on the multi-condition training task of the Aurora2 corpus. Experimental results show that, in comparison with MVA, HEQ, and the ETSI Adv. frontend, the average recognition rate improves to 93.20% with reference model weighting and to 92.51% with noise variability subspace projection.
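
As a rough illustration of subtracting noise variability in the eigenspace, the sketch below removes from a supervector its components along a set of noise directions. It assumes those directions are already available as an orthonormal basis (for example, leading principal components of the noise-related variation); how the thesis selects them is not stated in the abstract, and the function names and numbers here are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_noise_subspace(supervector, noise_basis):
    """Project a supervector onto the complement of the noise subspace.

    Assumption: `noise_basis` is a list of orthonormal vectors, so each
    coefficient can be computed from the original supervector.
    """
    cleaned = list(supervector)
    for direction in noise_basis:
        coeff = dot(direction, supervector)
        cleaned = [c - coeff * d for c, d in zip(cleaned, direction)]
    return cleaned


# Toy example: the first coordinate is treated as the noise direction.
noise_basis = [[1.0, 0.0, 0.0]]
cleaned = remove_noise_subspace([3.0, 4.0, 5.0], noise_basis)
```

After the projection the cleaned supervector has no component along any noise direction, while its components in the remaining directions are unchanged.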

