Two-Class Parametric Cepstrum Feature Normalization and Monte Carlo Robust Speech Model Estimation

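Distributed speech recognition systems commonly suffer from environment mismatch, and overcoming it requires making the system more robust. To that end, at the front-end we propose a two-class parametric cepstrum feature normalization method that uses two Gaussian mixture models to adapt to speech and non-speech separately. At the back-end we propose combining Monte Carlo robust speech model estimation with MVA: artificial noise is added to models trained on clean speech, with the aim of raising the recognition rate of the whole system without having to record a noisy speech corpus. We validate both methods on the Aurora 2 corpus. For front-end processing, adding an ARMA low-pass filter to histogram equalization raises the recognition rate of plain histogram equalization from 80.78% to 83.87%, and two-class parametric cepstrum feature normalization raises it further to 84.05%. For back-end model training, combining Monte Carlo robust speech model estimation with MVA yields an average recognition rate of 89.61%.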

Keywords

Environment mismatch; feature normalization; model estimation

Parallel Abstract

Environment mismatch is the major source of performance degradation in distributed speech recognition. To compensate for environment mismatch, this study proposes two approaches: (1) a two-class parametric cepstrum feature normalization front-end and (2) a Monte Carlo robust noisy HMM estimation back-end. The first approach models and normalizes the distribution of the speech features using mixtures of two Gaussian probability density functions, together with mean subtraction, variance normalization, and ARMA filtering (MVA). The second uses Monte Carlo simulation to generate artificial noisy speech features in order to build robust HMMs for various noisy environments. Experimental results under the Aurora 2 clean-training condition show that the two-class parametric cepstrum feature normalization front-end achieved a digit recognition rate of 83.87%, and the Monte Carlo robust noisy HMM estimation back-end achieved 89.61%.
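
As an illustration of the MVA post-processing named in the abstract, the following is a minimal sketch of per-utterance mean subtraction, variance normalization, and ARMA smoothing applied independently to each feature dimension. It is not the thesis's implementation: the function name `mva`, the filter order, and the choice to leave the first and last `order` frames unfiltered are all assumptions made for this example.

```python
from statistics import fmean, pstdev


def mva(frames, order=2):
    """Mean subtraction, variance normalization, and ARMA smoothing.

    frames: list of per-frame feature vectors (list of lists of floats).
    order:  ARMA filter order M (illustrative value, not from the source).
    """
    n_frames = len(frames)
    n_dims = len(frames[0])
    out = [[0.0] * n_dims for _ in range(n_frames)]
    for d in range(n_dims):
        track = [frame[d] for frame in frames]
        mean = fmean(track)
        std = pstdev(track) or 1.0  # guard against a constant trajectory
        norm = [(v - mean) / std for v in track]
        # Assumption: edge frames without full context stay unfiltered.
        smooth = list(norm)
        for t in range(order, n_frames - order):
            past = sum(smooth[t - order:t])      # previously filtered outputs
            future = sum(norm[t:t + order + 1])  # current and upcoming inputs
            smooth[t] = (past + future) / (2 * order + 1)
        for t in range(n_frames):
            out[t][d] = smooth[t]
    return out
```

Calling `mva(cepstra, order=2)` on a list of cepstral vectors for one utterance returns a list of the same shape. A larger `order` smooths each feature trajectory more strongly, which is the low-pass effect the abstract attributes to the ARMA filter.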

Parallel Keywords

Environment Mismatch; Feature Normalization; Model Estimation
