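Distributed speech recognition systems commonly suffer from environment mismatch, and solving this problem requires improving the robustness of the system. To increase robustness, we propose a cluster-based (two-class) cepstral feature normalization method at the front-end, which uses two Gaussian mixture models to adapt to speech and non-speech separately. At the back-end, we propose combining Monte Carlo robust speech model estimation with MVA: artificial noise is added to models trained on clean speech, with the aim of raising the recognition rate of the whole system without having to record a noisy speech corpus. We validate the proposed methods on the Aurora 2 corpus. At the front-end, adding an ARMA low-pass filter to histogram equalization raises the recognition rate from 80.78% to 83.87%, and cluster-based cepstral feature normalization raises it further to 84.05%. At the back-end, combining Monte Carlo robust speech model estimation with MVA yields an average recognition rate of 89.61%.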
Environment mismatch is the major source of performance degradation in distributed speech recognition. To compensate for this mismatch, two approaches are proposed in this study: (1) a two-class parametric cepstrum feature normalization front-end and (2) a Monte Carlo robust noisy HMM estimation back-end. The first approach models and normalizes the distribution of the speech features using a mixture of two Gaussian probability density functions, together with mean subtraction, variance normalization, and ARMA filtering (MVA). The second uses Monte Carlo simulation to generate artificial noisy speech features in order to build HMMs that are robust across various noisy environments. Experimental results on the Aurora 2 clean training condition show that the two-class parametric cepstrum feature normalization front-end achieved a digit recognition rate of 83.87%, and the Monte Carlo robust noisy HMM estimation back-end achieved 89.61%.
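As a rough illustration of the MVA post-processing named above, the following Python sketch applies per-utterance mean subtraction and variance normalization to each feature dimension and then smooths the result with an ARMA filter. The function name `mva`, the `order` parameter, and the filter form (the average of the previous `order` outputs with the current and next `order` inputs, with edge frames left unfiltered) are assumptions drawn from a commonly used formulation of MVA. They are not taken from this study, whose exact configuration is not stated here.

```python
import math


def mva(frames, order=2):
    """Apply mean subtraction, variance normalization and ARMA smoothing.

    frames: list of T feature vectors, each a list of D floats
            (for example, cepstral coefficients of one utterance).
    order:  ARMA filter order M (hypothetical default, not from the source).
    Returns a new list of T vectors; the input is left unmodified.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    out = [[0.0] * dim for _ in range(num_frames)]

    for d in range(dim):
        column = [frame[d] for frame in frames]

        # Mean subtraction and variance normalization over the utterance.
        mean = sum(column) / num_frames
        var = sum((v - mean) ** 2 for v in column) / num_frames
        std = math.sqrt(var) if var > 0.0 else 1.0  # guard constant features
        norm = [(v - mean) / std for v in column]

        # ARMA smoothing: average of the previous `order` outputs and the
        # current plus next `order` normalized inputs. The first and last
        # `order` frames are left as the normalized values.
        smoothed = list(norm)
        for t in range(order, num_frames - order):
            past = sum(smoothed[t - order:t])
            future = sum(norm[t:t + order + 1])
            smoothed[t] = (past + future) / (2 * order + 1)

        for t in range(num_frames):
            out[t][d] = smoothed[t]
    return out
```

With `order=0` the filter reduces to plain mean and variance normalization, which makes it easy to compare features before and after ARMA smoothing on the same utterance.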