
A Study of Applying Noise-Robust Features in Reduced Frame-Rate Acoustic Models for Speech Recognition (應用雜訊強健性之特徵於低音框率聲學模型之研究)

Advisor: 洪志偉

Abstract


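As technology advances, automatic speech recognition (ASR) has gradually matured. When an ASR system is deployed in real-world environments, however, it is often disturbed by noise, which sharply reduces recognition accuracy. In this thesis, we therefore propose using speech features enhanced by various robustness techniques with reduced frame-rate hidden Markov model (HMM) speech recognizers, in order to lessen the effect of noise on recognition performance while keeping transmission efficient in today's cloud-based speech recognition architectures. For robustness, we use cepstral mean subtraction, mean and variance normalization, and histogram equalization, among others, to build speech features that are less sensitive to noise. We first train HMMs on the full frame-rate speech features of the training corpus, and then adapt the state-transition parameters of these models so that the original full frame-rate models can effectively recognize reduced frame-rate speech features. Experimental results show that the proposed approach makes transmitting speech features to the recognition models on a cloud server 2 to 4 times more efficient. Thanks to the model adaptation algorithm, speech features sent at as little as 1/4 of the original transmission volume achieve recognition accuracy comparable to that of features sent at the original volume. Combined with feature robustness techniques, the approach also reduces noise interference effectively, so that the overall system balances efficiency and performance and reaches satisfactory recognition accuracy.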
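To make the idea of adapting state-transition parameters to a lower frame rate concrete, here is a minimal sketch. It is not the adaptation algorithm used in the thesis, which the abstract does not spell out. It only illustrates the standard Markov chain property that, when every `ratio`-th frame is kept, the state chain seen by the recognizer advances `ratio` steps per observed frame, so its transition matrix is the `ratio`-th power of the original one. The function names and the 3-state transition matrix are hypothetical.

```python
import numpy as np

def downsample_frames(features, ratio):
    """Keep every `ratio`-th frame of a (num_frames, dim) feature matrix."""
    return features[::ratio]

def n_step_transitions(A, ratio):
    """Return the `ratio`-step transition matrix of a Markov chain
    whose one-step transition matrix is A."""
    return np.linalg.matrix_power(A, ratio)

# Hypothetical 3-state left-to-right transition matrix (illustrative values only).
A = np.array([
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
])

# Transition matrix matching a down-sampling ratio of 2.
A2 = n_step_transitions(A, 2)
```

Note that the powered matrix allows a jump from the first state to the third even though the one-step matrix does not, which reflects states being skipped when frames are dropped.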

Parallel Abstract


Speech recognition on mobile devices has become increasingly popular, and it must meet the requirements of both high recognition accuracy and low transmission load. One of the most challenging tasks in improving recognition accuracy for real-world applications is alleviating the effect of noise, and one prominent way to reduce the transmission load is to make the speech features as compact as possible. In this study, we evaluate and explore the effectiveness of integrating noise-robust speech feature representations with the reduced frame-rate acoustic model architecture. The noise-robustness algorithms used to improve the features include cepstral mean subtraction (CMS), cepstral mean and variance normalization (MVN), histogram equalization (HEQ), cepstral gain normalization (CGN), MVN plus auto-regressive moving average filtering (MVA) and modulation spectrum power-law expansion (MSPLE). In addition, the adapted hidden Markov model (HMM) structure for reduced frame-rate (RFR) speech features, developed by Professor Lee-min Lee, is used in our evaluation task. Experiments conducted on the Aurora-2 digit database show that, in the clean, noise-free situation, the adapted HMM with RFR features provides recognition accuracy comparable to that of the non-adapted HMM with full frame-rate (FFR) features, while in noisy situations the noise-robustness algorithms work well in the RFR HMM scenarios and improve recognition performance even when the RFR down-sampling ratio is as low as 1/4.
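As an illustration of the two simplest normalization methods listed above, the following sketch applies per-utterance CMS and MVN to a matrix of cepstral features with NumPy. It assumes features are stored as a (num_frames, num_coefficients) array and that statistics are computed over the whole utterance; the function names, the random stand-in features, and the scale-and-offset distortion are assumptions for demonstration, not details taken from the thesis.

```python
import numpy as np

def cms(features):
    """Cepstral mean subtraction: remove the per-coefficient mean over the utterance."""
    return features - features.mean(axis=0, keepdims=True)

def mvn(features, eps=1e-8):
    """Mean and variance normalization: zero mean and unit variance per coefficient."""
    std = features.std(axis=0, keepdims=True)
    return cms(features) / (std + eps)

# Random stand-in for 200 frames of 13-dimensional cepstral features.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))

# Hypothetical distortion: a constant gain and offset on every coefficient.
distorted = 2.5 * clean + 4.0

normalized_clean = mvn(clean)
normalized_distorted = mvn(distorted)
```

Under this idealized distortion the two normalized feature streams coincide, which is the intuition behind such normalization: constant offsets and gains are removed before the features reach the acoustic model. Real noise is not a pure gain and offset, so the match is only approximate in practice.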

