A Study of Modulation Spectrum Normalization for Robust Speech Recognition

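In practical applications of automatic speech recognition, the speech signal is often corrupted by environmental noise, which lowers recognition accuracy. To improve system performance, researchers have long studied techniques that make speech recognition robust to noise. In this thesis, inspired by the idea of temporal structure normalization, we develop more accurate and effective modulation spectrum normalization techniques. We propose three new methods: equi-ripple temporal filtering, least-squares spectrum fitting, and magnitude spectrum interpolation. Each method normalizes the power spectral density (PSD) of the speech feature time series to a reference PSD, producing new speech features that are less affected by noise and therefore improve recognition accuracy in noisy environments. For the experiments we use the AURORA 2 connected-digit corpus, in which the speech signals are affected by eight types of additive noise and two channel effects. The results confirm that the proposed methods make the speech features more robust under all of these noise conditions and substantially improve recognition accuracy. We also combine the new methods with other feature robustness techniques and find that the combination yields a further improvement in recognition accuracy.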

Keywords

speech recognition; modulation spectrum normalization; robust speech features

Parallel Abstract

The performance of an automatic speech recognition system is often degraded by noise embedded in the processed speech signal. A variety of techniques have been proposed to deal with this problem. One category of these techniques normalizes the temporal statistics of the speech features, and this is the direction our new approaches take. In this thesis, we propose a series of noise robustness approaches, all of which normalize the modulation spectrum of the speech features: equi-ripple temporal filtering (ERTF), least-squares spectrum fitting (LSSF), and magnitude spectrum interpolation (MSI). These approaches reduce the mismatch between the modulation spectra of clean and noise-corrupted speech features, so the resulting features are expected to be more noise-robust. Recognition experiments on the Aurora-2 digit database show that the three new approaches effectively improve recognition accuracy under a wide range of noise-corrupted environments. Moreover, they can be successfully combined with other noise robustness approaches, such as cepstral mean and variance normalization (CMVN) and mean and variance normalization plus ARMA filtering (MVA), to achieve still better recognition performance.
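
To make the general idea concrete, the following is a minimal sketch of modulation spectrum normalization, not the thesis's ERTF, LSSF, or MSI algorithms, whose details are not given in the abstract. It assumes the features are a NumPy array of shape (frames, dimensions) and that a reference magnitude spectrum per dimension is already available (for instance, estimated from clean training data). The function name `normalize_modulation_magnitude`, the linear interpolation of the reference to the utterance length, and the choice to keep the original phase are all illustrative assumptions.

```python
import numpy as np


def normalize_modulation_magnitude(features, ref_magnitude):
    """Replace the modulation magnitude spectrum of each feature stream.

    features:      (num_frames, num_dims) array, one time series per column.
    ref_magnitude: (num_ref_bins, num_dims) reference magnitude spectrum,
                   assumed to be sampled uniformly from 0 to 0.5 cycles/frame.
    Returns an array with the same shape as `features`.
    """
    features = np.asarray(features, dtype=float)
    ref_magnitude = np.asarray(ref_magnitude, dtype=float)
    num_frames, num_dims = features.shape

    # Modulation spectrum: DFT of each feature stream along the time axis.
    spectrum = np.fft.rfft(features, axis=0)

    # Frequency grids (cycles per frame) for this utterance and the reference.
    target_grid = np.fft.rfftfreq(num_frames)
    ref_grid = np.linspace(0.0, 0.5, ref_magnitude.shape[0])

    # Assumption: linearly interpolate the reference to the utterance's bins.
    target = np.stack(
        [np.interp(target_grid, ref_grid, ref_magnitude[:, d])
         for d in range(num_dims)],
        axis=1,
    )

    # Keep the original phase, swap in the reference magnitude, go back to time.
    phase = np.angle(spectrum)
    return np.fft.irfft(target * np.exp(1j * phase), n=num_frames, axis=0)
```

In this sketch the normalized features of every utterance share the same magnitude spectrum shape, which is one simple way to reduce the clean/noisy mismatch that the abstract describes; a real system would estimate the reference from training data rather than pass it in by hand.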

Parallel Keywords

speech recognition; modulation spectrum; robust speech features
