
加成性雜訊環境下倒頻譜統計正規化法於強健性語音辨識之研究

Study of Cepstral Statistics Normalization Techniques for Robust Speech Recognition in Additive Noise Environments

Advisor: 洪志偉

Abstract


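An automatic speech recognition system usually suffers a marked drop in recognition performance in noisy environments. How to overcome this problem effectively has long been a central research topic in the field. This thesis studies the problem and proposes several improved techniques. In previous research, one family of techniques reduces the effect of noise by normalizing the statistical properties of speech features, for example cepstral mean subtraction, cepstral mean and variance normalization, and histogram equalization. These methods have been shown to be effective at improving the robustness of speech features in noisy environments. Taking these three cepstral feature normalization techniques as its background, this thesis develops a series of improved robustness methods.

The feature statistics required by the three normalization techniques above are usually estimated from the features of a whole utterance or of a segment of an utterance. Earlier work in our laboratory instead estimated these statistics with a codebook-based approach and found a clear improvement over the earlier practice. In the first part of this thesis, we propose an improved codebook construction procedure. It uses voice activity detection (VAD) to separate the speech and non-speech components of the signal, builds the codebook from the features of the speech portions, and assigns a weight to each codeword in the resulting codebook. Experiments confirm that the codebook built by this procedure improves the performance of the original codebook-based feature normalization methods. In the second part, we integrate the feature statistics obtained from the codebook-based and utterance-based methods to develop what we call associative feature normalization methods. Compared with the utterance-based and codebook-based methods, these associative methods achieve better results and more effectively improve speech recognition accuracy in additive noise environments.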

Parallel Abstract


The noise robustness of an automatic speech recognition system is one of the most important factors determining its recognition accuracy in noise-corrupted environments. Among the various approaches, normalizing the statistical quantities of speech features is a very promising way to create more noise-robust features. Related feature normalization approaches include cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ). The statistical quantities used in these techniques can be obtained in an utterance-wise manner or a codebook-wise manner, and it has been shown that in most cases the latter performs better than the former. In this thesis, we focus on two issues. First, we develop a new procedure for constructing the pseudo-stereo codebook used in the codebook-based feature normalization approaches. The resulting codebook is shown to provide a better estimate of the feature statistics and thereby enhances the performance of the codebook-based approaches. Second, we propose a series of new feature normalization approaches: associative CMS (A-CMS), associative CMVN (A-CMVN), and associative HEQ (A-HEQ). In these approaches, two sources of statistical information about the features, one from the utterance and the other from the codebook, are properly integrated. Experimental results show that these new approaches perform significantly better than the conventional utterance-based and codebook-based ones. As a result, the methods proposed in this thesis effectively improve the noise robustness of speech features.
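
To make the terms in the abstract concrete, the following is a minimal Python sketch of utterance-based CMS and CMVN, plus one possible way to combine utterance and codebook statistics. The abstract does not state how the two sources are integrated, so the linear interpolation in `associative_mean` and its `alpha` parameter are assumptions for illustration only, as are all function names. The weighted codebook mean is likewise only a plausible reading of "assigning a weight to each codeword", not the thesis procedure.

```python
from statistics import fmean, pstdev


def utterance_stats(frames):
    """Per-dimension mean and standard deviation over one utterance."""
    dims = list(zip(*frames))  # transpose: one tuple per cepstral dimension
    return [fmean(d) for d in dims], [pstdev(d) for d in dims]


def cms(frames):
    """Cepstral mean subtraction: remove the utterance mean from each frame."""
    means, _ = utterance_stats(frames)
    return [[x - m for x, m in zip(frame, means)] for frame in frames]


def cmvn(frames, eps=1e-8):
    """Cepstral mean and variance normalization over one utterance."""
    means, stds = utterance_stats(frames)
    return [[(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
            for frame in frames]


def codebook_mean(codewords, weights):
    """Weighted mean of codewords (assumed: non-negative weights, not all zero)."""
    total = sum(weights)
    return [sum(w * c for w, c in zip(weights, dim)) / total
            for dim in zip(*codewords)]


def associative_mean(utt_mean, cb_mean, alpha=0.5):
    """Hypothetical blend: alpha * utterance mean + (1 - alpha) * codebook mean."""
    return [alpha * u + (1 - alpha) * c for u, c in zip(utt_mean, cb_mean)]
```

Each frame here is a list of cepstral coefficients. With `alpha=1.0` the blended mean reduces to the utterance-based estimate, and with `alpha=0.0` it reduces to the codebook-based estimate.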
