強健性語音辨識中基於小波轉換之分頻統計補償技術的研究

The Study of Sub-band Feature Statistics Compensation Techniques Based on a Discrete Wavelet Transform for Robust Speech Recognition

Advisor: 洪志偉

Abstract


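This thesis develops speech feature robustness techniques to improve speech recognition performance in noisy environments. We improve on the original full-band feature-sequence statistics compensation techniques by using the discrete wavelet transform (DWT) to process the temporal sequences of speech features in sub-bands, and from this we develop two sub-band statistics compensation methods: sub-band mean and variance normalization and sub-band histogram equalization. In both methods, the sub-band sequences obtained from the DWT have narrower bandwidths in the low-frequency bands and wider bandwidths in the high-frequency bands, so the low-frequency components, which matter more for speech recognition, can be processed at a finer resolution. The updated sub-band sequences are then passed through the inverse DWT to obtain the new feature temporal sequence. Through this procedure, modulation spectral components of differing importance in the feature sequence can be processed separately, which improves recognition accuracy under noise. For the database, we adopt the internationally used AURORA 2 connected-digit corpus, in which the speech signals are affected by various additive noises and channel effects. The experimental results confirm that the proposed methods outperform the conventional full-band methods in all noise environments, showing that the resulting features help improve the robustness of speech recognition.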

Parallel Abstract


The environmental mismatch caused by additive noise and/or channel distortion often seriously degrades the performance of a speech recognition system. Various robustness techniques have been proposed to reduce this mismatch, and one category of them aims to normalize the statistics of speech features in both training and testing conditions. In general, these statistics normalization methods process the speech feature sequences in a full-band manner, which overlooks the fact that different modulation frequency components have unequal importance for speech recognition. Based on this observation, we propose that the speech feature streams be processed in a sub-band manner. The temporal-domain feature sequence is first decomposed into non-uniform sub-bands using the discrete wavelet transform (DWT), and each sub-band stream is then individually processed by a well-known normalization method, such as mean and variance normalization (MVN) or histogram equalization (HEQ). Finally, we reconstruct the feature stream from all the modified sub-band streams using the inverse DWT. With this process, the components that correspond to the more important modulation spectral bands in the feature sequence can be processed separately. For the Aurora-2 clean-condition training task, the newly proposed sub-band MVN and HEQ provide relative error rate reductions of 20.32% and 16.39% over the conventional MVN and HEQ, respectively. These results show that the proposed methods significantly enhance the robustness of speech features in noise-corrupted environments.
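
The decompose, normalize, and reconstruct procedure described above can be illustrated with a minimal sketch. It is not the thesis implementation: the Haar wavelet, the two-level decomposition, normalization of every band to zero mean and unit variance, and the function names `haar_dwt`, `haar_idwt`, `mvn`, and `subband_mvn` are all assumptions made for illustration, since the abstract does not specify the wavelet, the number of levels, or the target statistics.

```python
import math


def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence: (approximation, detail)."""
    s = math.sqrt(2.0)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(half)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed even/odd samples."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def mvn(seq, eps=1e-12):
    """Mean and variance normalization of one sequence."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / len(seq))
    if std < eps:
        # Constant band: only remove the mean to avoid dividing by zero.
        return [v - mean for v in seq]
    return [(v - mean) / std for v in seq]


def subband_mvn(x, levels=2):
    """Sub-band MVN sketch (assumed Haar wavelet, assumed unit-variance target).

    Repeatedly splitting the approximation band gives non-uniform sub-bands:
    the lowest modulation frequencies end up in the narrowest band.
    """
    if len(x) % (2 ** levels) != 0:
        raise ValueError("sequence length must be divisible by 2**levels")
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Normalize every sub-band stream independently.
    approx = mvn(approx)
    details = [mvn(d) for d in details]
    # Rebuild the feature sequence from the coarsest level upward.
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx
```

Calling `subband_mvn(feature_track, levels=2)` on the time trajectory of a single feature dimension returns a sequence of the same length. In this sketch each sub-band is normalized with statistics computed from the utterance itself; a system could instead map each band to reference statistics estimated from training data, and HEQ could replace `mvn` in the same structure.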

