
數個基於離散小波轉換之新語音強健技術

Several New DWT-based Methods for Noise-Robust Speech Recognition

Advisor: 洪志偉

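Abstract


The wavelet transform has become a commonly used analysis technique in many speech processing systems. Because wavelets offer good time-frequency analysis and multi-resolution properties, wavelet theory gives better results on non-stationary signals. Speech is a non-stationary signal, so this thesis studies the application of the wavelet transform to speech signal processing. We propose three new processing techniques: one is a new method of speech feature extraction, and the other two are robustness techniques. The thesis consists of three parts: an introduction to wavelet theory, a presentation of the new algorithms, and the experimental results with discussion.

The first part introduces the thesis, briefly describes applications of the wavelet transform in the speech domain, and then outlines the theory of the wavelet transform. We explain the scaling equation and the concept of multi-resolution in the wavelet transform, and how the wavelet transform is applied to a signal.

The second part is the core of the thesis. It introduces conventional speech feature extraction methods, such as the computation of mel-frequency cepstral coefficients, and combines the speech features extracted with wavelets with the conventional features to obtain a new speech feature. The new feature carries the advantages of both and thereby improves recognition accuracy. In addition, we propose new feature robustness methods, namely sub-band power normalization and low-pass filtering with zero interpolation, which effectively reduce the effect of noise on the speech features.

The third part covers the experimental setup, the experimental results, the conclusions, and future work. All experiments and analyses are presented in this part. According to the results, the new feature outperforms conventional speech features, and the proposed robustness algorithms also significantly improve speech recognition in noisy environments. These results show that the proposed algorithms effectively improve speech recognition performance.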

Parallel Abstract


The wavelet transform is one of the most widely used analysis tools in speech signal processing. Because wavelets offer good time-frequency localization and multi-resolution analysis, they are well suited to non-stationary signals. Speech signals are non-stationary, so in this thesis we use the wavelet transform to develop several new algorithms for processing them: one feature extraction scheme and two noise-robustness techniques.

In the first part of the thesis, we outline the work and briefly introduce the wavelet transform. We explain the concept of scaling functions, the multi-resolution property of the wavelet function, and how the wavelet transform is applied to a signal.

The second part is the main body of the thesis. We first review how the well-known mel-frequency cepstral coefficients (MFCC) are computed. We then propose wavelet-filter cepstral coefficients (WFCC) as a new speech feature representation, and we find that a proper combination of WFCC and MFCC achieves higher recognition accuracy than either feature alone. Finally, we present two new noise-robustness algorithms based on the discrete wavelet transform (DWT), which make the speech features more robust to noise interference.

The third part covers the experimental setup, the results for the proposed methods, the conclusions, and future work. According to the recognition results, the proposed features outperform conventional MFCC, and the new noise-robustness algorithms significantly improve recognition accuracy under various noise-corrupted conditions. These results indicate that all the proposed methods are effective and improve the speech recognition system.
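
The abstract does not give the details of the DWT-based robustness algorithms, so the following is only a minimal illustrative sketch of the general idea of normalizing the power of each DWT sub-band of a feature sequence. It assumes a one-level Haar wavelet and arbitrary target powers; the function names, the choice of wavelet, and the targets are hypothetical and are not taken from the thesis.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence -> (approx, detail)."""
    if len(x) % 2 != 0:
        raise ValueError("sequence length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x

def mean_power(band):
    """Average squared value of a sub-band."""
    return sum(v * v for v in band) / len(band)

def normalize_band_power(band, target_power):
    """Scale a sub-band so that its mean power equals target_power."""
    p = mean_power(band)
    if p == 0.0:
        return list(band)  # nothing to scale in an all-zero band
    gain = math.sqrt(target_power / p)
    return [gain * v for v in band]

def subband_power_normalize(x, target_approx=1.0, target_detail=1.0):
    """Hypothetical sketch: normalize each Haar sub-band, then reconstruct.

    x is assumed to be the trajectory of one feature dimension over frames.
    """
    approx, detail = haar_dwt(x)
    approx = normalize_band_power(approx, target_approx)
    detail = normalize_band_power(detail, target_detail)
    return haar_idwt(approx, detail)

# Example: one feature dimension over eight frames (made-up values).
feature = [1.0, 3.0, 2.0, 6.0, 4.0, 4.0, 0.0, 2.0]
normalized = subband_power_normalize(feature, 1.0, 0.25)
```

Because the Haar transform is orthonormal, decomposing `normalized` again yields sub-bands whose mean powers equal the chosen targets, which is the property such a normalization aims for in this sketch.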

