
各域之語音特徵分頻帶處理於強健性語音辨識之研究

Sub-band Processing in Various Domains of Speech Features for Robust Speech Recognition

Advisor: 洪志偉

Abstract


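Automatic speech recognition systems are easily affected by noise interference, which lowers recognition accuracy. To mitigate this, a series of speech feature robustness techniques have been proposed to improve recognition performance in noisy environments. This dissertation examines how important different frequency ranges are for speech recognition and, improving on conventional full-band processing, develops sub-band robustness methods in several domains, as follows:
  • Temporal domain: wavelet denoising and wavelet-transform-based sub-band feature statistics normalization
  • Modulation spectrum domain: sub-band modulation spectrum normalization and modulation spectrum power-law expansion
  • Spatial domain: weighted sub-band histogram equalization
We use the Aurora-2 continuous-digit corpus and the Aurora-4 large-vocabulary corpus to evaluate the new methods. The experimental results show that all of the sub-band methods above improve recognition accuracy under noise interference and outperform conventional full-band processing, indicating that the proposed speech features are more robust to noise.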

Parallel Abstract


The environmental mismatch caused by additive noise and/or channel distortion often dramatically degrades the performance of an automatic speech recognition (ASR) system. To reduce this mismatch, a wide variety of robustness techniques have been developed. This dissertation proposes several novel methods that apply sub-band processing in different domains of speech features to improve the noise robustness of speech recognition. Briefly, we investigate the effect of noise in three domains of speech features and develop a countermeasure for each. First, we present wavelet threshold de-noising and sub-band feature statistics normalization, both applied in the temporal domain. Second, we develop and evaluate two modulation-domain algorithms, sub-band modulation spectrum normalization and modulation spectrum power-law expansion. Finally, we provide a novel scheme, called weighted sub-band histogram equalization, that processes the high-pass and low-pass portions of the spatial-domain features. The proposed methods are evaluated on two databases, Aurora-2 and Aurora-4. The experimental results show that these sub-band methods outperform the corresponding full-band methods in most cases and significantly improve recognition accuracy under a wide range of noise environments.
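
The abstract names sub-band feature statistics normalization in the temporal domain without giving details, so the following is only a minimal sketch of the general idea, not the dissertation's implementation. It assumes a one-level Haar wavelet split of a single feature time series and plain mean-variance normalization of each sub-band; the wavelet choice, the number of levels, and all function names (haar_split, haar_merge, mvn, subband_mvn) are hypothetical.

```python
import math
from statistics import fmean, pstdev


def haar_split(x):
    """One-level Haar analysis of an even-length sequence.

    Returns (approx, detail): the low-pass and high-pass sub-bands.
    """
    if len(x) % 2 != 0:
        raise ValueError("sequence length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_merge(approx, detail):
    """One-level Haar synthesis; the inverse of haar_split."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def mvn(band, eps=1e-12):
    """Mean-variance normalization: zero mean, (near) unit variance."""
    mu = fmean(band)
    sigma = pstdev(band)
    return [(v - mu) / (sigma + eps) for v in band]


def subband_mvn(x):
    """Normalize each Haar sub-band separately, then reconstruct.

    Assumption: every sub-band gets the same treatment; a real system
    might weight or skip some bands.
    """
    approx, detail = haar_split(x)
    return haar_merge(mvn(approx), mvn(detail))


if __name__ == "__main__":
    # Toy stand-in for one cepstral coefficient tracked over 8 frames.
    stream = [1.0, 3.0, 2.0, 6.0, 5.0, 4.0, 8.0, 7.0]
    print(subband_mvn(stream))
```

Full-band normalization would call mvn on the whole stream once; the sub-band variant differs only in normalizing the low-pass and high-pass parts separately before reconstruction, which lets each frequency range be compensated on its own statistics.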

