A Study of Speech Feature Shaping and Normalization in Quefrency Bands for Noise-Robust Speech Recognition

Advisor: 洪志偉


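This thesis proposes a new technique for strengthening the noise robustness of speech features, with the aim of improving speech recognition performance in noisy environments. The technique, called weighted sub-band level histogram equalization (WS-HEQ), builds on the recently proposed sub-band level histogram equalization (S-HEQ) and improves on both its robustness and its computational efficiency. WS-HEQ takes into account that the high-frequency and low-frequency components of the intra-frame cepstral features are not equally important, and therefore moderately suppresses the high-frequency components. Combined with histogram equalization, this reduces the effect of noise on the speech features more markedly than S-HEQ does. We propose four variants of WS-HEQ, which differ in the number of HEQ operations applied and in the form of the filters used; three of the four have markedly lower computational complexity than S-HEQ. On the internationally used Aurora-2 speech database, we verify that WS-HEQ substantially improves recognition accuracy across a variety of noisy environments. All four variants achieve clearly higher recognition rates than the original HEQ, and in most cases they also outperform S-HEQ.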


In this study, we develop a novel noise-robustness method, termed weighted sub-band level histogram equalization (WS-HEQ), to improve speech recognition accuracy in noise-corrupted environments. The method is based on the observation that the high-pass and low-pass portions of the intra-frame cepstral features are of unequal importance for speech recognition and have different signal-to-noise ratios (SNRs). WS-HEQ therefore attenuates the high-pass portion in order to emphasize the speech components and reduce the effect of noise. We provide four variants of WS-HEQ, all of which follow the structure of sub-band level histogram equalization (S-HEQ). In experiments conducted on the Aurora-2 connected US English digit database, all four variants of WS-HEQ give significant recognition improvements over the MFCC baseline in various noise-corrupted conditions. WS-HEQ outperforms HEQ in recognition accuracy, and it performs better than S-HEQ in most cases. In addition, WS-HEQ can be implemented more efficiently than S-HEQ, since it requires fewer HEQ operations.
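The general idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the filters, the weights, or the four variants, so the two-tap low-pass/high-pass split along the cepstral index, the weight `alpha`, the standard Gaussian reference distribution, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np
from statistics import NormalDist


def heq(stream):
    """Rank-based histogram equalization of one feature stream (over frames).

    Assumption: the reference distribution is a standard Gaussian.
    """
    n = len(stream)
    ranks = np.argsort(np.argsort(stream))   # rank 0..n-1 of each frame's value
    cdf = (ranks + 0.5) / n                  # empirical CDF, strictly inside (0, 1)
    inv = NormalDist().inv_cdf               # Gaussian inverse CDF (stdlib)
    return np.array([inv(float(p)) for p in cdf])


def split_bands(frames):
    """Split each frame into low-pass and high-pass parts along the cepstral index.

    Assumption: simple two-tap filters with zero padding, so low + high == frames.
    `frames` has shape (num_frames, num_cepstral_coefficients).
    """
    shifted = np.zeros_like(frames)
    shifted[:, 1:] = frames[:, :-1]
    low = 0.5 * (frames + shifted)
    high = 0.5 * (frames - shifted)
    return low, high


def weighted_subband_heq(frames, alpha=0.5):
    """Equalize each band stream, then recombine with the high band scaled by alpha.

    Assumption: alpha < 1 stands in for the "moderate suppression" of the
    high-pass portion; the value 0.5 is hypothetical.
    """
    low, high = split_bands(frames)
    low_eq = np.column_stack([heq(low[:, k]) for k in range(low.shape[1])])
    high_eq = np.column_stack([heq(high[:, k]) for k in range(high.shape[1])])
    return low_eq + alpha * high_eq
```

For an utterance stored as a `(num_frames, 13)` array of MFCCs, `weighted_subband_heq(mfcc, alpha=0.5)` returns an array of the same shape. This sketch applies HEQ twice per coefficient, so it does not reflect the reduced number of HEQ operations that the abstract attributes to WS-HEQ.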


