

The Study of Generalized Logarithmic Modulation Spectrum Normalization in Robust Speech Recognition

Advisor: 吳幼麟
Co-advisor: 洪志偉 (Jeih-Weih Hung)


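This thesis proposes a new technique for improving the robustness of speech features: it normalizes the modulation spectrum of the speech feature time series in order to improve speech recognition accuracy in noisy environments. The method transforms the magnitude component (magnitude spectrum) of the modulation spectrum of the feature time series with a generalized logarithm function, applies mean normalization, and then applies the inverse transform with a generalized exponential function to obtain the updated modulation spectrum magnitude. We name it generalized-logarithmic modulation spectrum mean normalization (GLMSMN). For the database, we adopt the internationally used AURORA 2 connected-digit corpus, in which the speech signals are affected by various additive noises and channel effects. The experimental results confirm that the proposed GLMSMN, when applied to MVN features, clearly improves recognition accuracy compared with MFCC and MVN. Its performance matches or even exceeds that of many well-known robustness techniques (such as histogram normalization and temporal structure normalization), and its computation is simple, so it is of considerable practical value.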


This thesis presents a novel use of the generalized logarithm operation (q-logarithm) to refine the modulation spectrum of speech features for noise-robust speech recognition. The resulting method, termed generalized logarithmic modulation spectral mean normalization (GLMSMN), equalizes the average of the magnitude modulation spectrum in the q-logarithmic domain across utterances in order to alleviate the effect of noise. On the Aurora-2 connected-digit database and evaluation task, GLMSMN applied to MVN features yields a significant improvement in recognition accuracy over both the MFCC baseline and MVN. Furthermore, we show that GLMSMN outperforms well-known techniques such as histogram equalization (HEQ) and MVN plus ARMA filtering (MVA) on the Aurora-2 task. GLMSMN is therefore effective in enhancing the noise robustness of speech features.
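The processing chain described in the abstract (magnitude modulation spectrum, q-logarithm, mean normalization, inverse transform) can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: the q-logarithm is taken in its common Tsallis form, ln_q(x) = (x^(1-q) - 1) / (1 - q), the modulation spectrum is the DFT of each feature dimension along the frame axis, the mean is computed over modulation-frequency bins for each feature dimension, and the original phase is kept unchanged. The function names, the `target_mean` parameter (for example, an average estimated from training utterances), and the default `q=0.5` are hypothetical and not taken from the thesis.

```python
import numpy as np


def q_log(x, q):
    """Tsallis-style q-logarithm (assumed form); equals ln(x) when q == 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_exp(y, q):
    """Inverse of q_log; the base is clipped at zero so the result stays real."""
    y = np.asarray(y, dtype=float)
    if np.isclose(q, 1.0):
        return np.exp(y)
    base = np.maximum(1.0 + (1.0 - q) * y, 0.0)
    return base ** (1.0 / (1.0 - q))


def glmsmn(features, target_mean, q=0.5, eps=1e-10):
    """Sketch of GLMSMN on one utterance (intended for q <= 1).

    features:    array of shape (num_frames, num_dims), e.g. MVN-processed MFCCs.
    target_mean: hypothetical reference mean in the q-log domain, a scalar or
                 an array of shape (num_dims,).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]

    # Modulation spectrum: DFT of each feature dimension along the time axis.
    spectrum = np.fft.rfft(features, axis=0)
    magnitude = np.maximum(np.abs(spectrum), eps)  # floor avoids log(0) at q == 1
    phase = np.angle(spectrum)

    # Mean normalization in the q-log domain: shift the utterance mean to the target.
    log_mag = q_log(magnitude, q)
    utt_mean = log_mag.mean(axis=0, keepdims=True)
    normalized = log_mag - utt_mean + target_mean

    # Back to the magnitude domain, then recombine with the untouched phase.
    new_magnitude = q_exp(normalized, q)
    new_spectrum = new_magnitude * np.exp(1j * phase)
    return np.fft.irfft(new_spectrum, n=num_frames, axis=0)
```

In this sketch, passing the utterance's own q-log mean as `target_mean` leaves the features essentially unchanged, which is a convenient sanity check. Because utterances differ in length, a practical system would likely need a fixed DFT size or some other way to make the means comparable across utterances. That detail is left open here.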


