  • Thesis


Advanced Modulation Spectrum Compensation Techniques for Robust Speech Recognition

Advisor: 洪志偉


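Automatic speech recognition is a topic well worth researching and developing. Most current speech recognition systems achieve quite satisfactory accuracy in quiet, interference-free environments, but when they are deployed in real-world environments they are affected by environmental noise, and their recognition performance drops markedly. Environmental robustness techniques, developed over many years, aim to remedy exactly this weakness.

Among the many robustness techniques, one class performs statistical normalization of speech features. Previous studies on normalizing the modulation spectrum of speech features found that normalizing the spectrum separately in sub-bands performs better than normalizing the full band at once. However, the non-uniform band division splits the low-frequency part of the modulation spectrum more finely, so the low-frequency sub-bands contain too few spectral points, which degrades the accuracy of the statistics computed from them; these methods therefore leave room for improvement. Based on this observation, this thesis proposes a family of overlapped sub-band modulation spectrum statistics normalization methods. These methods effectively increase the number of spectral points available for computing the statistics of each sub-band, which makes the statistics more accurate and in turn improves the performance of sub-band statistics normalization. In addition, we apply the well-known Teager operator when extracting the spectral features of speech, which makes the resulting features even more robust to environmental noise.

This thesis conducts a series of speech recognition experiments on the internationally used AURORA-2 connected-digit corpus. The results clearly confirm that the proposed overlapped sub-band methods improve recognition accuracy under a variety of noise conditions more effectively than the conventional non-overlapped sub-band methods. We further combine these new methods with conventional temporal-domain feature normalization methods and with the newly proposed Teager spectral energy operator method; in every case the combination improves recognition accuracy more than any single method alone, which shows that their gains are largely additive.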
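A minimal sketch of the standard discrete Teager-Kaiser energy operator, Ψ[x(n)] = x²(n) − x(n−1)·x(n+1), in pure Python. The abstract does not say where in the front-end the thesis applies the operator (for instance, along the FFT magnitudes of each frame before filter-bank integration) or how the sequence endpoints are treated, so the function below is generic, and its endpoint handling (copying the nearest interior value) is an assumption. For a pure tone A·cos(Ωn + φ) the operator returns A²·sin²(Ω) at every sample, which makes a convenient sanity check.

```python
import math


def teager_energy(seq):
    """Discrete Teager-Kaiser energy operator.

    psi[n] = seq[n]**2 - seq[n-1] * seq[n+1] at every interior sample.
    The two endpoints copy their nearest interior value; this edge handling
    is a hypothetical choice, not something stated in the thesis abstract.
    """
    if len(seq) < 3:
        raise ValueError("need at least 3 samples")
    inner = [seq[n] ** 2 - seq[n - 1] * seq[n + 1] for n in range(1, len(seq) - 1)]
    return [inner[0]] + inner + [inner[-1]]


if __name__ == "__main__":
    # For A*cos(w*n + phi), psi should equal A**2 * sin(w)**2 at every sample.
    tone = [2.0 * math.cos(0.5 * n + 0.3) for n in range(20)]
    print(teager_energy(tone)[:3], 4.0 * math.sin(0.5) ** 2)
```

The operator needs only three neighbouring samples, so it tracks the local "energy" of a sequence with essentially no delay; whether that sequence is a time signal or a row of spectral magnitudes is up to the front-end design.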


In this thesis, we propose a novel scheme for performing feature statistics normalization for robust speech recognition. In the proposed approach, the temporal-domain feature sequence to be processed is first converted into the modulation spectral domain with the discrete Fourier transform (DFT). The magnitude part of the modulation spectrum is then decomposed into overlapped, non-uniform sub-band segments, and each segment is processed individually by a well-known normalization method such as mean normalization (MN) or mean and variance normalization (MVN). Finally, the feature stream is reconstructed from all the modified sub-band magnitude segments and the original phase spectrum using the inverse DFT. With this process, the components of the feature sequence that correspond to the more important modulation spectral bands can be processed separately, and because adjacent segments overlap, each sub-band contains more spectral samples, which yields more accurate statistic estimates. On the Aurora-2 clean-condition training task, the newly proposed overlapped sub-band spectral MN and MVN further reduce the error rate relative to their conventional non-overlapped counterparts.
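The pipeline described in the abstract (DFT, overlapped non-uniform sub-band normalization of the magnitude, inverse DFT with the original phase) can be sketched in pure Python so that it runs without extra packages; in practice `numpy.fft.rfft`/`irfft` would replace the naive transforms. Several details are assumptions not stated in the abstract: the band edges and overlap width are hypothetical, `ref_stats` stands in for reference magnitude statistics (e.g. estimated from clean training data), statistics are estimated over each widened (overlapped) segment while each bin is rewritten only once using its own core band's statistics, and negative normalized magnitudes are clipped to zero. Only bins 0 to N/2 are processed; the remaining bins are mirrored so that the reconstructed stream stays real.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, adequate for a short illustrative sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT; keeps the real part because the input spectrum is
    conjugate-symmetric, so the imaginary part is only rounding error."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def overlapped_bands(edges, overlap):
    """Widen each core band [lo, hi) by `overlap` bins on both sides,
    clipped to the bin range [edges[0], edges[-1])."""
    return [(max(edges[0], lo - overlap), min(edges[-1], hi + overlap))
            for lo, hi in zip(edges[:-1], edges[1:])]


def mean_std(values):
    """Mean and population standard deviation of a list."""
    mu = sum(values) / len(values)
    return mu, math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def subband_spectral_mvn(x, edges, ref_stats, overlap=1):
    """Overlapped sub-band spectral MVN of one feature stream x (a sketch).

    edges     -- hypothetical non-uniform core band edges, from 0 to len(x)//2 + 1
    ref_stats -- hypothetical (mean, std) of the magnitude in each band
    overlap   -- hypothetical number of bins each band borrows from its neighbours
    """
    n = len(x)
    half = n // 2 + 1  # bins 0..n//2 fully describe the spectrum of a real stream
    if edges[0] != 0 or edges[-1] != half or len(ref_stats) != len(edges) - 1:
        raise ValueError("edges must span 0..len(x)//2 + 1, one ref_stats entry per band")
    spec = dft(x)
    mag = [abs(c) for c in spec]
    phase = [cmath.phase(c) for c in spec]
    new_mag = mag[:]
    cores = zip(edges[:-1], edges[1:])
    wide = overlapped_bands(edges, overlap)
    for (lo, hi), (wlo, whi), (ref_mu, ref_sd) in zip(cores, wide, ref_stats):
        mu, sd = mean_std(mag[wlo:whi])  # statistics from the widened (overlapped) segment
        for k in range(lo, hi):  # ...but only this band's core bins are rewritten
            z = (mag[k] - mu) / sd if sd > 0 else 0.0
            new_mag[k] = max(0.0, z * ref_sd + ref_mu)  # magnitudes must stay non-negative
    for k in range(1, n - half + 1):  # mirror to keep conjugate symmetry
        new_mag[n - k] = new_mag[k]
    # Recombine the modified magnitudes with the ORIGINAL phase, then invert.
    return idft([m * cmath.exp(1j * p) for m, p in zip(new_mag, phase)])
```

With `edges = [0, 1, 2, 4, 9]` and a 16-frame stream, the core bands hold 1, 1, 2 and 5 bins, while `overlapped_bands(edges, 1)` widens them to 2, 3, 4 and 6 bins, which is the low-frequency point-count gain the overlap is meant to provide. For spectral MN instead of MVN, the normalization line would become `mag[k] - mu + ref_mu` (again clipped at zero).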


