
分頻式調變頻譜分解於強健性語音辨識之研究

The study of sub-band modulation spectrum factorization in robust speech recognition

Advisor: 洪志偉

Abstract


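In this thesis, we use non-negative matrix factorization (NMF) to enhance the modulation spectrum of speech features and thereby improve the noise robustness of automatic speech recognition systems. NMF derives a set of basis vectors for the magnitude of the speech modulation spectrum, and we use these basis vectors to extract the components of speech that matter for recognition. Our approach differs from earlier NMF-based robustness techniques in two respects. First, we replace the original iterative procedure with orthogonal projection, which greatly increases computation speed. Second, we replace the original full-band factorization with sub-band factorization, which reduces the amount of computation. Recognition experiments on the Aurora-2 connected-digit database show that, relative to the baseline, the new method effectively improves speech recognition accuracy in noisy environments, with a relative error reduction of up to 58%. Compared with the original NMF method, the new method has markedly lower computational complexity while maintaining the original recognition accuracy and, in some cases, improving it.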

Parallel Abstract


In this paper, we propose to enhance the modulation spectrum of speech features for noise robustness via non-negative matrix factorization (NMF). With NMF, a set of non-negative basis spectral vectors is derived from clean speech to represent the components important for speech recognition. Whereas the original NMF-based scheme employs an iterative search to update the full-band modulation spectra, we propose applying orthogonal projection to update the low sub-band modulation spectra. Compared with the original scheme, the new process significantly reduces computational complexity without degrading recognition performance. In experiments conducted on the Aurora-2 database, the new NMF-based approach provides an average relative error reduction of over 65% compared with the baseline MFCC system.
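
To make the projection step concrete, the following is a minimal sketch, not the thesis's implementation. It assumes the NMF basis vectors have already been learned from clean speech (that step is not shown) and each has length `n_low`, that only the magnitude of the lowest `n_low` modulation-frequency bins is updated while the phase is kept, and that negative projected magnitudes are clipped to zero. The function names, the toy feature stream, and the two basis vectors are all hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for a short sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def orthonormalize(basis):
    """Gram-Schmidt: build an orthonormal set with the same span as basis."""
    ortho = []
    for b in basis:
        r = [float(value) for value in b]
        for u in ortho:
            c = sum(ri * ui for ri, ui in zip(r, u))
            r = [ri - c * ui for ri, ui in zip(r, u)]
        norm = math.sqrt(sum(ri * ri for ri in r))
        if norm > 1e-12:  # skip vectors that are (nearly) linearly dependent
            ortho.append([ri / norm for ri in r])
    return ortho


def project(v, ortho):
    """Orthogonal projection of v onto the span of an orthonormal set."""
    p = [0.0] * len(v)
    for u in ortho:
        c = sum(vi * ui for vi, ui in zip(v, u))
        p = [pi + c * ui for pi, ui in zip(p, u)]
    return p


def enhance_stream(stream, basis, n_low):
    """Replace the n_low lowest modulation-frequency magnitudes of one feature
    stream by their projection onto span(basis); the phase is left untouched."""
    n = len(stream)
    if not 0 < n_low <= n // 2 + 1:
        raise ValueError("n_low must lie in 1..n//2+1")
    spec = dft(stream)
    mag = [abs(c) for c in spec]
    phase = [cmath.phase(c) for c in spec]
    low = project(mag[:n_low], orthonormalize(basis))
    for k, value in enumerate(low):
        value = max(value, 0.0)  # assumption: clip negative magnitudes to zero
        mag[k] = value
        if k > 0:
            mag[n - k] = value  # mirror bin, so the inverse DFT stays real
    updated = [m * cmath.exp(1j * ph) for m, ph in zip(mag, phase)]
    return [c.real for c in idft(updated)]


# Hypothetical data: one 8-frame feature stream and two made-up basis vectors.
stream = [1.0, 2.0, 0.5, -1.0, 0.0, 1.5, -0.5, 2.0]
basis = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
enhanced = enhance_stream(stream, basis, n_low=3)
```

The projection is a single closed-form pass over the low sub-band, in contrast to an iterative update of NMF encoding coefficients over the full band. This is the kind of saving the abstract describes, although the actual gain depends on details the abstract does not give.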

