
調變頻譜指數權重法於強健性語音辨識之研究

The study of modulation spectrum exponential weighting for robust speech recognition

Advisor: 洪志偉

Abstract


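For automatic speech recognition systems, noise has long been one of the main factors that degrade performance, so improving robustness in noisy environments remains an important research direction in this field. In this thesis, we combine the reference modulation spectrum magnitude of clean speech from the training environment with the modulation spectrum magnitude of speech from the test environment through exponential weighting, and thereby obtain more robust temporal sequences of speech features. Analysis under different signal-to-noise ratio (SNR) conditions shows that adjusting how the weight is split between these two magnitudes makes the speech features more robust under different levels of interference. In experiments on the internationally used AURORA-2 speech database, the proposed flexible weighting yields better recognition rates than the original fixed weighting. Therefore, when this method is applied in a practical speech recognition system, the weights can be set according to the current level of noise interference (SNR) to update the modulation spectrum magnitude of the speech features and achieve better recognition performance. In addition, we propose a temporal-filter realization of the above modulation spectrum exponential weighting method: by converting the reference modulation spectrum magnitude into the impulse response of a temporal filter, we can pass the original feature sequence through this filter and obtain performance equivalent to that of modulation spectrum exponential weighting. This mechanism further gives the method the advantage of a small time delay, so that it can run in a nearly online manner.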
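The exponentially weighted combination described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form used here, |Y[k]| = |R[k]|^alpha · |X[k]|^(1 − alpha) with the original phase kept, is an assumption, as are the names `msew`, `alpha`, and `floor`. The choice of `alpha` for a given SNR is not specified in the source and is left to the caller.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def msew(feature_stream, ref_magnitude, alpha, floor=1e-12):
    """Assumed MSEW form: |Y[k]| = |R[k]|**alpha * |X[k]|**(1 - alpha).

    feature_stream: one feature dimension over time (list of floats).
    ref_magnitude:  reference modulation spectrum magnitude, same length.
    alpha:          weight on the reference, between 0 and 1 (hypothetical name).
    floor:          small constant that keeps magnitudes strictly positive.
    """
    spectrum = dft(feature_stream)
    updated = []
    for x_k, r_k in zip(spectrum, ref_magnitude):
        new_mag = (max(r_k, floor) ** alpha) * (max(abs(x_k), floor) ** (1.0 - alpha))
        # Keep the phase of the original stream; only the magnitude is updated.
        updated.append(cmath.rect(new_mag, cmath.phase(x_k)))
    return idft(updated)


# Toy data, for illustration only.
clean = [0.2, 0.9, 0.4, -0.5, -0.1, 0.7, 0.3, -0.2]
noisy = [0.5, 1.0, -0.3, 0.8, 0.1, -0.6, 0.4, 0.2]
reference = [abs(c) for c in dft(clean)]
smoothed = msew(noisy, reference, alpha=0.5)
```

Under this assumed form, `alpha = 0` leaves the stream unchanged and `alpha = 1` replaces its magnitude spectrum entirely with the reference; intermediate values trade off between the two.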

Parallel Abstract


In this thesis, we present a series of novel algorithms to improve the noise robustness of features in speech recognition. In the algorithm termed modulation spectrum exponential weighting (MSEW), the magnitude spectra of feature streams are updated by combining a reference magnitude spectrum and the original magnitude spectrum with exponential weights that vary according to the signal-to-noise ratio (SNR) of the operating environment. Specifically, we present three modes of MSEW, which can be viewed as generalizations of the modulation spectrum replacement and modulation spectrum filtering (MSR/MSF) algorithms. In experiments conducted on the AURORA-2 noisy digit database, the presented MSEW algorithms achieve better recognition accuracy than the original MSR and MSF. Furthermore, we propose to implement MSEW as a temporal filtering process. By designing the temporal filter associated with the reference magnitude spectrum used in MSEW, the corresponding temporal filtering operation significantly reduces the time delay of the original MSEW without degrading performance. That is, MSEW can be implemented in a nearly real-time manner with higher efficiency.
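The temporal-filter realization can be sketched in a similar spirit. The abstract does not describe how the filter is designed, so the frequency-sampling approach below (zero-phase inverse DFT of a desired magnitude response, then centring and truncation to a short causal FIR filter) is a swapped-in standard technique and not necessarily the thesis's method. The names `fir_from_magnitude`, `fir_filter`, and `num_taps` are hypothetical.

```python
import cmath
import math


def fir_from_magnitude(desired_magnitude, num_taps):
    """Frequency-sampling FIR design (an assumed stand-in for the thesis's design).

    desired_magnitude: symmetric magnitude response sampled at n DFT bins.
    num_taps:          odd filter length, no larger than n.
    """
    n = len(desired_magnitude)
    # Zero-phase impulse response, circularly symmetric about t = 0.
    zero_phase = [
        sum(desired_magnitude[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]
    half = num_taps // 2
    # Shift by `half` frames so the truncated response is causal.
    return [zero_phase[(t - half) % n] for t in range(num_taps)]


def fir_filter(stream, taps):
    """Causal FIR filtering; frames before the start are treated as zero."""
    return [
        sum(taps[j] * stream[t - j] for j in range(len(taps)) if t - j >= 0)
        for t in range(len(stream))
    ]


# A flat magnitude response gives a pure delay of num_taps // 2 frames.
taps = fir_from_magnitude([1.0] * 8, num_taps=5)
delayed = fir_filter([1.0, 2.0, 3.0, 4.0, 5.0], taps)
```

In this sketch the output at each frame depends only on the current and a few past frames, so the delay is about `num_taps // 2` frames instead of a whole utterance, which is the kind of latency reduction the abstract refers to.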

