A Study of Minimum Variance Modulation Spectrum Filters for Robust Speech Recognition

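This thesis studies robust speech feature techniques for improving speech recognition performance in noisy environments. We take "environmental distortion," the objective function designed for the original minimum variance modulation filter method, and apply it to finding the optimal frequency response of the filter. From this we develop two algorithms for deriving filters for feature time sequences: least-squares spectral fitting based on the minimum variance criterion, and magnitude spectrum interpolation based on the minimum variance criterion. In both methods, the optimal filter frequency response we obtain replaces the filter used in the original least-squares spectral fitting and magnitude spectrum interpolation methods, and yields the target power spectral density that those methods seek to approximate. This improves the effectiveness of both modulation spectrum normalization methods and, in turn, the recognition accuracy for noise-corrupted speech. Our experiments use the internationally adopted AURORA-2 connected-digit corpus, in which the speech signals are affected by various additive noises and channel effects. The experimental results confirm that, compared with the two original modulation spectrum normalization methods, the two proposed methods based on the minimum variance criterion yield speech features that are more robust.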
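As a rough illustration of the magnitude spectrum interpolation idea named in the abstract, the sketch below replaces the magnitude spectrum of one feature time sequence with a reference magnitude interpolated to the sequence's own DFT length, while keeping the original phase. This is a generic sketch under stated assumptions, not the thesis's implementation: the function name `msi_normalize`, the argument `ref_mag`, and the choice of linear interpolation are all hypothetical, and the reference magnitude here is an arbitrary placeholder rather than one derived from the minimum variance filter.

```python
import numpy as np


def msi_normalize(x, ref_mag):
    """Hypothetical sketch of magnitude spectrum interpolation.

    x       : 1-D real feature time sequence (e.g. one cepstral coefficient
              over the frames of an utterance).
    ref_mag : reference magnitude spectrum sampled uniformly from DC to the
              Nyquist frequency (assumed to be given).
    """
    x = np.asarray(x, dtype=float)
    ref_mag = np.asarray(ref_mag, dtype=float)
    n = x.size

    # One-sided spectrum of the real-valued feature sequence.
    spectrum = np.fft.rfft(x)

    # Normalized frequency grids on [0, 1], where 0 is DC and 1 is Nyquist.
    ref_grid = np.linspace(0.0, 1.0, ref_mag.size)
    bin_grid = np.fft.rfftfreq(n) / 0.5

    # Interpolate the reference magnitude onto this sequence's DFT bins
    # (linear interpolation is an assumption of this sketch).
    target_mag = np.interp(bin_grid, ref_grid, ref_mag)

    # Keep the original phase, swap in the interpolated target magnitude.
    new_spectrum = target_mag * np.exp(1j * np.angle(spectrum))

    # Back to the time domain with the original length.
    return np.fft.irfft(new_spectrum, n=n)


# Usage with placeholder data (not from the thesis).
rng = np.random.default_rng(0)
features = rng.standard_normal(50)
reference = np.linspace(5.0, 1.0, 9)
normalized = msi_normalize(features, reference)
```

Interpolating the reference onto each sequence's own frequency bins lets sequences of different lengths share one reference spectrum, which is the practical reason for the interpolation step.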

Keywords

speech recognition; modulation spectrum; robust speech feature parameters

Parallel Abstract

The modulation spectra of speech features are often distorted by environmental interference. To reduce this distortion, in this paper we apply the minimum variance (MV) criterion to obtain the optimal frequency response of the temporal filter, and then use two approaches, least-squares spectral fitting (LSSF) and magnitude spectrum interpolation (MSI), to obtain the filtered feature sequence. Accordingly, two new temporal processing approaches are proposed, named MV-LSSF and MV-MSI, respectively. In the Aurora-2 clean-condition training task, we show that the new MV-LSSF and MV-MSI give more than 50% relative error rate reduction over the baseline, and provide relative error rate reductions of 8.18% and 2.73% over the conventional LSSF and MSI, respectively. These results reveal that the proposed methods significantly enhance the robustness of speech features in noise-corrupted environments. Moreover, we show that these new methods can be integrated with conventional temporal-domain techniques to achieve even better recognition accuracy.
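For readers unfamiliar with the metric, relative error rate reduction is conventionally computed as the drop in error rate divided by the reference system's error rate. The short sketch below shows that arithmetic; the function name and the example numbers are hypothetical and are not results from this thesis.

```python
def relative_error_reduction(reference_error: float, new_error: float) -> float:
    """Relative error rate reduction, in percent, of a new system over a reference.

    Both arguments are error rates in the same unit (e.g. percent).
    """
    if reference_error <= 0:
        raise ValueError("reference_error must be positive")
    return 100.0 * (reference_error - new_error) / reference_error


# Hypothetical numbers for illustration only: error rate falls from 40% to 20%.
example = relative_error_reduction(40.0, 20.0)
```

Because the reduction is measured relative to the reference system's error, the same absolute gain counts for more when the reference error is already low.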

Parallel Keywords

speech recognition; minimum variance; modulation spectra; robust speech features

