
Modulation Spectrum Normalization for Robust Speech Recognition

Advisor: 吳俊德

Abstract


With the arrival of the technological age, demand for technology products has grown steadily; in the past, many everyday tasks depended on input devices such as remote controls, keyboards, and mice. Now that mobile communication, wireless networking, and smartphone technologies have matured, human-machine communication can adopt more humane and natural designs. Speech can be applied in many fields and deployed on many platforms; however, a speech recognition system must reach a certain level of accuracy before the market will accept it. In a laboratory environment we can obtain good recognition rates, but in real life, noise interference makes recognition accuracy fall short of expectations. In this thesis, we apply two robust speech-feature techniques, cepstral mean and variance normalization (CMVN) and cepstral gain normalization (CGN), to statistically normalize the modulation spectrum of speech features. This approach effectively increases the number of spectral points available for computing statistics within each band, improving the accuracy of those statistics and thus the effectiveness of sub-band statistical normalization, yielding features with superior environmental robustness. Experimental results also show that exploiting the statistical properties of the modulation-frequency domain gives better recognition rates.
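As a rough sketch of the two base normalization methods named above (not the thesis's actual implementation), CMVN rescales each cepstral coefficient's time trajectory to zero mean and unit variance, while CGN is commonly described as replacing the variance term with the trajectory's dynamic range (max minus min); the epsilon guard is an implementation assumption:

```python
import numpy as np

def cmvn(c):
    """Cepstral mean and variance normalization: zero mean and unit
    variance along the time axis of a (frames x coefficients) sequence."""
    return (c - c.mean(axis=0)) / (c.std(axis=0) + 1e-10)

def cgn(c):
    """Cepstral gain normalization (as commonly described): subtract the
    mean, then scale each coefficient trajectory by its dynamic range."""
    rng = c.max(axis=0) - c.min(axis=0)
    return (c - c.mean(axis=0)) / (rng + 1e-10)
```

Both functions operate per coefficient dimension, so an utterance's MFCC matrix can be normalized in one call.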

Abstract (English)


As technology has advanced, demand for technology products has grown steadily; in the past, many everyday tasks relied on input devices such as remote controls, keyboards, and mice. With mobile communication, wireless networks, and smartphones now mature, human-machine interaction can adopt more humane and natural designs. In this thesis, we present two schemes, cepstral mean and variance normalization (CMVN) and cepstral gain normalization (CGN), to improve the noise robustness of features for speech recognition. The temporal-domain feature sequence is first converted into the modulation spectral domain. The magnitude part of the modulation spectrum is decomposed into overlapping, non-uniform sub-band segments, and each sub-band segment is then individually processed by the normalization methods. Recognition experiments on a standard database show that the two methods effectively improve recognition across a range of noise environments and achieve excellent recognition performance.
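The processing pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's exact implementation: the modulation spectrum is taken as an FFT of each coefficient trajectory along the time axis, sub-band CMVN stands in for either normalization method, the band edges are illustrative, and overlapping segments are simply processed in sequence rather than windowed and recombined:

```python
import numpy as np

def subband_modulation_cmvn(feat, band_edges):
    """Normalize the magnitude modulation spectrum of a feature sequence
    band by band, then reconstruct the temporal-domain features.

    feat: (T, D) temporal feature sequence (e.g., MFCCs).
    band_edges: list of (lo, hi) index pairs into the modulation spectrum;
    possibly overlapping and non-uniform, as in the scheme described above.
    """
    spec = np.fft.rfft(feat, axis=0)          # modulation spectrum along time
    mag, phase = np.abs(spec), np.angle(spec)
    for lo, hi in band_edges:
        seg = mag[lo:hi]
        mu = seg.mean(axis=0)
        sd = seg.std(axis=0) + 1e-10
        mag[lo:hi] = (seg - mu) / sd          # per-band statistical normalization
    # Recombine normalized magnitude with the original phase.
    return np.fft.irfft(mag * np.exp(1j * phase), n=feat.shape[0], axis=0)

# Illustrative use with assumed, non-uniform, overlapping band edges:
feat = np.random.randn(100, 13)
out = subband_modulation_cmvn(feat, [(1, 10), (8, 30), (25, 51)])
```

A fuller implementation would window overlapping segments and smoothly merge them, and could substitute CGN-style range scaling for the variance term inside the loop.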

