  • Thesis


Frequency-Optimization-Based Temporal Filters and Its Application to Speech Recognition.

Advisor: 吳俊德


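With the arrival of the technological age, demand for technology products has steadily grown. In the past, many everyday tasks depended on input devices such as remote controls, keyboards, and mice. Now that mobile communication, wireless networks, smartphones, and related technologies are maturing, communication between people and machines can adopt a more user-friendly and natural design. Automatic speech recognition systems can already be found in daily life, for example in voice-guided satellite navigation, voice-based telephone number inquiry systems, and interactive computer-assisted teaching systems. Speech can be applied in many fields and on different platforms. However, a speech recognition system must reach a certain level of accuracy before the public will accept it in the market, and when current speech signal processing and recognition techniques are used in real environments, recognition performance is often less than ideal. This thesis investigates two approaches to robust speech recognition: the first combines the advantages of different models to reduce the dimensionality of the cepstral parameters, and the second normalizes the speech features to reduce the effect of noise. The first part uses constrained principal component analysis (C-PCA) to improve the trained model and make the speech feature parameters more robust. The second part builds on cepstral mean and variance normalization (CMVN) and cepstral gain normalization (CGN), and derives the coefficients of optimized temporal filters from statistical properties in the modulation frequency domain.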


With the advent of the technological age, demand for technology products has steadily grown. In the past, many everyday tasks relied on input devices such as remote controls, keyboards, and mice. Now that mobile communication, wireless networks, and smartphones have matured, communication between people and machines can adopt a more user-friendly and natural design. Speech is the simplest, most convenient, and most effective means of interpersonal communication, so controlling machines through spoken instructions could improve quality of life. Years of research on the characteristics of speech have deepened our understanding to the point where communicating with machines is becoming practical, and speech recognition is arguably the most natural human-machine interface. Automatic speech recognition systems are already found in everyday life, for example in voice-guided satellite navigation, telephone number inquiry systems, and interactive computer-assisted teaching systems, and they can be applied in many fields and on different platforms. However, a speech recognition system must reach a certain level of accuracy before the public will accept it in the market, and when current speech signal processing and recognition techniques are used in real environments, recognition performance is often far from ideal. This thesis investigates two ways to improve the recognition rate: the first combines models with different advantages to reduce the dimensionality of the cepstral features, and the second applies cepstral statistics normalization to reduce the effect of noise. The first part of the thesis uses constrained principal component analysis (C-PCA) in model training to make the speech features more robust. The second part builds on cepstral mean and variance normalization (CMVN) and cepstral gain normalization (CGN), deriving optimized temporal filter coefficients from statistics in the modulation frequency domain to further improve recognition performance in noisy environments.
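
As a rough illustration of the two normalization baselines named above, the following sketch applies CMVN and CGN to one utterance of cepstral features. It is not the thesis's implementation. The function names, the per-utterance statistics, the `eps` guard, and the array layout (frames by coefficients) are assumptions made for this example, and the CGN form used here (mean-subtracted coefficients divided by their max-minus-min range) is one common formulation that should be checked against the cited work. The optimized temporal filters themselves are not shown.

```python
import numpy as np


def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cepstral mean and variance normalization over one utterance.

    features: array of shape (num_frames, num_coeffs), an assumed layout.
    Each coefficient track is shifted to zero mean and scaled to unit variance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)  # eps avoids division by zero


def cgn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cepstral gain normalization (assumed formulation).

    Each coefficient track is mean-subtracted and divided by its dynamic
    range (maximum minus minimum) over the utterance.
    """
    mean = features.mean(axis=0, keepdims=True)
    gain = features.max(axis=0, keepdims=True) - features.min(axis=0, keepdims=True)
    return (features - mean) / (gain + eps)


# Synthetic stand-in for 200 frames of 13 cepstral coefficients (not real speech).
rng = np.random.default_rng(0)
demo = rng.normal(loc=3.0, scale=2.0, size=(200, 13))
demo_cmvn = cmvn(demo)
demo_cgn = cgn(demo)
```

Both functions operate on the time trajectory of each coefficient independently, which is also the axis along which a temporal filter would be applied.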




