
An Application of Hidden Markov Model Speech Recognition to a TV Control System

Advisor: 陳國在

Abstract


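From past to present, the rapid development of electronic devices and signal processing technology, together with the introduction of cloud concepts, has greatly increased the practicality of speech recognition systems and made speech recognition an important technology. Speech control means that a machine can understand the content of human language. In this thesis, the speech recognition system is built on the hidden Markov model (HMM) algorithm. Developing the recognition framework also requires basic signal preprocessing and feature-vector extraction with Mel-frequency cepstral coefficients (MFCC), which supply the data for HMM modeling. The Viterbi algorithm is then used to recognize the spoken words.

The purpose of speech preprocessing is to put the signal into a form that is easier to work with, while MFCC is a method for obtaining speech features. To obtain good features, the speech quality of the recordings is a key factor in recognition. This thesis describes a simulated TV system controlled by speech.

Based on HMM recognition of Chinese speech, this thesis applies the recognizer to a control system in experiments on specific voice commands for controlling a TV. The experiments use three configurations. The first compares the already-trained speech with the original speech database, and the recognition rate reaches 97.4%. The second compares non-training speech with the speech database, and the recognition rate reaches 92.4%. The third introduces a decision tree and then performs recognition and control with utterances that are not restricted to the experimental command set, observing whether the behavior requested by the speech is achieved; the control accuracy reaches 96%. The results show that choosing appropriate command words and an appropriate number of states yields a high recognition rate. Conversely, too many or too few states lower the recognition rate, and too many Gaussian mixture components make the computation heavy without further increasing the recognition rate. Finally, the thesis discusses how to integrate speech recognition technology with the related hardware control technology, and its application to TV control.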
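The front-end steps named in the abstract (signal preprocessing followed by mel-scale feature extraction) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the pre-emphasis coefficient 0.97, the Hamming window, and the mel formula 2595·log10(1 + f/700) are common textbook choices assumed here for illustration, and the filterbank and cepstral (DCT) stages of a full MFCC pipeline are omitted.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using a common textbook formula (assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def pre_emphasize(signal, alpha=0.97):
    """High-frequency boost: y[n] = x[n] - alpha * x[n-1] (alpha is an assumption)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames, dropping any incomplete tail."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame):
    """Apply a Hamming window to one frame to reduce spectral leakage."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


# Hypothetical usage: a short synthetic signal instead of a real recording.
samples = [math.sin(0.1 * n) for n in range(400)]
frames = [hamming(f) for f in frame_signal(pre_emphasize(samples), 200, 100)]
```

Each windowed frame would then go through a spectrum, a mel filterbank, and a cepstral transform to produce the feature vectors that the HMMs are trained on.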

Parallel Abstract


Whether in the past, present, or future, speech recognition plays an important role in science and technology. The rapid development of electronic equipment and signal processing technology, coupled with cloud concepts, has made speech recognition far more useful than before. Speech control means that machines can understand human language. In this study, the speech recognition system is based on the HMM (hidden Markov model) algorithm. The recognition framework therefore relies on basic signal processing and MFCC (Mel-frequency cepstral coefficients), and the Viterbi algorithm uses the resulting speech models to identify human speech.

Specifically, the speech signal must be processed into a form that is easier to use, and MFCC is used to extract the speech features. To obtain good features, the quality of the speech recording is key to recognition. This study describes an embedded system operated by speech control, in which HMM-based Chinese speech recognition is applied to a control system and a TV is controlled in an experiment with specific voice commands.

The experiment is divided into three architectures. In the first, the recognized speech is compared with the original speech, and a recognition rate of 97.4% is obtained. In the second, non-training speech is compared with the speech database, and a recognition rate of 92.4% is obtained. In the third, a decision tree is introduced and speech outside the experimental command set is recognized and used for control, to investigate whether the behavior requested by the speech is achieved; a control accuracy of 96% is obtained. The experimental results show that an appropriate choice of the number of states gives a high recognition rate. Conversely, too many or too few states lower the recognition rate, and too many Gaussian mixtures increase the computational cost without further improving the recognition rate. Finally, the study explores how to integrate the speech recognition technology with the related hardware control technology for TV control.
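The recognition step described above, in which the Viterbi algorithm scores an utterance against each command's HMM, can be sketched in Python as follows. This is a simplified illustration, not the thesis's system: it assumes discrete observation symbols in place of the Gaussian mixture emissions mentioned in the abstract, and the two command names and all probabilities are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_score(obs, start, trans, emit):
    """Return (best log-probability, best state path) for a discrete HMM.

    start[i]    : initial probability of state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    n_states = len(start)
    delta = [safe_log(start[s]) + safe_log(emit[s][obs[0]])
             for s in range(n_states)]
    backptrs = []
    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j at this step.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + safe_log(trans[i][j]))
            new_delta.append(delta[best_i] + safe_log(trans[best_i][j])
                             + safe_log(emit[j][o]))
            ptr.append(best_i)
        delta = new_delta
        backptrs.append(ptr)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


def recognize(obs, models):
    """Pick the command whose HMM gives the highest Viterbi score."""
    return max(models, key=lambda name: viterbi_log_score(obs, *models[name])[0])


# Hypothetical two-state left-to-right models over symbols {0, 1}.
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "volume_up": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "channel_up": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}
command = recognize([0, 0, 1, 1], models)
```

Working in the log domain avoids numerical underflow on long utterances. The number of states per model is the quantity the abstract reports as affecting the recognition rate when set too high or too low.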

Keywords

HMM; Viterbi; MFCC; speech control; speech recognition

