This study examines a speech control system built on a personal digital assistant (PDA) and investigates how the vector quantization stage, in particular quantizers based on neural networks, affects the system's recognition rate. The processing pipeline consists of digital signal processing to extract speech feature parameters, vector quantization as a preprocessing step, and training and recognition algorithms based on the hidden Markov model (HMM). Features are extracted as Mel-frequency cepstral coefficients (MFCC). Three vector quantization methods are used and examined in turn: a binary-splitting method based on k-means clustering, the self-organizing feature map (SOFM) network, and the frequency-sensitive competitive learning (FSCL) network, the latter two being neural networks. In the training stage, the Baum-Welch algorithm estimates the parameters of each hidden Markov model from the speech feature parameters. In the recognition stage, the Viterbi algorithm quickly computes an approximation of the probability, and the recognized command is then executed through the Windows API. Supported functions include scheduling appointments and connecting to the Internet, among others, so that voice commands make the PDA more convenient to operate.
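The recognition stage can be illustrated with a minimal sketch. It is not the system's actual code: it assumes a discrete HMM whose observations are codebook indices from the vector quantizer, and the two-state models, their probabilities, and the command names `calendar` and `browser` are invented for illustration. The sketch shows why the Viterbi score is an approximation: it keeps only the single best state path (a maximum), whereas the exact likelihood from the forward algorithm sums over all paths, so the Viterbi score never exceeds it.

```python
# Toy discrete HMMs; every number and name here is hypothetical.
# An observation sequence is a list of vector-quantizer codebook indices.

def forward_prob(pi, A, B, obs):
    """Exact P(obs | model): sums over all state paths."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def viterbi(pi, A, B, obs):
    """Probability of the single best state path, and that path."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []
    for o in obs[1:]:
        # Best predecessor of each state j (max replaces the forward sum).
        prev = [max(range(n), key=lambda i: delta[i] * A[i][j])
                for j in range(n)]
        delta = [delta[prev[j]] * A[prev[j]][j] * B[j][o] for j in range(n)]
        backptr.append(prev)
    state = max(range(n), key=lambda i: delta[i])
    best = delta[state]
    path = [state]
    for prev in reversed(backptr):  # trace the best path backwards
        state = prev[state]
        path.append(state)
    path.reverse()
    return best, path

def recognize(models, obs):
    """Pick the command whose HMM gives the highest Viterbi score."""
    return max(models, key=lambda name: viterbi(*models[name], obs)[0])

# Two-state left-to-right models over a 3-codeword codebook (made up).
pi = [1.0, 0.0]
A = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "calendar": (pi, A, [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]),
    "browser": (pi, A, [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]),
}
obs = [0, 1, 2]

if __name__ == "__main__":
    score, path = viterbi(*models["calendar"], obs)
    print("Viterbi:", score, "path:", path)
    print("Forward:", forward_prob(*models["calendar"], obs))
    print("Recognized command:", recognize(models, obs))
```

Plain probabilities are used here only because the sequence is short; for utterances of realistic length, an implementation would normally work with log probabilities to avoid numerical underflow.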