This thesis implements a speaker-independent Mandarin speech recognition system for the station names of the Taiwan Railway Administration. The system was developed on a Pentium PC running the Microsoft Windows 98 operating system, using Microsoft Visual C++ 6.0 and the Intel Recognition Primitives Library. In the acoustic-model training stage, speech is segmented using energy dips, the zero-crossing rate, and the autocorrelation function, and mel-frequency cepstral coefficients (MFCC) are then extracted as feature parameters. The vector quantization codebook is obtained by binary splitting, discrete hidden Markov models (DHMM) are built as the acoustic models, and the Baum-Welch algorithm is used to adapt the model parameters. In the recognition stage, a Kohonen network is used to obtain the codeword sequence, and beam search is used in place of the Viterbi algorithm to compute the best recognition probability under the DHMMs. In speaker-independent experiments the recognition rate reaches up to 85.75%, which indicates that the approach is feasible.
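The recognition step described above, scoring a codeword sequence against each word's DHMM with beam pruning, can be illustrated with a minimal sketch. It is not the thesis implementation (which was written in Visual C++ with the Intel Recognition Primitives Library). The function names, the log-domain beam width, the two toy models `word_a` and `word_b`, and all probabilities are assumptions made purely for illustration.

```python
import math

NEG_INF = float("-inf")


def log0(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def make_model(pi, A, B):
    """Convert initial, transition and emission probabilities to the log domain."""
    return ([log0(p) for p in pi],
            [[log0(p) for p in row] for row in A],
            [[log0(p) for p in row] for row in B])


def beam_viterbi(obs, model, beam_width):
    """Best-path log-probability of a codeword sequence under a discrete HMM.

    At each frame only states whose score lies within `beam_width` (in log
    units) of the frame's best score are extended; the rest are pruned.
    With beam_width = inf this reduces to the ordinary Viterbi score.
    """
    log_pi, log_A, log_B = model
    n_states = len(log_pi)
    scores = [log_pi[s] + log_B[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        best = max(scores)
        active = [s for s in range(n_states) if scores[s] >= best - beam_width]
        new_scores = [NEG_INF] * n_states
        for j in range(n_states):
            cand = max((scores[i] + log_A[i][j] for i in active), default=NEG_INF)
            if cand > NEG_INF:
                new_scores[j] = cand + log_B[j][symbol]
        scores = new_scores
    return max(scores)


def recognize(obs, models, beam_width):
    """Return the name of the model with the highest beam-pruned score."""
    return max(models, key=lambda name: beam_viterbi(obs, models[name], beam_width))


# Hypothetical two-state, left-to-right toy models over a two-codeword codebook.
MODELS = {
    "word_a": make_model([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]],
                         [[0.9, 0.1], [0.1, 0.9]]),
    "word_b": make_model([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]],
                         [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    codewords = [0, 0, 1, 1]  # codeword indices produced by the quantizer
    print(recognize(codewords, MODELS, beam_width=5.0))
```

A narrower beam extends fewer states per frame and so reduces computation, at the risk of pruning the path that would have turned out best. The pruned score can therefore never exceed the exact Viterbi score.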