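Operating a computer by voice input is an important application of speech signal processing. This study builds a keyword-spotting speech recognition system based on the Token Passing model; the system was developed with Microsoft Visual C++ 6.0 under the Windows 2000 operating system. Feature parameters are Mel-Frequency Cepstrum Coefficients (MFCC), and the acoustic models are Continuous Hidden Markov Models (CHMM). The 415 Mandarin syllables are modeled with Right Context Dependent Sub-Syllabic units, comprising 113 right-context-dependent initials and 39 finals; each initial is modeled with 3 states and each final with 4 states. A silence model and a short-pause model are also built, each with 4 states. Token passing is used to build a filler-model keyword spotter, and Utterance Verification is used to check whether a spotted keyword is correct. Finally, a course-selection system operated by voice input is designed: a hierarchical structure reduces the number of active keywords, a text-to-speech agent guides the user, and confirmation and rejection utterances are used to improve the keyword detection rate.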
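The token passing search behind a filler-model keyword spotter can be sketched in a few dozen lines. This is a minimal sketch, not the system's implementation. It assumes left-to-right HMMs that each have one self-loop and one forward log-probability. Precomputed per-state log emission scores stand in for the CHMM densities. A single looped network is used, in which a one-state model named FILL plays the filler. The names `Token`, `token_passing`, `entry_penalty`, `KW`, and `FILL` are hypothetical. Each state keeps only its best token per frame, and a token leaving a model appends that model's name to its history, so spotted keywords are read off the winning history.

```python
import math
from dataclasses import dataclass

NEG_INF = float("-inf")


@dataclass(frozen=True)
class Token:
    score: float          # accumulated log probability of the best path so far
    history: tuple = ()   # names of the models this path has exited, in order


def best(*tokens):
    """Return the highest-scoring token (the Viterbi max step of token passing)."""
    return max(tokens, key=lambda tok: tok.score)


def token_passing(models, loglik, entry_penalty=0.0):
    """Viterbi search over a looped network: start -> {models} -> end -> start.

    models: name -> (n_states, log_self_loop, log_next) for a left-to-right HMM
            (hypothetical simplification: one shared transition pair per model).
    loglik: name -> list over frames of per-state log emission scores.
    Returns the best token that exits any model at the final frame.
    """
    n_frames = len(next(iter(loglik.values())))
    # One token per HMM state; every state starts empty (score -inf).
    states = {name: [Token(NEG_INF)] * n for name, (n, _, _) in models.items()}
    entry = Token(0.0)  # token waiting at the network's start node
    for t in range(n_frames):
        new_states, exits = {}, []
        for name, (n, log_self, log_next) in models.items():
            old = states[name]
            new = []
            for j in range(n):
                # Candidate 1: remain in state j via its self-loop.
                stay = Token(old[j].score + log_self, old[j].history)
                # Candidate 2: arrive from the previous state, or from the
                # network entry when j is the model's first state.
                if j == 0:
                    move = Token(entry.score + entry_penalty, entry.history)
                else:
                    move = Token(old[j - 1].score + log_next, old[j - 1].history)
                tok = best(stay, move)
                new.append(Token(tok.score + loglik[name][t][j], tok.history))
            new_states[name] = new
            last = new[-1]
            # Leaving the final state records this model's name in the history.
            exits.append(Token(last.score + log_next, last.history + (name,)))
        states = new_states
        entry = best(*exits)  # best exiting token feeds every model next frame
    return entry
```

With a two-state `KW` model whose states match frames 1 and 2, and `FILL` matching frames 0 and 3, the winning history is `("FILL", "KW", "FILL")`, i.e. `KW` is spotted inside non-keyword speech. A full spotter would also place the silence and short-pause models in the loop and tune `entry_penalty` to trade missed keywords against false alarms.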
Using speech as input to operate a computer is an important application of speech signal processing. The system described in this study implements keyword spotting with the Token Passing model and was developed with Microsoft Visual C++ 6.0 under the Windows 2000 operating system. The system uses Mel-Frequency Cepstrum Coefficients (MFCC) as feature parameters and Continuous Hidden Markov Models (CHMM) to build the acoustic models. The 415 Mandarin syllables are decomposed into right-context-dependent sub-syllabic units: 113 right-context-dependent INITIALs and 39 context-independent FINALs. Each INITIAL is modeled with 3 states and each FINAL with 4 states. In addition, silence and short-pause acoustic models are built, each with 4 states. The system then uses token passing to build a filler-model keyword spotting system and applies Utterance Verification to decide whether a spotted keyword is correct. Finally, the research develops a speech-input course-selection interface. It uses a hierarchical architecture to reduce the number of keywords and a text-to-speech (TTS) agent to guide users. Confirmation and rejection sentences improve the keyword detection rate.
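The abstract does not state which verification statistic is used. One common formulation, assumed here, is a frame-normalized log-likelihood ratio between the keyword model and a competing anti-keyword (or filler) model. The keyword is accepted when this ratio reaches a tuned threshold. To keep the sketch short, each model is reduced to a single diagonal-covariance Gaussian, whereas a CHMM would score each frame against the Gaussian mixture of its aligned state. The names `log_gauss_diag` and `verify_keyword` and the default threshold are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of feature vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def verify_keyword(frames, keyword_gauss, anti_gauss, threshold=0.0):
    """Frame-normalized log-likelihood ratio test (hypothetical helper).

    frames:        feature vectors of the spotted segment (e.g. MFCC frames).
    keyword_gauss: (mean, var) standing in for the keyword model.
    anti_gauss:    (mean, var) standing in for the anti-keyword/filler model.
    Returns (accepted, llr).
    """
    if not frames:
        raise ValueError("the spotted segment must contain at least one frame")
    kw = sum(log_gauss_diag(x, *keyword_gauss) for x in frames)
    anti = sum(log_gauss_diag(x, *anti_gauss) for x in frames)
    llr = (kw - anti) / len(frames)  # normalize by duration
    return llr >= threshold, llr
```

Dividing by the number of frames keeps long and short keywords on a comparable scale, so a single threshold can be shared. In practice that threshold would be set on held-out data to balance false acceptances against false rejections.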