Real-Time Speech Recognition Multimedia System

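This thesis develops a real-time speech recognition multimedia system that integrates functions commonly used in a car and provides simple but practical services. Automatic recording is used to detect in real time whether a command has been issued, and keyword spotting is used to determine which service the command belongs to. Keyword spotting performs recognition with pre-trained sub-syllable models, so the models do not need to be retrained when the services change, which improves recognition efficiency and system portability. The system adopts a hierarchical architecture that progressively guides users until they are familiar with it, and uses text-to-speech (TTS) synthesis to simulate a human voice when interacting with the user. The system is developed with Borland C++ 6.0 to implement a windowed human-machine interface and achieve real-time recognition.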
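The abstract does not say how automatic recording decides that a command has been issued. One common approach is a short-time energy threshold, sketched below in Python for brevity. This is an illustration under assumptions, not the thesis's Borland C++ implementation: the function names `frame_energies` and `detect_command`, the frame length, the threshold, and the minimum run length are all hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_command(samples, frame_len=160, threshold=0.01, min_frames=3):
    """Find the first run of loud frames that is long enough to be speech.

    Returns (start_frame, end_frame) with end_frame exclusive, or None
    when no run of at least min_frames frames exceeds the threshold.
    All default values are assumptions chosen for illustration only.
    """
    energies = frame_energies(samples, frame_len)
    start = None
    for idx, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = idx          # a loud run begins here
        else:
            if start is not None and idx - start >= min_frames:
                return (start, idx)  # the run was long enough
            start = None             # too short: treat as noise
    # Handle a loud run that lasts until the end of the buffer.
    if start is not None and len(energies) - start >= min_frames:
        return (start, len(energies))
    return None
```

A real-time system would run this kind of check on a sliding buffer of microphone samples and pass only the detected segment to the recognizer; the minimum run length keeps short clicks from triggering recognition.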

Keywords

Hidden Markov Model; keyword spotting

Parallel Abstract

This thesis develops a real-time speech recognition multimedia system that provides simple but useful services. The system uses automatic recording to detect whether a command has been issued, then uses keyword spotting to determine which service is requested. Keyword spotting performs recognition with pre-trained sub-syllable models, which do not need to be retrained when the services change, improving recognition efficiency and system portability. The system uses a hierarchical structure together with text-to-speech (TTS) synthesis to help users become familiar with it. The Windows-based interface is implemented with Borland C++ 6.0 to achieve real-time recognition.
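The hierarchical structure described above can be pictured as a tree in which each recognized keyword moves one level down until a service is reached. The Python sketch below is only an illustration: the `MENU` contents, the action strings, and the `resolve` function are hypothetical, since the abstract does not name the actual services or their layout.

```python
# Hypothetical two-level menu: top-level services, then commands.
MENU = {
    "music": {"play": "music.play", "stop": "music.stop"},
    "navigation": {"home": "nav.home"},
}


def resolve(keywords, menu):
    """Walk the menu tree using the spotted keywords, in order.

    Returns one of:
      ("action", name)    a leaf was reached, so run that service
      ("prompt", options) more input is needed; options can be read
                          back to the user, for example through TTS
      ("unknown", word)   the keyword is not valid at this level
    """
    node = menu
    for word in keywords:
        if not isinstance(node, dict) or word not in node:
            return ("unknown", word)
        node = node[word]
    if isinstance(node, dict):
        return ("prompt", sorted(node))
    return ("action", node)
```

Because only the menu table changes when services are added or removed, a design like this fits the abstract's claim that the recognition models do not need to be retrained for new services.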

Parallel Keywords

Hidden Markov Model; keyword spotting
