A Mandarin Speech Recognition System Considering Speaking-Rate Effects and Affix-Based Word Construction

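This study proposes a new approach to Mandarin large-vocabulary continuous speech recognition that addresses two issues: the recognition of affix words and the effect of speaking rate on recognition. First, approaching affix words from the standpoint of morphology, we exploit their regular structure to decompose them into sub-word units, and then build a class-based language model to describe their relations with other words. The goal is to reduce the impact of OOV (out-of-vocabulary) words by increasing the correct-word coverage of the word lattice. Experimental results show absolute reductions in word, character, and base-syllable error rates of 0.37%, 0.27%, and 0.26%, respectively (relative reductions of 2.64%, 2.56%, and 3.38%). Second, this thesis investigates how speaking rate affects speech recognition by building a speaking-rate-controlled hierarchical prosody model that describes the influence of speaking rate on prosodic-acoustic parameters, and by using this model to assist recognition. Experimental results show that the proposed rate-aware recognition method reduces word, character, and base-syllable error rates by 1.67%, 1.45%, and 1.02% absolute (12.25%, 14.09%, and 13.55% relative), indicating that the approach is effective.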
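
The affix-word step can be illustrated with a small sketch. The abstract does not specify the affix inventory, the decomposition rules, or the word classes, so everything below is a hypothetical stand-in: a toy prefix/suffix list, a one-affix split that fires only when the remaining stem is in the lexicon, and a maximum-likelihood class-based bigram of the standard form P(u_i | u_{i-1}) = P(c_i | c_{i-1}) · P(u_i | c_i). The names `decompose`, `word_class`, and `ClassBigram` are illustrative, not taken from the thesis.

```python
from collections import Counter, defaultdict

# Hypothetical affix inventory; the abstract does not list the thesis's actual affixes.
PREFIXES = {"非", "超"}              # e.g. 非 "non-", 超 "super-"
SUFFIXES = {"們", "性", "化", "者"}  # plural, "-ness", "-ize", agentive "-er"


def decompose(word, lexicon):
    """Split an affix word that is missing from the lexicon into sub-word units.

    In-lexicon words stay whole. At most one prefix or suffix is stripped, and
    only when the remaining stem is itself in the lexicon. Bound affixes are
    marked with '+' on the side where they attach.
    """
    if word in lexicon or len(word) < 2:
        return [word]
    if word[0] in PREFIXES and word[1:] in lexicon:
        return [word[0] + "+", word[1:]]
    if word[-1] in SUFFIXES and word[:-1] in lexicon:
        return [word[:-1], "+" + word[-1]]
    return [word]


def word_class(unit):
    """Hypothetical class map: each affix unit is its own class; all else is 'WORD'."""
    if unit.startswith("+") or unit.endswith("+"):
        return unit
    return "WORD"


class ClassBigram:
    """Maximum-likelihood class-based bigram over sub-word units:
    P(u_i | u_{i-1}) = P(c_i | c_{i-1}) * P(u_i | c_i)."""

    def __init__(self, sentences):
        self.transitions = defaultdict(Counter)  # class -> next-class counts
        self.members = defaultdict(Counter)      # class -> unit counts
        for units in sentences:
            classes = ["<s>"] + [word_class(u) for u in units]
            for prev, cur in zip(classes, classes[1:]):
                self.transitions[prev][cur] += 1
            for u in units:
                self.members[word_class(u)][u] += 1

    def prob(self, prev_unit, unit):
        prev_c = "<s>" if prev_unit is None else word_class(prev_unit)
        c = word_class(unit)
        trans, members = self.transitions[prev_c], self.members[c]
        if not trans or not members:
            return 0.0  # unseen context or class; no smoothing in this sketch
        return (trans[c] / sum(trans.values())) * (members[unit] / sum(members.values()))


lexicon = {"學生", "老師", "現代", "穩定"}
corpus = [decompose(w, lexicon) for w in ["學生們", "老師們", "現代化"]]
lm = ClassBigram(corpus)

# "穩定化" never occurs in training, but its stem is in the lexicon and the
# suffix class "+化" was seen after ordinary words, so the bigram is nonzero.
print(decompose("穩定化", lexicon))  # ['穩定', '+化']
print(lm.prob("穩定", "+化"))        # 0.3333333333333333
```

The point of the split is lattice coverage: a word such as 穩定化 that is absent from the lexicon can still be hypothesized as 穩定 followed by +化. A practical model would add smoothing and learn classes from data rather than hard-coding them.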

Keywords

Automatic Speech Recognition; Word Construction; Prosody Model

Abstract (English)

This thesis presents a new Mandarin speech recognition approach that addresses the recognition of affix words and the effect of speaking rate. First, affix words are recognized by decomposing them into sub-word units, and a class-based language model is then employed to describe their relations with other words. The goal is to reduce the effect of out-of-vocabulary (OOV) words by increasing the coverage of word lattices generated with a lexicon limited to 60,000 entries. Experimental results showed absolute reductions in word, character, and base-syllable error rates of 0.37%, 0.27%, and 0.26% (2.64%, 2.56%, and 3.38% relative). Second, the effect of speaking rate on speech recognition is examined. A speaking-rate-dependent hierarchical prosody model, which describes the influence of speaking rate on prosodic-acoustic features, is constructed and used to assist speech recognition. Experimental results showed that taking speaking rate into account in ASR reduces word, character, and base-syllable error rates by 1.67%, 1.45%, and 1.02% absolute (12.25%, 14.09%, and 13.55% relative). These results suggest that the proposed approach is promising.
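
The abstract does not give the form of the speaking-rate-dependent prosody model, so the following is only a minimal sketch of the underlying idea, restricted to syllable duration: each duration is divided by the utterance's mean syllable duration (the inverse of speaking rate), and a per-syllable Gaussian over these normalized values scores a recognition hypothesis. It is not the thesis's hierarchical prosody model; the function names, the `SIGMA_FLOOR` value, the base-syllable labels, and the toy durations are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

SIGMA_FLOOR = 0.05  # hypothetical floor on the std. dev. for sparsely observed syllables


def normalize(durations):
    """Divide syllable durations by the utterance's mean syllable duration.

    The mean duration is the inverse of speaking rate (syllables per second,
    pauses assumed removed), so this removes the global rate from each value.
    """
    m = sum(durations) / len(durations)
    return [d / m for d in durations]


def train(utterances):
    """utterances: lists of (base_syllable, duration_in_seconds) pairs.
    Returns {base_syllable: (mean, std)} of rate-normalized durations."""
    samples = defaultdict(list)
    for utt in utterances:
        for (syl, _), x in zip(utt, normalize([d for _, d in utt])):
            samples[syl].append(x)
    return {s: (mean(xs), max(pstdev(xs), SIGMA_FLOOR)) for s, xs in samples.items()}


def log_gauss(x, mu, sigma):
    """Log density of a univariate Gaussian N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def duration_score(model, hypothesis):
    """Log-likelihood of a hypothesis's syllable durations after rate normalization."""
    norm = normalize([d for _, d in hypothesis])
    return sum(log_gauss(x, *model[syl]) for (syl, _), x in zip(hypothesis, norm))


# Toy training data (hypothetical): the same syllables at a normal and a fast rate.
model = train([
    [("ma", 0.20), ("ba", 0.30), ("da", 0.25)],
    [("ma", 0.10), ("ba", 0.15), ("da", 0.125)],
])

fast = [("ma", 0.12), ("ba", 0.18), ("da", 0.15)]
slow = [(s, 2 * d) for s, d in fast]              # same pattern, half the rate
odd = [("ma", 0.18), ("ba", 0.12), ("da", 0.15)]  # relative durations swapped

print(math.isclose(duration_score(model, fast), duration_score(model, slow)))  # True
print(duration_score(model, fast) > duration_score(model, odd))                # True
```

Because the global rate is divided out, the same syllable pattern spoken at half the speed gets the same score, while a hypothesis whose relative durations conflict with training is penalized. In a recognizer, a score of this kind could be combined with acoustic and language-model scores when rescoring competing hypotheses.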

Keywords (English)

Automatic Speech Recognition; Word Construction; Prosody Model

