A Study on the Use of Concept Information in Mandarin Large Vocabulary Continuous Speech Recognition

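The language model is a key component of a speech recognition system. Its main function is usually to use the already decoded word history to predict which word is most likely to come next, thereby helping the recognizer pick the most probable result from among many confusable candidate word-sequence hypotheses. This paper aims to develop novel dynamic language model adaptation techniques that assist and compensate for the shortcomings of the conventional N-gram language model, and its main contributions are twofold. First, we propose the concept language model (CLM), whose main purpose is to approximate the concept that the speaker intends to express, as implied by the word history, and thereby to obtain the word usage distribution under that concept as a source of cues for dynamic language model adaptation. Second, we explore different ways of estimating the concept language model and incorporate different degrees of proximity information into it to relax its bag-of-words assumption. We take Mandarin large vocabulary continuous speech recognition (LVCSR) as the target task and compare the proposed language model adaptation techniques with other commonly used techniques. Experimental results show that, measured by character error rate (CER), our language model adaptation techniques all yield clear improvements over a baseline speech recognition system that uses only an N-gram language model.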
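As a reference point for the evaluation metric named above, character error rate is commonly computed as the character-level edit distance between the reference transcript and the recognizer output, divided by the number of reference characters. The following is a minimal sketch under that standard definition; the function names `edit_distance` and `character_error_rate` are illustrative, and the paper's exact scoring conventions (for example, how punctuation or spaces are handled) are not stated in the abstract, so stripping spaces here is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference character
                            curr[j - 1] + 1,      # insertion of a hypothesis character
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance over characters / number of reference characters."""
    # Assumption: spaces carry no information in Chinese transcripts, so drop them.
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(character_error_rate("語音辨識系統", "語音辯識系"))
```

In the usage line, the hypothesis differs from the six-character reference by one substituted and one missing character.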

Keywords

Speech recognition; language model; concept information; model adaptation

Parallel Abstract

Language modeling (LM) is part and parcel of automatic speech recognition (ASR), since it can help ASR constrain the acoustic analysis, guide the search through multiple candidate word strings, and quantify the acceptability of the final output hypothesis given an input utterance. This paper investigates and develops language model adaptation techniques for use in ASR, and its main contribution is twofold. First, we propose a novel concept language modeling (CLM) approach to rendering the relationships between a search history and an upcoming word. Second, the instantiations of CLM are constructed with different levels of lexical granularity, such as words and document clusters. In addition, we explore the incorporation of word proximity cues into the model formulation of CLM, getting around the "bag-of-words" assumption. A series of experiments conducted on a Mandarin large vocabulary continuous speech recognition (LVCSR) task demonstrates that our proposed language models can offer substantial improvements over the baseline N-gram system, and achieve performance competitive with, or better than, some state-of-the-art language model adaptation methods.
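To make the general idea of dynamic adaptation concrete, the sketch below interpolates a background N-gram probability with a word distribution estimated from the search history, weighting recent history words more heavily as a simple proximity cue. This is a generic cache-style stand-in and not the CLM formulation itself, which the abstract does not specify. The names `history_unigram` and `adapted_probability`, the exponential `decay`, the additive `smoothing`, and the interpolation weight `lam` are all assumptions made for illustration.

```python
from collections import Counter


def history_unigram(history, vocab, decay=0.8, smoothing=1e-3):
    """Word distribution over `vocab` estimated from the history.

    Each history word contributes decay**distance, where distance 0 is the most
    recent word, so nearby words count more than a plain bag-of-words tally.
    """
    vocab_set = set(vocab)
    weights = Counter()
    for distance, word in enumerate(reversed(history)):
        if word in vocab_set:  # ignore out-of-vocabulary history words
            weights[word] += decay ** distance
    total = sum(weights.values()) + smoothing * len(vocab_set)
    return {w: (weights[w] + smoothing) / total for w in vocab_set}


def adapted_probability(word, background_prob, history_dist, lam=0.7):
    """Linear interpolation of the background N-gram and history-based models."""
    return lam * background_prob + (1.0 - lam) * history_dist[word]


if __name__ == "__main__":
    vocab = ["stock", "market", "weather"]
    dist = history_unigram(["stock", "market"], vocab)
    # Hypothetical background N-gram probability for the candidate word.
    print(adapted_probability("market", 0.1, dist))
```

Linear interpolation is only one way to combine the two models; it is used here because it keeps the result a valid probability whenever both inputs are.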

Parallel Keywords

Speech Recognition; Language Model; Concept Information; Model Adaptation
