
使用概念資訊於中文大詞彙連續語音辨識之研究

Exploring Concept Information for Mandarin Large Vocabulary Continuous Speech Recognition

Abstract


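The language model is a key component of a speech recognition system. Its main function is to use the word sequence decoded so far (the history) to predict which word is most likely to come next, which helps the recognizer pick the most probable result from many confusable candidate word-sequence hypotheses. This paper develops novel dynamic language model adaptation techniques that complement the conventional N-gram language model and compensate for its shortcomings. Its main contributions are twofold. First, we propose the Concept Language Model (CLM), which aims to approximate the concepts the speaker intends to express, as implied by the history word sequence, and to derive from them a distribution over word usage conditioned on those concepts. This distribution serves as the cue for dynamic language model adaptation. Second, we explore different ways of estimating the concept language model, and we incorporate varying degrees of proximity information into it to relax its bag-of-words assumption. The techniques are evaluated on a Mandarin large vocabulary continuous speech recognition (LVCSR) task and compared with other commonly used techniques. Experimental results show that, measured by character error rate (CER), our language model adaptation techniques all yield clear improvements over a baseline speech recognition system that uses only an N-gram language model.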
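For readers unfamiliar with the metric, character error rate is conventionally computed as the character-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The sketch below illustrates that standard definition only. The function names and the example strings are hypothetical, and the abstract does not describe the scoring tool actually used in the experiments.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion all cost 1)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of r
                            curr[j - 1] + 1,      # insertion of h
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance over characters / number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted character and one deleted character.
cer = character_error_rate("今天天氣很好", "今天天汽很")
print(f"CER = {cer:.3f}")
```

Because Mandarin text has no explicit word boundaries, scoring at the character level avoids making the result depend on a particular word segmentation.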

Parallel Abstract


Language modeling (LM) is an integral part of automatic speech recognition (ASR): it helps constrain the acoustic analysis, guides the search through multiple candidate word strings, and quantifies the acceptability of the final output hypothesis given an input utterance. This paper investigates and develops language model adaptation techniques for ASR, and its main contribution is twofold. First, we propose a novel concept language modeling (CLM) approach to modeling the relationship between a search history and an upcoming word. Second, instantiations of CLM are constructed at different levels of lexical granularity, such as words and document clusters. In addition, we explore incorporating word proximity cues into the model formulation of CLM to get around the "bag-of-words" assumption. A series of experiments conducted on a Mandarin large vocabulary continuous speech recognition (LVCSR) task demonstrates that our proposed language models offer substantial improvements over the baseline N-gram system and achieve performance competitive with, or better than, some state-of-the-art language model adaptation methods.
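The abstract does not give the CLM formulation, so the following is only a generic illustration of the idea it describes: infer which concept the history points to, derive a word distribution from it, and interpolate that with the N-gram probability. Everything here is an assumption made for the example, including the toy concept distributions, the exponential `decay` used as a stand-in for proximity cues, the interpolation weight `lam`, and all function names. It should not be read as the authors' model.

```python
import math

# Hypothetical toy "concepts": each is a unigram distribution over a tiny vocabulary.
CONCEPTS = {
    "weather": {"rain": 0.5, "sunny": 0.4, "stock": 0.05, "market": 0.05},
    "finance": {"rain": 0.05, "sunny": 0.05, "stock": 0.5, "market": 0.4},
}
FLOOR = 1e-6  # assumed floor probability for words a concept has not seen


def concept_posterior(history, decay=0.8):
    """P(concept | history) under a uniform prior.

    Each history word's log-likelihood is weighted by decay ** distance, so
    words closer to the upcoming word count more. decay=1.0 treats the
    history as an unordered bag of words.
    """
    n = len(history)
    log_scores = {}
    for name, dist in CONCEPTS.items():
        score = 0.0
        for pos, word in enumerate(history):
            weight = decay ** (n - 1 - pos)  # most recent word has weight 1
            score += weight * math.log(dist.get(word, FLOOR))
        log_scores[name] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


def concept_lm_prob(word, history, decay=0.8):
    """P(word | history) as a posterior-weighted mixture of concept distributions."""
    post = concept_posterior(history, decay)
    return sum(post[name] * CONCEPTS[name].get(word, FLOOR) for name in CONCEPTS)


def adapted_prob(word, history, ngram_prob, lam=0.3, decay=0.8):
    """Linear interpolation of the concept-based probability with an N-gram probability."""
    return lam * concept_lm_prob(word, history, decay) + (1.0 - lam) * ngram_prob


# Usage: after hearing "stock", "market" becomes more likely than "rain".
history = ["rain", "stock"]
print(concept_posterior(history))
print(adapted_prob("market", history, ngram_prob=0.01))
print(adapted_prob("rain", history, ngram_prob=0.01))
```

Linear interpolation is used here because it is a common, simple way to combine an adaptive model with a background N-gram model. Whether the paper combines its models this way is not stated in the abstract.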
