Improvements in the Use of Topic Models for Speech Recognition

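This thesis investigates the co-occurrence relationships between words in natural language under various conditions, derives a number of language models to describe them, and applies these models to Mandarin Chinese large vocabulary continuous speech recognition. To explore the co-occurrence relationship between two words in a language, the conventional approach counts how often the two words appear together within a fixed-size moving window over the entire training corpus and uses this frequency to estimate their joint probability distribution. Rather than inferring the relationship between two words solely from their co-occurrence frequency over the whole training corpus, this thesis analyzes how two words co-occur under different conditions, for example whether they frequently co-occur within different topics, documents, or document clusters. From this analysis it derives several language models that describe word-word relationships, together with methods for estimating them. The experimental corpus consists of Mandarin broadcast news collected in Taiwan, and a series of large vocabulary continuous speech recognition experiments shows that each of the proposed language models noticeably improves the performance of the baseline speech recognition system.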
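The conventional fixed-size moving-window estimate described in the abstract can be sketched as follows. This is a minimal illustration, assuming the corpus is already segmented into word tokens, that pairs are unordered, and that the joint probability is a plain relative frequency with no smoothing; the function names `window_cooccurrence` and `joint_probability` are hypothetical and do not come from the thesis.

```python
from collections import Counter
from typing import Dict, List, Tuple


def window_cooccurrence(tokens: List[str], window: int) -> Counter:
    """Count unordered word pairs whose positions differ by at most `window`.

    Assumption (illustrative): pairs are unordered, so (a, b) and (b, a)
    are stored under the same sorted key.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    counts: Counter = Counter()
    for i, w in enumerate(tokens):
        # Only look ahead, so every pair of positions is counted exactly once.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pair = tuple(sorted((w, tokens[j])))
            counts[pair] += 1
    return counts


def joint_probability(counts: Counter) -> Dict[Tuple[str, str], float]:
    """Normalise pair counts into a relative-frequency estimate of P(w1, w2)."""
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}
```

With `tokens = ["stock", "market", "rises", "stock", "market"]` and `window=2`, the pair `("market", "stock")` is counted three times out of seven window pairs, so its estimated joint probability is 3/7.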

Keywords

Mandarin large vocabulary continuous speech recognition; co-occurrence relationships; language models

Parallel Abstract

This thesis investigates word-word co-occurrence relationships in natural language and leverages a variety of language models derived from these relationships for Mandarin large vocabulary continuous speech recognition (LVCSR). The most common way to measure the co-occurrence relationship between a given pair of words is to estimate their joint probability by counting how many times the two words occur within a fixed-size window that moves along the entire training corpus. Going beyond this, we examine the co-occurrence relationships between any pair of words under various conditions, such as topics, documents, and document clusters, and derive several language models that characterize these relationships. All experiments are conducted on a Mandarin broadcast news corpus compiled in Taiwan, and the results suggest that the proposed approaches are feasible.
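One plausible way to condition co-occurrence on documents, instead of pooling counts over the whole corpus, is to treat the document as a hidden condition: P(w1, w2) = Σ_d P(d) P(w1 | d) P(w2 | d). The sketch below assumes a uniform prior over non-empty documents, conditional independence of the two words given the document, and maximum-likelihood unigram estimates within each document. It illustrates the general idea only and is not the estimator used in the thesis; the function name `document_conditioned_joint` is hypothetical.

```python
from collections import Counter
from typing import List


def document_conditioned_joint(docs: List[List[str]], w1: str, w2: str) -> float:
    """P(w1, w2) = sum_d P(d) * P(w1 | d) * P(w2 | d) over ordered word pairs.

    Assumptions (illustrative, not taken from the thesis):
    - P(d) is uniform over the non-empty documents;
    - w1 and w2 are conditionally independent given the document;
    - P(w | d) is the maximum-likelihood unigram estimate within d.
    """
    nonempty = [d for d in docs if d]
    if not nonempty:
        raise ValueError("need at least one non-empty document")
    prior = 1.0 / len(nonempty)
    total = 0.0
    for doc in nonempty:
        counts = Counter(doc)
        n = len(doc)
        # Each document contributes only if both words occur in it.
        total += prior * (counts[w1] / n) * (counts[w2] / n)
    return total
```

Under the same assumptions, passing concatenated document clusters (or topic-labelled groups of text) instead of single documents gives a cluster- or topic-conditioned variant. Unlike the pooled window count, two words that never share a document receive zero probability here, even if a window sliding over the concatenated corpus would have paired them at a document boundary.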

Parallel Keywords

large vocabulary continuous speech recognition; co-occurrence relationships; language model

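Cited By

賴敏軒 (2011). An Empirical Study of Multiple Discriminative Language Models for Speech Recognition [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315254524

黃邦烜 (2012). A Study of Recurrent Neural Network Language Models Using Additional Information for Speech Recognition [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315300315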