
中文大詞彙語音辨認之語言模型改進

Improvement on Language Modeling for Large-vocabulary Mandarin Speech Recognition

Advisor: 陳信宏

Abstract


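The goal of this study is to improve language modeling for large-vocabulary Mandarin speech recognition. Conventional large-vocabulary recognizers mostly rely on statistical language models, estimating bigram or trigram probabilities over a lexicon of tens of thousands of word entries. This approach has a shortcoming: it cannot recognize words that are absent from the lexicon, including determiner-measure compounds, proper nouns, and infrequent affixed words. This study therefore investigates statistical language models that mix words and subwords, with the aim of raising lexicon coverage and reducing the number of entries that cannot be recognized.

The work has three main parts. First, the text corpora are preprocessed: unsuitable content (English text, article titles, and so on) is removed, erroneous characters are corrected, and the text is word-segmented and normalized. Second, a hybrid word and subword statistical language model is built; this part covers the strategy for selecting lexicon entries, the method for decomposing out-of-lexicon words into subwords, and the construction of the hybrid model. Finally, a two-stage recognition architecture is adopted. The recognition method and experimental results are described, the recognition results of the proposed framework-based model are compared with those of the conventional model, and the gains for the three word-formation types considered in this study (personal names, affixed words, and determiner-measure compounds) are analyzed in depth.

To verify the proposed method, the TCC300 microphone speech corpus is used as the speech data, and the language models are trained on the Taiwan Panorama (台灣光華雜誌) and NTCIR3.0 Chinese retrieval benchmark text corpora. The experimental results show that, compared with the conventional statistical language model, the proposed hybrid model improves the performance of the large-vocabulary speech recognition system: overall word accuracy rises from 60.86% to 62.85%. Further analysis shows that the proposed two-stage recognition method does help with personal names, affixed words, and determiner-measure compounds; the increase in correctly recognized items in these three classes confirms the effectiveness of the method.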
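The abstract reports results as word accuracy but does not define the metric. The sketch below assumes the common definition, (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions in a minimum-edit alignment. The function name `word_accuracy` and the sample sentences are illustrative and do not come from the thesis.

```python
def word_accuracy(reference, hypothesis):
    """Word accuracy (N - S - D - I) / N for two lists of word tokens.

    Assumption: this is the usual edit-distance-based definition; the
    thesis may use a different scoring tool or convention.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning reference[:i]
    # into hypothesis[:j] (substitutions, deletions, insertions).
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = reference[i - 1] != hypothesis[j - 1]
            dist[i][j] = min(
                dist[i - 1][j - 1] + mismatch,  # match or substitution
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
            )

    return (n - dist[n][m]) / n


# Illustrative data: one word of a four-word reference is dropped.
reference = ["今天", "天氣", "很", "好"]
hypothesis = ["今天", "天氣", "好"]
print(word_accuracy(reference, hypothesis))  # 0.75
```

Because insertions count as errors, this quantity can be negative when the hypothesis is much longer than the reference.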

Parallel Abstract


The purpose of this research is to improve language modeling for large-vocabulary Mandarin speech recognition. Traditionally, large-vocabulary speech recognition almost always employs a statistical language model, built from bigram (or trigram) probabilities estimated over a lexicon of tens of thousands of words. This approach has a drawback: it cannot recognize out-of-vocabulary (OOV) words, including determiner-measure compound words, named entities, and affixed words. For this reason, we investigate a statistical language model that mixes words and subwords. In this way, we hope both to increase the coverage of the lexicon and to decrease the number of words that cannot be recognized correctly. This thesis is divided into three parts. First, we examine whether the contents of the corpus are suitable for building the language model; we delete unsuitable contents and correct erroneous words, in the hope of improving the overall recognition rate. Second, we train a statistical language model that mixes words and subwords, and investigate strategies for selecting the entries of the recognition lexicon used to build it. Finally, we use a two-stage framework for recognition and analyze the results of the two-stage experiment. The number of correctly recognized items in the targeted word classes increases noticeably, and the recognition rate (word accuracy) rises from 60.86% to 62.85%. These results indicate that the framework is effective.
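The abstract does not say how out-of-vocabulary words are broken into subwords. As a rough illustration of the idea only, the sketch below keeps in-lexicon words intact and splits every other word by greedy longest match against a subword inventory, falling back to single characters. The greedy strategy, the function names `split_oov` and `to_hybrid_tokens`, and the toy lexicons are all assumptions made for this example, not the method of the thesis.

```python
def split_oov(word, subword_lexicon):
    """Split a word into subword units by greedy longest match.

    Assumption: any single character is an acceptable fallback unit,
    so the split always succeeds.
    """
    units = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, shrinking one
        # character at a time; a one-character piece always matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in subword_lexicon or j == i + 1:
                units.append(piece)
                i = j
                break
    return units


def to_hybrid_tokens(words, word_lexicon, subword_lexicon):
    """Map a segmented sentence to a mixed word/subword token sequence."""
    tokens = []
    for word in words:
        if word in word_lexicon:
            tokens.append(word)  # in-vocabulary: keep as a whole word
        else:
            tokens.extend(split_oov(word, subword_lexicon))
    return tokens


# Toy data: the determiner-measure compound "三十本" is not in the lexicon.
word_lexicon = {"我", "買", "了", "書"}
subword_lexicon = {"三十", "本"}
sentence = ["我", "買", "了", "三十本", "書"]
print(to_hybrid_tokens(sentence, word_lexicon, subword_lexicon))
# ['我', '買', '了', '三十', '本', '書']
```

Token sequences of this kind could then be used as training text for an ordinary n-gram model, so that words and subwords share one vocabulary.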


Cited by


邱子軒 (2012). 使用韻律訊息於建立聲學模型之中文語音辨認 [Master's thesis, National Chiao Tung University]. Airiti Library. https://doi.org/10.6842/NCTU.2012.00841
劉銘傑 (2011). 以韻律輔助之中文語音辨認系統之實現 [Master's thesis, National Chiao Tung University]. Airiti Library. https://doi.org/10.6842/NCTU.2011.00457
