
利用主題與位置相關語言模型於中文連續語音辨識

Exploiting Topic- and Position-Dependent Language Models for Mandarin Continuous Speech Recognition

Advisor: 陳柏琳

Abstract


This thesis investigates language modeling for Mandarin continuous speech recognition. First, the word topical mixture model (WTMM) is proposed to explore the relationships between words, which can serve as long-span latent semantic information in language model adaptation. During speech recognition, the sequence of history words can be built into a composite word topical mixture model and used to predict each newly decoded word. In addition, a position-dependent language model is proposed, which uses the position of a word within a document or sentence to aid the estimation of word occurrence probabilities; this positional information is integrated with the information provided by N-gram and probabilistic latent semantic analysis (PLSA) models. Finally, for extractive summarization, we also develop a probabilistic sentence-ranking framework in which the sentence prior probabilities are estimated by a whole sentence maximum entropy (WSME) model that tightly integrates sentence-level information; these clues are extracted from the sentences and serve as the basis for selecting the salient sentences of a spoken document. The experiments were conducted on Mandarin broadcast news collected in Taiwan. The speech recognition results show that the word topical mixture model and the position-dependent language model can improve the performance of a large-vocabulary continuous speech recognition system. Moreover, the spoken document summarization results also show that integrating sentence-level information through the whole sentence maximum entropy model can improve summarization accuracy.
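
To make the WTMM adaptation scheme sketched above more concrete, the display below gives one plausible form of the composite model probability and its interpolation with the background N-gram; the K latent topics T_k, the per-word mixture models M_{w_i}, and the weights \alpha_i and \lambda are illustrative placeholders, not necessarily the thesis's exact formulation.

  P_{\mathrm{WTMM}}(w_n \mid H) \;=\; \sum_{i=1}^{n-1} \alpha_i \sum_{k=1}^{K} P(w_n \mid T_k)\, P(T_k \mid M_{w_i}), \qquad \sum_{i=1}^{n-1} \alpha_i = 1

  P(w_n \mid H) \;\approx\; \lambda\, P_{\text{N-gram}}(w_n \mid w_{n-2}, w_{n-1}) \;+\; (1-\lambda)\, P_{\mathrm{WTMM}}(w_n \mid H)

Under this reading, the decoded history H = w_1 \ldots w_{n-1} contributes through a weighted combination of its words' topical mixtures, which is what allows long-span semantic dependencies beyond the N-gram window to influence the prediction of w_n.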

Parallel Abstract


This study investigates language modeling for Mandarin continuous speech recognition. First, a word topical mixture model (WTMM) was proposed to explore the co-occurrence relationships between words, as well as the long-span latent topical information, for language model adaptation. During speech recognition, the search history is modeled as a composite WTMM for predicting each newly decoded word. Second, a position-dependent language model was presented to make use of word positional information within documents and sentences for better estimation of word occurrences; this positional information was exploited in conjunction with the information provided by the conventional N-gram and probabilistic latent semantic analysis (PLSA) models, respectively. Finally, we also developed a probabilistic sentence-ranking framework for extractive spoken document summarization, in which the sentence prior probabilities were estimated by a whole sentence maximum entropy (WSME) language model. This model tightly integrates extra information clues extracted from the spoken sentences for better selection of the salient sentences of a spoken document. The experiments were conducted on Mandarin broadcast news compiled in Taiwan. The speech recognition results revealed that the word topical mixture model and the position-dependent language model could each boost the performance of the baseline large-vocabulary continuous speech recognition (LVCSR) system, while the spoken document summarization results also demonstrated that integrating extra sentence-level information clues through the whole sentence maximum entropy language model could considerably raise summarization accuracy.
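
As a rough illustration of the probabilistic sentence-ranking framework described above, a sentence S of a spoken document D could be scored as follows, with the sentence prior supplied by a whole sentence maximum entropy model; the baseline model P_0, the feature functions f_i, the weights \lambda_i, and the normalizer Z are generic placeholders rather than the thesis's actual feature set.

  P(S \mid D) \;\propto\; P(D \mid S)\, P(S), \qquad P_{\mathrm{WSME}}(S) \;=\; \frac{1}{Z}\, P_0(S)\, \exp\!\Big( \sum_{i} \lambda_i f_i(S) \Big)

Ranking sentences by such a posterior-style score lets sentence-level clues enter through the prior via the feature functions f_i, while the likelihood term P(D \mid S) measures how well each sentence accounts for the document as a whole.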

Cited by


劉鳳萍 (2009). Reranking speech recognition results using discriminative language models [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315172646
陳冠宇 (2010). Improvements in topic modeling for speech recognition [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315213186
劉家妏 (2010). A study on applying various discriminative language models to speech recognition [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315213184
賴敏軒 (2011). An empirical study of various discriminative language models for speech recognition [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315254524
黃邦烜 (2012). A study on recurrent neural network language models with extra information for speech recognition [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315300315
