A Study of Multiple Discriminative Language Models Applied to Speech Recognition

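N-gram language models play a key role in speech recognizers because they help the recognizer distinguish correct from incorrect candidate word sequences among the large number of candidates it outputs. However, N-gram language models are trained to maximize the likelihood of the training corpus rather than to optimize a speech recognition evaluation metric, which limits the recognition performance they can deliver. In this thesis, we first investigate a variety of discriminative language models (DLMs) based on different training objectives, all of which share the fundamental goal of directly improving speech recognition performance; we then compare them, in theory and in practice, on large-vocabulary speech recognition. In addition, we propose an utterance-driven discriminative language model (UDLM), which takes the characteristics of the test utterance into account and estimates its model parameters on the fly. Finally, we combine the maximum a posteriori (MAP) method with the UDLM, expecting the recognition results produced by MAP to help the UDLM achieve a more significant improvement in recognition accuracy. All experiments are conducted on a Taiwan Mandarin broadcast news corpus, and the results show that the proposed methods yield a certain improvement in recognition accuracy.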

Keywords

speech recognition; language model; discriminative language model; reranking

Parallel Abstract

N-gram language modeling is a crucial component of any speech recognizer, since it helps the recognizer distinguish the correct hypothesis from the incorrect ones in an extremely large output space. However, N-gram language models are inadequate because they are usually trained to maximize the likelihood of a large amount of training text rather than to optimize the final performance measure of speech recognition. In this thesis, we first investigate a wide variety of discriminative language models (DLMs), which stem from different training objectives but share the aim of directly enhancing recognition performance. These DLMs are compared both theoretically and empirically. We also propose a test utterance-driven DLM (UDLM) that takes the characteristics of the test utterance into account and efficiently infers its model parameters on the fly. Finally, we pair the UDLM with the maximum a posteriori (MAP) language model adaptation approach for better recognition performance. All experiments are conducted on a Mandarin broadcast news corpus compiled in Taiwan, and the results indicate that the proposed methods are feasible.
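To illustrate what training a language model toward recognition performance can look like, here is a minimal sketch of N-best reranking with a perceptron-trained DLM. The abstract does not say which training objectives the thesis studies, so the perceptron update, the unigram and bigram features, the field names (`words`, `base_score`, `errors`), and the toy data are all assumptions made for illustration, not the author's method.

```python
from collections import Counter


def ngram_features(words):
    """Count the unigram and bigram features of one hypothesis."""
    feats = Counter(words)
    feats.update(zip(words, words[1:]))
    return feats


def score(weights, hyp):
    """Baseline recognizer score plus the weighted n-gram feature counts."""
    feats = ngram_features(hyp["words"])
    return hyp["base_score"] + sum(weights.get(f, 0.0) * c for f, c in feats.items())


def rerank(weights, nbest):
    """Return the highest-scoring hypothesis in an N-best list."""
    return max(nbest, key=lambda h: score(weights, h))


def train_perceptron(data, epochs=5):
    """Perceptron training: reward oracle features, penalize the current top choice."""
    weights = {}
    for _ in range(epochs):
        for nbest in data:
            # Oracle = hypothesis with the fewest word errors (assumed to be known).
            oracle = min(nbest, key=lambda h: h["errors"])
            best = rerank(weights, nbest)
            if best["words"] != oracle["words"]:
                for f, c in ngram_features(oracle["words"]).items():
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in ngram_features(best["words"]).items():
                    weights[f] = weights.get(f, 0.0) - c
    return weights


# Hypothetical toy N-best list: the baseline score prefers the wrong hypothesis.
toy_nbest = [
    {"words": ["recognize", "speech"], "base_score": -1.0, "errors": 0},
    {"words": ["wreck", "a", "nice", "beach"], "base_score": -0.5, "errors": 4},
]
weights = train_perceptron([toy_nbest])
reranked = rerank(weights, toy_nbest)
```

With empty weights the baseline score alone selects the second hypothesis. After training, the learned feature weights move the zero-error hypothesis to the top, which is the kind of error-driven correction that a likelihood-trained N-gram model is not directly optimized for.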

Parallel Keywords

No data

Cited By

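賴敏軒 (2011). An Empirical Study of Multiple Discriminative Language Models for Speech Recognition [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315254524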

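黃邦烜 (2012). A Study of Recurrent Neural Network Language Models Using Additional Information for Speech Recognition [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315300315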
