
實證探究多種鑑別式語言模型於語音辨識之研究

Empirical Comparisons of Various Discriminative Language Models for Speech Recognition

Advisor: 陳柏琳

Abstract


Language models play an essential role in automatic speech recognition (ASR) systems: their parameters are estimated from large amounts of training text so as to capture the regularities of natural language. N-gram language models (especially bigram and trigram models) are commonly used to estimate the conditional probability of each word given its preceding N-1 words of history. However, N-gram models are usually trained with the maximum likelihood criterion, which does not directly minimize the recognition error rate and is therefore of limited help in reducing it. To address this problem, discriminative language models (DLMs) have recently been proposed; rather than merely fitting the training data, they aim to correctly discriminate the best sentence from among the candidate recognition hypotheses, and this line of work has been demonstrated with a fair degree of success. This thesis first presents an empirical study of several discriminative language models designed to improve speech recognition performance. We then propose margin-based DLM training methods that penalize incorrectly recognized hypotheses in proportion to the difference between their word error rate (WER) and that of the reference word sequence (the hypothesis with the lowest WER). Compared with other existing discriminative language models, the proposed methods provide considerable benefits on large vocabulary continuous speech recognition (LVCSR) tasks.
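The maximum-likelihood N-gram estimation described above reduces to relative-frequency counting. A minimal sketch for the bigram case (all function and variable names here are illustrative, not from the thesis):

```python
from collections import defaultdict

def train_bigram_ml(corpus):
    """Maximum-likelihood bigram estimation:
    P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1})."""
    history_count = defaultdict(int)   # count(w_{i-1})
    bigram_count = defaultdict(int)    # count(w_{i-1}, w_i)
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]  # sentence boundary markers
        for prev, cur in zip(tokens, tokens[1:]):
            history_count[prev] += 1
            bigram_count[(prev, cur)] += 1
    return {pair: c / history_count[pair[0]] for pair, c in bigram_count.items()}

probs = train_bigram_ml([["i", "like", "speech"],
                         ["i", "like", "language", "models"]])
print(probs[("i", "like")])      # 1.0: "like" always follows "i" in this toy corpus
print(probs[("like", "speech")]) # 0.5: "like" is followed by "speech" half the time
```

A real ASR language model would add smoothing (e.g. Katz or Kneser-Ney) so that unseen bigrams do not receive zero probability; the sketch omits this for brevity.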

Parallel Abstract


Language modeling (LM), at the heart of most automatic speech recognition (ASR) systems, aims to render the regularity of a given natural language, with its model parameters estimated from a large amount of training text. The n-gram (especially bigram and trigram) language models, which determine the probability of a word given its preceding n-1 words of history, are the most prominently used. However, n-gram models, normally trained with the maximum likelihood (ML) criterion, are not always capable of achieving the minimum recognition error rate, which is in fact the final evaluation metric. To address this problem, a range of discriminative language modeling (DLM) methods have recently been proposed, aiming to correctly discriminate among the recognition hypotheses for the best recognition result rather than merely fitting the distribution of the training data, and they have been demonstrated with varying degrees of success. In this thesis, we first present an empirical investigation of several leading DLM methods designed to boost speech recognition performance. We then propose a novel use of various margin-based DLM training methods that penalize incorrect recognition hypotheses in proportion to their word error rate (WER) distance from the desired (oracle) hypothesis, i.e., the one with the minimum WER. Experiments conducted on a large vocabulary continuous speech recognition (LVCSR) task illustrate the performance merits of the methods instantiated from our DLM framework when compared to other existing methods.
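The margin-based idea above can be sketched as a perceptron-style update over an N-best list, where the required score margin between the oracle and the current best-scoring hypothesis grows with their WER gap. This is a minimal illustration under the common assumption of a linear DLM with sparse (e.g. n-gram count) features; the function and variable names are hypothetical, not the thesis's actual implementation:

```python
def margin_rerank_update(weights, hyps, feats, wers, lr=0.1):
    """One margin-based update over an N-best list.

    weights: feature -> weight; feats[h]: sparse feature dict for hypothesis h;
    wers[h]: word error rate of h. The oracle is the minimum-WER hypothesis.
    If the top-scoring hypothesis is not separated from the oracle by a margin
    proportional to its WER distance, move weights toward the oracle's features
    and away from the top-scoring hypothesis's features.
    """
    def score(h):
        return sum(weights.get(f, 0.0) * v for f, v in feats[h].items())

    oracle = min(hyps, key=lambda h: wers[h])
    best = max(hyps, key=score)
    margin = wers[best] - wers[oracle]  # required separation, WER-proportional
    if score(oracle) - score(best) < margin:
        for f, v in feats[oracle].items():
            weights[f] = weights.get(f, 0.0) + lr * v
        for f, v in feats[best].items():
            weights[f] = weights.get(f, 0.0) - lr * v
    return weights

# Toy N-best list of two hypotheses with disjoint features:
w = margin_rerank_update({}, [0, 1],
                         {0: {"a": 1.0}, 1: {"b": 1.0}},
                         {0: 0.5, 1: 0.0})
print(w)  # the oracle's feature "b" gains weight, "a" is pushed down
```

Hypotheses with larger WER gaps thus trigger proportionally larger margin violations, which is what distinguishes this family of updates from a plain reranking perceptron that treats all errors equally.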



Cited By


黃邦烜 (2012). A study on recurrent neural network language models with additional information for speech recognition [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315300315
