A Study of Data Selection Methods for Discriminative Acoustic Model Training

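This thesis investigates training data selection methods for improving discriminative acoustic model training based on the minimum phone error (MPE) criterion, applied to Mandarin large vocabulary continuous speech recognition. First, drawing on the emphasis that the Boosting algorithm places on misclassified training samples, we modify the weights of the statistics accumulated for each training utterance in MPE training, so that utterances prone to recognition errors contribute more to acoustic model training. We also combine, in several ways, multiple acoustic models trained under different data selection mechanisms to further reduce the recognition error rate. Second, we propose a training data selection method defined on the expected phone accuracy domain of the word lattices of the training utterances. Selecting data at two different units, utterances and phone segments, supplies MPE training with more discriminative training samples. Third, we attempt to combine the proposed selection methods with a previously proposed frame-level data selection method based on normalized entropy, and with a frame-level phone accuracy function, in the hope of improving the effectiveness of MPE training. Finally, experiments conducted on the Public Television Service (PTS) broadcast news corpus provide preliminary evidence for the feasibility of the proposed methods.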

Keywords

Data selection; discriminative training; acoustic models; speech recognition

Parallel Abstract

This thesis investigates various training data selection approaches for improving minimum phone error (MPE) based discriminative training of acoustic models for Mandarin large vocabulary continuous speech recognition (LVCSR). First, inspired by the AdaBoost algorithm, which places more emphasis on the training samples misclassified by the already-trained classifier, the statistics accumulated for training utterances that are prone to being misrecognized are reweighted during MPE training. In addition, multiple speech recognition systems, whose acoustic models are trained under different training data selection criteria, are combined at different recognition stages to improve recognition accuracy. Second, a novel data selection approach conducted on the expected phone accuracy domain of the word lattices of the training utterances is explored. It selects more discriminative training instances, in terms of either utterances or phone arcs, for better model discrimination. Moreover, this approach is further integrated with a previously proposed frame-level data selection approach based on normalized entropy, and with a frame-level phone accuracy function, to improve MPE training. All experiments were performed on the Mandarin broadcast news corpus (MATBN), and the results provide preliminary evidence for the feasibility of the proposed training data selection approaches.
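The three selection ideas named in the abstract can be illustrated with a small Python sketch. This is only a toy illustration under stated assumptions, not the thesis's actual formulation: the exponential weighting form, the `alpha` parameter, the `low`/`high` accuracy bounds, and all function names are hypothetical, and the expected phone accuracies are assumed to be already normalized to the range [0, 1].

```python
import math


def boosted_utterance_weights(expected_accuracies, alpha=1.0):
    """Boosting-style weights: lower expected accuracy gives a larger weight.

    Assumption: weights are exp(-alpha * accuracy), rescaled to have mean 1.
    The abstract only says that error-prone utterances are emphasized; this
    exact functional form is hypothetical.
    """
    raw = [math.exp(-alpha * acc) for acc in expected_accuracies]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def select_by_accuracy_range(expected_accuracies, low, high):
    """Return indices of items (utterances or phone arcs) whose expected
    phone accuracy lies inside [low, high]. The bounds are hypothetical."""
    return [i for i, acc in enumerate(expected_accuracies) if low <= acc <= high]


def normalized_entropy(posteriors):
    """Entropy of a posterior distribution divided by log(N), so the result
    lies in [0, 1]. Values near 1 mean the competing hypotheses are hard to
    tell apart at this frame."""
    n = len(posteriors)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in posteriors if p > 0.0)
    return entropy / math.log(n)


def select_frames(frame_posteriors, threshold):
    """Keep the indices of frames whose normalized entropy reaches the
    (hypothetical) threshold."""
    return [
        t for t, post in enumerate(frame_posteriors)
        if normalized_entropy(post) >= threshold
    ]
```

For example, calling `boosted_utterance_weights([0.9, 0.5, 0.2])` gives the third utterance the largest weight, and `select_frames` keeps only frames where the posterior mass is spread across competing hypotheses. In a real MPE setup these quantities would be computed from word lattices rather than passed in as plain lists.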

Parallel Keywords

Data Selection; Discriminative Training; Acoustic Models; Speech Recognition


