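An Initial Study on Minimum Phone Error Discriminative Learning of Acoustic Models for Mandarin Large Vocabulary Continuous Speech Recognition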

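Many recent studies have sought to improve discriminative training of acoustic models. This thesis extends Minimum Phone Error (MPE) acoustic model training and adaptation and applies them to Mandarin large vocabulary continuous speech recognition. The experiments use field-reporter speech from Public Television Service (PTS) news broadcasts as the test bed. The acoustic models are first trained with Maximum Likelihood (ML), and two discriminative training methods are then compared: MPE and Maximum Mutual Information (MMI). Relative to ML training, MPE training yields relative reductions of 15.52% in syllable error rate, 12.33% in character error rate, and 10.02% in word error rate, clearly outperforming MMI training. For unsupervised acoustic model adaptation, the thesis also examines techniques that adapt indirectly through transformation matrices in the acoustic model space and in the feature space. However, no correct transcripts are available from which MPE can estimate raw accuracy, so the transcripts produced by the recognizer must be used instead. As a result, unsupervised MPE adaptation cannot estimate the acoustic model parameters well, and recognition performance degrades markedly. To alleviate this, the thesis proposes a Raw Accuracy Prediction Model (RAPM) to improve unsupervised MPE adaptation, which gives a slight gain in recognition performance.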

Keywords

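Minimum Phone Error; Large Vocabulary Continuous Speech Recognition; Acoustic Model Training; Acoustic Model Adaptation; Maximum Mutual Information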

Parallel Abstract

Discriminative training of acoustic models has been an active research topic in automatic speech recognition (ASR) over the past few years. This thesis investigates Minimum Phone Error (MPE) approaches to discriminative training and adaptation of acoustic models for Mandarin large vocabulary continuous speech recognition (LVCSR). All experiments were carried out on the Mandarin broadcast news corpus (MATBN). The experimental results show that MPE training gives significant improvements over baseline systems whose acoustic models were trained with either the Maximum Likelihood (ML) or the Maximum Mutual Information (MMI) principle. Compared with the ML-trained acoustic models, the MPE-trained models achieved relative reductions of 15.52% in syllable error rate (SER), 12.33% in character error rate (CER), and 10.02% in word error rate (WER). Unsupervised adaptation of acoustic models via MPE-trained linear transformations, in either the model space or the feature space, was also studied. However, because no correct reference transcript was available for the accuracy calculation and only the top-one automatic transcript could be used in its place, the unsupervised MPE-based adaptation techniques could not reliably accumulate good estimates of the acoustic model parameters, and their performance degraded substantially. To tackle this problem, the thesis proposes a novel Raw Accuracy Prediction Model (RAPM) to improve the MPE-based adaptation techniques; initial experiments show slight performance gains.
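To make the dependence on a reference transcript concrete, the following is a minimal sketch of an MPE-style objective computed over an N-best list rather than a lattice. It is an illustration under stated assumptions, not the thesis's implementation: the function names `raw_accuracy` and `mpe_objective`, the `scale` parameter, and the toy phone sequences are all hypothetical, and raw accuracy is taken here as the number of reference phones minus the edit distance to the hypothesis.

```python
import math


def raw_accuracy(hyp, ref):
    """Raw phone accuracy: len(ref) minus the Levenshtein distance to hyp.

    Assumption: exact edit-distance accuracy on whole sequences; lattice-based
    MPE systems typically use a per-phone approximation instead.
    """
    prev = list(range(len(ref) + 1))  # distances for an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,              # extra hypothesis phone
                           cur[j - 1] + 1,           # missing reference phone
                           prev[j - 1] + (h != r)))  # match or substitution
        prev = cur
    return len(ref) - prev[-1]


def mpe_objective(nbest, ref, scale=1.0):
    """Posterior-weighted average raw accuracy over an N-best list.

    nbest: list of (phone_sequence, log_score) pairs (hypothetical format).
    scale: hypothetical scaling factor applied to the log scores.
    """
    top = max(score for _, score in nbest)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(scale * (score - top)) for _, score in nbest]
    total = sum(weights)
    return sum(w / total * raw_accuracy(hyp, ref)
               for (hyp, _), w in zip(nbest, weights))


# Toy usage with made-up phone sequences and scores.
reference = ["a", "b", "c"]
hypotheses = [(["a", "b", "c"], -10.0), (["a", "x", "c"], -10.0)]
objective = mpe_objective(hypotheses, reference)
```

In the unsupervised setting described above, `ref` is not available and the top-one recognized sequence has to stand in for it, so every recognition error in that sequence distorts the accuracy term that the objective rewards.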

Parallel Keywords

MPE; LVCSR; Acoustic Model Training; Acoustic Model Adaptation; MMI

Cited By

程永任 (2008). 最小音素錯誤訓練法及其改進方法在國語大字彙辨識上之評估與分析 [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2008.02662

陳佳妤 (2006). 最小音素錯誤模型及特徵訓練法於中文大詞彙辨識上之初步研究 [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2006.00844

陳羿帆 (2006). 鑑別式解碼應用於多重系統結合之中文大詞彙語音辨識 [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2006.00725

劉士弘 (2006). 改善鑑別式聲學模型訓練於中文連續語音辨識之研究 [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-0712200716112485

朱芳輝 (2007). 資料選取方法於鑑別式聲學模型訓練之研究 [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-0204200815535282
