An Empirical Study of Word Error Minimization Approaches for Mandarin Large Vocabulary Continuous Speech Recognition

This paper presents an empirical study of word error minimization approaches for Mandarin large vocabulary continuous speech recognition (LVCSR). First, the minimum phone error (MPE) criterion, one of the most popular discriminative training criteria, is extensively investigated for both acoustic model training and adaptation in a Mandarin LVCSR system. Second, the word error minimization (WEM) criterion, used to rescore N-best word strings, is appropriately modified for a Mandarin LVCSR system. Finally, a series of speech recognition experiments is conducted on the MATBN Mandarin Chinese broadcast news corpus. The experimental results demonstrate that the MPE training approach reduces the character error rate (CER) by 12% for a system initially trained with the maximum likelihood (ML) approach. For unsupervised acoustic model adaptation, MPE-based linear regression (MPELR) adaptation outperforms conventional maximum likelihood linear regression (MLLR) in terms of CER reduction. When the WEM decoding approach is used for N-best rescoring, a slight performance gain over the conventional maximum a posteriori (MAP) decoding method is also observed.
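The contrast between MAP decoding and WEM-style N-best rescoring can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that errors are counted at the character level (the abstract does not say how WEM was modified for Mandarin), that hypothesis posteriors are obtained by normalizing scaled log scores over the N-best list, and that the N-best list itself stands in for the space of possible references. The function names, the `scale` parameter, and the toy hypotheses and scores are all hypothetical.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: character strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def posteriors(log_scores, scale=1.0):
    """Normalize scaled log scores over the N-best list (assumed posterior estimate)."""
    top = max(log_scores)
    weights = [math.exp(scale * (s - top)) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def map_decode(nbest, log_scores):
    """Pick the hypothesis with the highest score."""
    best = max(range(len(nbest)), key=lambda i: log_scores[i])
    return nbest[best]


def wem_decode(nbest, log_scores, scale=1.0):
    """Pick the hypothesis with the lowest expected character error.

    Every hypothesis in the list is treated as a possible reference,
    weighted by its estimated posterior.
    """
    post = posteriors(log_scores, scale)

    def expected_errors(i):
        return sum(p * edit_distance(other, nbest[i])
                   for other, p in zip(nbest, post))

    best = min(range(len(nbest)), key=expected_errors)
    return nbest[best]


if __name__ == "__main__":
    # Toy 3-best list with made-up posteriors 0.40, 0.35, 0.25.
    nbest = ["新聞報導", "新聞抱道", "新聞抱到"]
    log_scores = [math.log(0.40), math.log(0.35), math.log(0.25)]
    print("MAP:", map_decode(nbest, log_scores))
    print("WEM:", wem_decode(nbest, log_scores))
```

In this toy list the top-scoring hypothesis differs from each of the other two in two characters, while those two differ from each other in only one. The second hypothesis therefore has the lowest expected character error (1.05, against 1.20 for the top-scoring one), so the two decoders disagree. With real N-best lists the posterior mass is usually far more concentrated, which is consistent with the slight gain reported above.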

Keywords

Broadcast News; Continuous Speech Recognition; Discriminative Training; Minimum Phone Error; Word Error Minimization


