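Improving Feedback for Mandarin Mispronunciation Detection and Diagnosis with Articulatory Features Built by a Multi-Task Learning Model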

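This paper examines computer-assisted pronunciation training, focusing on mispronunciation detection and on feedback tied to a model of the oral cavity. We propose that adding speech attributes, such as place and manner of articulation, improves mispronunciation detection and makes mispronunciation diagnosis more precise. In practice, we use time-delay neural networks with a multi-task learning strategy to train a discriminative articulatory model that outputs an articulatory score for each phone, and we train a time-delay neural network acoustic model that outputs a phonetic score. At test time, the system detects pronunciation errors from the articulatory and phonetic scores and returns precise feedback on how to improve the pronunciation. The experiments use the MATBN corpus of Public Television Service Mandarin news broadcasts and report equal error rate (EER) and diagnosis accuracy (DA) to show that the deep neural network-hidden Markov model (DNN-HMM) performs better than the Gaussian mixture model-hidden Markov model (GMM-HMM). In addition, the proposed method is applicable to other languages, although this paper concentrates on Mandarin.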

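Keywords

computer-assisted pronunciation training; mispronunciation detection; mispronunciation diagnosis; place and manner of articulation; multi-task learning; discriminative training; time-delay neural networks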

Parallel Abstract

This paper presents our research on computer-assisted pronunciation training (CAPT). We focus on mispronunciation detection and articulation feedback. We propose incorporating speech attributes, namely place and manner of articulation, into the assessment models to improve mispronunciation detection and return precise articulation feedback to learners. Using a multi-task learning strategy, we train a discriminative articulatory model based on time-delay neural networks (TDNNs) to produce the articulatory score, and a TDNN-based acoustic model to produce the phonetic score. At test time, the system detects mispronunciations and returns precise articulation feedback based on both the phonetic and articulatory scores. Experiments on the MATBN Mandarin Chinese broadcast news corpus show that the proposed models outperform the Gaussian mixture model (GMM)-based and deep neural network (DNN)-based baselines in terms of equal error rate (EER) and diagnostic accuracy (DA). Furthermore, the approach is not specific to one language and should carry over to others, although the current system focuses on Mandarin.
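The abstract reports results as equal error rate (EER) without saying how it is computed. The following is a minimal, generic sketch of one common way to estimate EER from per-phone detection scores; it is not taken from the thesis. The function name `equal_error_rate`, the convention that a higher score means "more likely mispronounced", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Estimate the equal error rate by sweeping a decision threshold.

    Assumed conventions (hypothetical, not from the thesis):
      scores: higher value = more likely mispronounced
      labels: 1 = mispronounced phone, 0 = correctly pronounced phone
    """
    n_mispronounced = sum(labels)
    n_correct = len(labels) - n_mispronounced
    if n_mispronounced == 0 or n_correct == 0:
        raise ValueError("both classes must be present")

    best_gap, eer = None, None
    # Candidate thresholds: every distinct score, plus one above the maximum.
    for threshold in sorted(set(scores)) + [float("inf")]:
        # A phone is flagged as mispronounced when its score >= threshold.
        # False rejection: a correct phone that was flagged.
        false_rejects = sum(
            1 for s, y in zip(scores, labels) if s >= threshold and y == 0
        )
        # False acceptance: a mispronounced phone that was not flagged.
        false_accepts = sum(
            1 for s, y in zip(scores, labels) if s < threshold and y == 1
        )
        frr = false_rejects / n_correct
        far = false_accepts / n_mispronounced
        gap = abs(far - frr)
        # Keep the operating point where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy example with made-up scores.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
print(round(equal_error_rate(toy_scores, toy_labels), 3))  # 0.333
```

With a finite test set the two error rates rarely coincide exactly, so this sketch reports their average at the threshold where they are closest. Interpolating between neighbouring thresholds is another common choice.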

Parallel Keywords

computer-assisted pronunciation training; mispronunciation detection; articulatory features; multi-task learning; discriminative training; time-delay neural networks
