  • Degree thesis

Mispronunciation Detection and Diagnosis Combining Prosodic Features and Phonetic Features

Advisor: 陳柏琳

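Abstract


This thesis investigates the use of prosodic features with a multi-task deep neural network model for mispronunciation detection and diagnosis (MDD). Computer-assisted pronunciation training (CAPT) aims to point out the pronunciation problems of foreign-language learners automatically; procedurally, it can be roughly divided into two stages, mispronunciation detection and mispronunciation diagnosis. This thesis addresses three points: (1) how much combining prosodic features with acoustic features helps mispronunciation detection and diagnosis; (2) whether a multi-task deep neural network model can mitigate the imbalance between positive and negative examples in the data; (3) whether combining likelihood-based scoring (GOP) with classification-based scoring yields better detection and diagnosis results. Experimental results show that acoustic features are more helpful for the mispronunciation detection task, whereas prosodic features are more beneficial for the mispronunciation diagnosis task.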

Parallel Abstract


This thesis examines how a multi-task deep neural network model and prosodic features can assist mispronunciation detection and diagnosis (MDD). The purpose of computer-assisted pronunciation training (CAPT) is to automatically help second-language (L2) learners correct their mispronunciations. CAPT can be divided into mispronunciation detection and mispronunciation diagnosis. This thesis focuses on three aspects. First, we explore the benefits of combining prosodic and phonetic features in the mispronunciation detection and diagnosis tasks. Second, we use multi-task learning models to help address the data imbalance problem. Third, we combine a likelihood-based scoring (GOP) method with a classification-based scoring method to achieve better detection and diagnosis results. The experimental results show that phonetic features work better for detecting mispronunciations, whereas prosodic features are more helpful for the mispronunciation diagnosis task.
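
The abstract names likelihood-based scoring (GOP) without giving its formula. As a rough illustration only, the sketch below uses one common frame-posterior form of goodness of pronunciation: the average, over the frames aligned to a canonical phone, of that phone's log posterior minus the highest log posterior in the frame. The function names, the dictionary-based input format, and the threshold of -1.0 are all hypothetical and are not taken from the thesis; the GOP variant actually used there may differ.

```python
import math


def gop_score(frame_log_posteriors, canonical_phone):
    """Average log-posterior gap between the canonical phone and the best phone.

    frame_log_posteriors: list of dicts (phone label -> log posterior), one per
    frame aligned to the canonical phone. This input format is an assumption.
    The score is 0.0 when the canonical phone wins every frame, else negative.
    """
    if not frame_log_posteriors:
        raise ValueError("segment has no frames")
    total = 0.0
    for frame in frame_log_posteriors:
        total += frame[canonical_phone] - max(frame.values())
    return total / len(frame_log_posteriors)


def is_mispronounced(score, threshold=-1.0):
    """Detection step: flag the segment when GOP falls below a threshold.

    The default threshold is a placeholder, not a value from the thesis.
    """
    return score < threshold


def diagnose(frame_log_posteriors):
    """Diagnosis step: return the phone with the highest summed log posterior."""
    totals = {}
    for frame in frame_log_posteriors:
        for phone, log_p in frame.items():
            totals[phone] = totals.get(phone, 0.0) + log_p
    return max(totals, key=totals.get)


# Toy segment: the learner was expected to say "ae" but the model hears "eh".
segment = [{"ae": math.log(0.1), "eh": math.log(0.9)}]
score = gop_score(segment, "ae")
print(score, is_mispronounced(score), diagnose(segment))
```

In this sketch, detection is a fixed threshold on the GOP score. A classification-based scorer of the kind the abstract mentions would instead feed features such as this score into a trained classifier; how the thesis combines the two is not stated in the abstract.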

