Evaluation Metric-related Optimization Methods for Mandarin Mispronunciation Detection


Mispronunciation detection and mispronunciation diagnosis are integral components of a computer-assisted pronunciation training (CAPT) system. Together they help second-language (L2) learners pinpoint the erroneous pronunciations in a given utterance and thereby improve their spoken proficiency. This thesis continues this line of research, and its contributions are threefold. First, we compare different pronunciation scores as the basis for mispronunciation detection and examine how each affects detection performance. Second, we propose a training approach that estimates both the parameters of the deep neural network based acoustic models and the parameters of the detection decision function by optimizing an objective directly linked to the final evaluation metric. Third, when the F1-score serves as the objective function, we linearly combine the F1-scores of the two classes and tune their weights, which effectively handles the class imbalance problem. A series of experiments on a Mandarin mispronunciation detection and diagnosis task indicates the merits of the proposed methods.
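
The weighted combination of the two per-class F1-scores can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form alpha * F1(mispronounced) + (1 - alpha) * F1(correct), the label coding (1 = mispronounced, 0 = correct), and the names `f1_for_class`, `combined_f1`, and `alpha` are all assumptions made for illustration, not the thesis's implementation. The sketch also uses hard counts for evaluation only; training a network on such an objective would need a differentiable surrogate.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score when `positive` is treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # Convention (assumed here): F1 is 0 when the class is never seen or predicted.
    return 2 * tp / denom if denom else 0.0


def combined_f1(y_true, y_pred, alpha=0.5):
    """Hypothetical linear combination of the two per-class F1-scores.

    alpha weights the mispronounced class (label 1, assumed coding);
    1 - alpha weights the correctly pronounced class (label 0).
    """
    f1_mispronounced = f1_for_class(y_true, y_pred, positive=1)
    f1_correct = f1_for_class(y_true, y_pred, positive=0)
    return alpha * f1_mispronounced + (1 - alpha) * f1_correct


# Imbalanced toy data: one mispronounced phone among ten.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10  # a detector that never flags anything

print(f1_for_class(y_true, y_pred, positive=0))  # F1 of the majority class
print(combined_f1(y_true, y_pred, alpha=0.5))    # penalized by the missed minority class
```

On this toy data the majority-class F1 stays high even though the single mispronunciation is missed, while the combined score drops. Raising `alpha` shifts more weight onto the minority class, which is the intuition behind using such a combination under class imbalance.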


