
基於學習排序與類別標準化動態規劃量化法之自動發音評分

Automatic Pronunciation Scoring with Score Combination by Learning to Rank and Class-Normalized DP-based Quantization

Advisor: 張智星

Abstract


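This doctoral thesis describes our proposed framework for automatic pronunciation scoring, based on learning to rank and class-normalized dynamic-programming quantization. The goal is to train a model that automatically scores the pronunciation of second-language learners, so that its scores are as close as possible to those given by human teachers. Under this framework, every sentence read by a learner is given a score from 1 to 5 by human teachers, and this score is treated as the learning target for model training. The corpus used in this study was rated by English teachers in Taiwan. Each utterance is first scored with nine phone-level scoring methods, and these scores are then converted into word-level scores using four conversion methods. We select the 16 best-performing word-level scores as input features for the learning-to-rank algorithm, and the algorithm's output is then mapped to a discrete score from 1 to 5 by our proposed quantization method. The quantization used here is class-normalized dynamic-programming quantization, which substantially alleviates the problems caused by imbalance in the number of samples across classes. Experimental results confirm that the proposed scoring framework achieves a higher correlation coefficient with human scores than previously proposed methods, as well as higher accuracy in mispronunciation detection. Finally, we also release the rated corpus used in this study.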

Parallel Abstract


This thesis describes an automatic pronunciation scoring framework using learning to rank and class-normalized, dynamic-programming-based quantization. The goal is to train a model that is able to grade the pronunciation of a second-language learner, such that the predicted score is as close as possible to the one given by a human teacher. Under this framework, each utterance is given a score of 1 to 5 by human raters, which is treated as a ground-truth rank for the training algorithm. The corpus was rated by qualified English teachers in Taiwan (nonnative speakers). Nine phone-level scores are computed and converted into word-level scores through four conversion methods. We select the 16 best-performing scores as the input features to train the learning-to-rank function. The output of the function is then quantized to a discrete rank on a scale of 1 to 5. The quantization is done with class normalization to alleviate the problem of data imbalance over different classes. Experimental results show that the proposed framework achieves a higher correlation to the human scores than other methods, along with higher accuracy in detecting instances of mispronunciation. We also release a new version of our nonnative corpus with human rankings.
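The abstract does not spell out the objective that the quantization step optimizes, so the following is only a minimal sketch of one way such a step could work, not the thesis's actual algorithm. It assumes that the continuous ranker outputs are sorted, that four cut points split them into five ordered bins, and that each training sample's absolute rank error is weighted by the inverse size of its true class (our reading of "class normalization"). The function names `class_normalized_dp_quantize` and `quantize`, the absolute-error cost, and the inverse-class-size weights are all illustrative assumptions.

```python
import bisect
from collections import Counter


def class_normalized_dp_quantize(scores, labels, num_classes=5):
    """Find num_classes-1 thresholds on the score axis by dynamic programming.

    Assumed cost (not taken from the thesis): sum over samples of
    |assigned_rank - true_rank| / (number of samples in the true class).
    Returns (thresholds, cost).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    s = [scores[i] for i in order]
    y = [labels[i] for i in order]
    n = len(s)
    counts = Counter(y)
    w = [1.0 / counts[c] for c in y]  # class normalization weights

    # prefix[k][j]: cost of giving rank k+1 to the first j sorted samples
    prefix = [[0.0] * (n + 1) for _ in range(num_classes)]
    for k in range(num_classes):
        for j in range(n):
            prefix[k][j + 1] = prefix[k][j] + w[j] * abs((k + 1) - y[j])

    inf = float("inf")
    # best[k][j]: minimum cost of labelling the first j sorted samples
    # with ranks 1..k+1 in nondecreasing order (bins may be empty)
    best = [[inf] * (n + 1) for _ in range(num_classes)]
    back = [[0] * (n + 1) for _ in range(num_classes)]
    best[0] = prefix[0][:]
    for k in range(1, num_classes):
        for j in range(n + 1):
            for i in range(j + 1):
                # a cut cannot separate two samples with identical scores
                if 0 < i < n and s[i - 1] == s[i]:
                    continue
                c = best[k - 1][i] + prefix[k][j] - prefix[k][i]
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    # backtrack to recover the cut positions in the sorted order
    cuts, j = [], n
    for k in range(num_classes - 1, 0, -1):
        j = back[k][j]
        cuts.append(j)
    cuts.reverse()

    thresholds = []
    for i in cuts:
        if i == 0:
            thresholds.append(-inf)   # lower bins are empty
        elif i == n:
            thresholds.append(inf)    # upper bins are empty
        else:
            thresholds.append((s[i - 1] + s[i]) / 2.0)
    return thresholds, best[num_classes - 1][n]


def quantize(score, thresholds):
    """Map a continuous ranker output to a discrete rank 1..len(thresholds)+1."""
    return 1 + bisect.bisect_right(thresholds, score)
```

The thresholds would be fitted once on rated training utterances and then applied to new ranker outputs with `quantize`. Because each sample's error is divided by its class size under this assumed cost, a rare rank contributes as much to the total as a frequent one, which is the kind of imbalance the abstract says class normalization is meant to alleviate.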

