Automatic Pronunciation Scoring Based on Learning to Rank and Class-Normalized Dynamic-Programming Quantization

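This doctoral dissertation describes our proposed framework for automatic pronunciation scoring based on learning to rank and class-normalized dynamic-programming quantization. The goal of this research is to train a model that automatically scores the pronunciation of second-language learners so that the resulting scores are as close as possible to those given by human teachers. Under this framework, every sentence read by a learner is rated on a scale of 1 to 5 by human teachers, and this rating serves as the learning target for model training. The corpus used in this study was rated by English teachers in Taiwan. Each spoken sentence is first scored with nine phone-level scoring methods, and the results are then converted into word-level scores using four conversion methods. We select the 16 best-performing word-level scores as input features for the learning-to-rank algorithm, and the algorithm's output is then passed through our proposed quantization method to obtain discrete scores from 1 to 5. The quantization method is a class-normalized dynamic-programming quantization, which substantially reduces the problems caused by imbalance in the number of samples across classes. Experimental results confirm that the proposed scoring framework achieves a higher correlation coefficient with human ratings than previously proposed methods, as well as higher precision in mispronunciation detection. Finally, we publicly release the scored corpus used in this study.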

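Keywords

automatic pronunciation scoring; computer-assisted language learning; computer-assisted pronunciation training; learning to rank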

Parallel Abstract

This thesis describes an automatic pronunciation scoring framework that uses learning to rank and class-normalized, dynamic-programming-based quantization. The goal is to train a model that grades the pronunciation of a second-language learner so that the predicted score is as close as possible to the one given by a human teacher. Under this framework, each utterance is given a score of 1 to 5 by human raters, which is treated as the ground-truth rank for the training algorithm. The corpus was rated by qualified English teachers in Taiwan (nonnative speakers). Nine phone-level scores are computed and converted into word-level scores through four conversion methods. We select the 16 best-performing scores as the input features to train the learning-to-rank function. The output of the function is then quantized to a discrete rank on a 1-5 scale. The quantization is done with class normalization to alleviate the problem of data imbalance across classes. Experimental results show that the proposed framework achieves a higher correlation with the human scores than other methods, along with higher accuracy in detecting instances of mispronunciation. We also release a new version of our nonnative corpus with human rankings.
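The abstract does not spell out the quantization objective, so the following is only a minimal sketch of how a class-normalized, dynamic-programming quantizer could work, not the thesis's actual method. It assumes the cost of a set of thresholds is the number of misclassified training utterances, with each error weighted by 1 divided by the size of the utterance's true class, so that rare ranks count as much as common ones. The function name `class_normalized_dp_quantize` and its parameters are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def class_normalized_dp_quantize(scores, labels, num_classes=5):
    """Quantize continuous scores into ranks 1..num_classes.

    Assumed objective (not taken from the thesis): minimize the number of
    misclassified training items, each weighted by 1 / (size of its true class).
    Returns (predicted_ranks_in_input_order, total_cost).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    sorted_labels = [labels[i] for i in order]
    n = len(sorted_labels)
    class_size = Counter(sorted_labels)

    # prefix[k][j]: cost of giving rank k+1 to the first j items (sorted by score).
    prefix = []
    for rank in range(1, num_classes + 1):
        row = [0.0]
        for label in sorted_labels:
            penalty = 0.0 if label == rank else 1.0 / class_size[label]
            row.append(row[-1] + penalty)
        prefix.append(row)

    # best[k][j]: minimal cost of covering the first j items with ranks 1..k+1,
    # where each rank takes one contiguous (possibly empty) run of items.
    best = [prefix[0][:]]
    back = [[0] * (n + 1)]
    for k in range(1, num_classes):
        best_row, back_row = [], []
        for j in range(n + 1):
            candidates = [
                (best[k - 1][i] + prefix[k][j] - prefix[k][i], i)
                for i in range(j + 1)
            ]
            cost, split = min(candidates)
            best_row.append(cost)
            back_row.append(split)
        best.append(best_row)
        back.append(back_row)

    # Trace back the cut positions between consecutive ranks.
    cuts, j = [], n
    for k in range(num_classes - 1, 0, -1):
        j = back[k][j]
        cuts.append(j)
    cuts.reverse()

    # Map each sorted position back to its original index.
    predicted = [0] * n
    for position, original_index in enumerate(order):
        predicted[original_index] = bisect_right(cuts, position) + 1
    return predicted, best[-1][n]
```

With scores `[0.1, 0.2, 0.3, 0.4, 0.5]`, labels `[2, 1, 2, 2, 2]`, and `num_classes=2`, an unweighted error count cannot distinguish between mislabeling the single rank-1 item and mislabeling one rank-2 item. The normalized cost prefers the latter, because an error on the rare class costs 1 while an error on the common class costs 0.25. The sketch runs in time proportional to `num_classes` times the square of the number of training items, which is adequate for illustration but not tuned for large corpora.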

Parallel Keywords

automatic pronunciation scoring; computer-assisted language learning; computer-assisted pronunciation training; learning to rank

