Supervised Detection and Unsupervised Discovery of Pronunciation Error Patterns for Computer-Assisted Language Learning

Pronunciation error patterns (EPs) are patterns of mispronunciation frequently produced by language learners, and they usually differ across pairs of target and native languages. Accurate information about EPs can give learners helpful feedback for improving their language skills. However, the major difficulty of EP detection is that EPs are intrinsically similar to their corresponding canonical pronunciations, and EPs corresponding to the same canonical pronunciation are also intrinsically similar to each other. As a result, distinguishing EPs from their canonical pronunciation, and distinguishing between different EPs of the same phoneme, is a difficult task, perhaps even more difficult than distinguishing between different phonemes in one language. In addition, the cost of deriving all EPs for each pair of target and native languages is high, usually requiring extensive expert knowledge or high-quality annotated data. Unsupervised EP discovery from a corpus of learner recordings would thus be an attractive addition to the field. In this dissertation, we propose new frameworks for both supervised EP detection and unsupervised EP discovery. For supervised EP detection, we use hierarchical multilayer perceptrons (MLPs) as EP classifiers, integrated with an HMM/GMM baseline in a two-pass Viterbi decoding architecture. Experimental results show that the new framework improves EP diagnosis. For unsupervised EP discovery, we propose the first known framework, which uses the hierarchical agglomerative clustering (HAC) algorithm to explore sub-segmental variation within phoneme segments and produce fixed-length segment-level feature vectors in order to distinguish different EPs. We tested K-means (assuming a known number of EPs) and the Gaussian mixture model with the minimum description length principle (estimating an unknown number of EPs) for EP discovery. Preliminary experiments offered very encouraging results, although there is still a long way to go to approach the performance of human experts. We also propose to use the universal phoneme posteriorgram (UPP), derived from an MLP trained on corpora of mixed languages, as frame-level features in both supervised detection and unsupervised discovery of EPs. Experimental results show that using UPP not only achieves the best performance but is also useful in analyzing the mispronunciations produced by language learners.
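As a rough illustration of the HAC step described above, the following Python sketch turns a variable-length run of frame-level feature vectors (for example, UPP frames within one phoneme segment) into a fixed-length segment-level vector. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the restriction to merging temporally adjacent groups, the centroid distance, the default of three sub-segments, and the function name `segment_vector` are all hypothetical choices made for illustration.

```python
from typing import List

Vector = List[float]


def mean(frames: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def segment_vector(frames: List[Vector], n_parts: int = 3) -> Vector:
    """Merge adjacent frame groups until n_parts remain, then
    concatenate the group means into one fixed-length vector.

    Assumption: only temporally adjacent groups may merge, and the
    merge cost is the distance between group means (centroid linkage).
    """
    if len(frames) < n_parts:
        raise ValueError("need at least n_parts frames")
    groups = [[f] for f in frames]  # start with one group per frame
    while len(groups) > n_parts:
        means = [mean(g) for g in groups]
        # Pick the adjacent pair whose means are closest.
        i = min(range(len(groups) - 1),
                key=lambda k: sq_dist(means[k], means[k + 1]))
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
    # Output length is n_parts * frame dimension, whatever the input length.
    return [x for g in groups for x in mean(g)]
```

Because the output length depends only on `n_parts` and the frame dimension, segments of different durations map to vectors of the same size, which is what a clustering method such as K-means or a Gaussian mixture model needs as input.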

Keywords

Computer-Assisted Language Learning; Computer-Aided Pronunciation Training; Error Pattern Detection; Error Pattern Discovery; Universal Phoneme Posteriorgram


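Cited by

蘇嘉雄 (2014). Tone error pattern detection for computer-assisted Mandarin Chinese learning [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2014.02460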
