
Graduate student: 賴子婷 (Lai, Tzu-Ting)
Thesis title: 英文初學者發音自動評分之研究 (The Research of Automatic Pronunciation Evaluation for Beginners)
Advisor: 李忠謀 (Lee, Chung-Mou)
Degree: Master
Department: 資訊工程學系 (Department of Computer Science and Information Engineering)
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: Chinese
Number of pages: 44
Keywords (Chinese): 語音辨識, 語言學習, 字串相似度, 發音評估
Keywords (English): Speech Recognition, Language learning, String matching, Pronunciation evaluation
DOI URL: https://doi.org/10.6345/NTNU202203532
Thesis type: Academic thesis
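  • Computer Assisted Pronunciation Training (CAPT) is a widely used approach to language learning: it gives beginners feedback on their English pronunciation so that they can practice repeatedly. This study uses speech recognition and string-similarity matching to build a recognition model suited to beginners' English pronunciation, in order to support their pronunciation practice.
    The study has two parts. The first builds the speech recognition model: an initial model is built from the JTES corpus recorded for this study, and speech from the better-performing beginners in JTJS is then selected to adapt it, yielding the overall speech recognition model. The second part is assessment by string matching: similarity is computed with the Levenshtein Distance-Like measure proposed in this study, and a cubic polynomial fit is used to find the thresholds for four levels (good, fair, needs improvement, re-record).
    The experimental results show that, with four levels, the accuracy of system scoring against human scoring is 75%, which indicates that the system is reasonably accurate. The Pearson coefficient between human and system scores is 0.71, showing that the two are correlated. The system's feedback is therefore reasonably credible for beginners, who can use it to improve their speaking skills.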

    "Computer Assisted Pronunciation Training" programs are primarily designed to assist students in language learning. Such a program provides feedback tailored to each learner and helps beginners practice proper pronunciation repeatedly. This research uses speech recognition and string matching to build a speech recognition model with which beginners can practice pronunciation.
    The research consists of two main parts. The first part builds the speech recognition model: an initial model is trained on the JTES corpus recorded for this study, and the best speech in the JTJS corpus is then selected for model adaptation. The second part evaluates speech with a string-matching method. We propose a Levenshtein Distance-Like approach for computing similarity and use a cubic polynomial fit to find the thresholds that separate the scores into four levels (excellent, average, inferior, and re-record).
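    The string-matching step can be pictured with a small sketch. This record does not define the thesis's Levenshtein Distance-Like measure, so the code below uses the standard Levenshtein distance as a stand-in. The normalization by the longer sequence, the function names, and the three cut-off values are all assumptions made for illustration; in the thesis the thresholds come from a cubic polynomial fit, which is not reproduced here.

```python
def levenshtein(a, b):
    """Standard edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(reference, recognized):
    """Similarity in [0, 1]; normalizing by the longer input is an assumption."""
    longest = max(len(reference), len(recognized))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(reference, recognized) / longest


# Hypothetical cut-offs, NOT the thresholds found in the thesis.
THRESHOLDS = [(0.85, "good"), (0.65, "fair"), (0.40, "needs improvement")]


def level(score):
    """Map a similarity score to one of the four levels."""
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "re-record"


if __name__ == "__main__":
    target = "how are you today".split()
    heard = "how are you".split()  # pretend recognizer output
    score = similarity(target, heard)
    print(score, level(score))
```

    Both functions accept any sequence, so the same code can compare word lists, as in the usage lines, or character strings.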
    The experimental results show that, when scores are divided into four levels, the system's ratings match the human ratings with an accuracy of about 75%. The Pearson correlation between human and system scores is 0.71, which means that the two are correlated. The system's feedback is therefore credible enough for beginners to rely on when practicing and improving their speaking skills.
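    For readers unfamiliar with the two figures reported above, the following sketch shows how an agreement rate and a Pearson correlation between human and system ratings could be computed. It assumes that "accuracy" means the fraction of recordings for which the system's level equals the human rater's level, and that levels are encoded as integers (3 = best, 0 = re-record); both are assumptions, and the ratings in the usage lines are made-up toy values, not data from the thesis.

```python
import math


def agreement_rate(human, system):
    """Fraction of items on which the two raters assign the same level."""
    if len(human) != len(system) or not human:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(1 for h, s in zip(human, system) if h == s)
    return matches / len(human)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)


if __name__ == "__main__":
    # Toy ratings for five recordings (illustrative only).
    human = [3, 2, 1, 0, 2]
    system = [3, 2, 2, 0, 1]
    print(agreement_rate(human, system))
    print(pearson(human, system))
```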

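    Table of Contents
    Abstract (Chinese)
    ABSTRACT
    Acknowledgements
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Research Motivation
    1.2 Research Objectives
    1.3 Research Scope and Limitations
    1.4 Thesis Organization
    Chapter 2 Literature Review
    2.1 Computer Assisted Pronunciation Training (CAPT)
    2.2 Speech Recognition
    2.2.1 Speech Feature Extraction
    2.2.2 Acoustic Model
    2.2.3 Language Model
    2.3 Speech Scoring
    2.3.1 Scoring Against Standard Speech
    2.3.2 Scoring with Speech Models
    2.4 Speech Recognition Tools
    2.4.1 Google Voice Search
    2.4.2 賽維 (Cyberon)
    2.4.3 The Microsoft Speech API (SAPI)
    2.4.4 Hidden Markov Model Toolkit (HTK)
    Chapter 3 Methodology
    3.1 System Architecture
    3.2 Speech Recognition Model Training
    3.2.1 Speech Feature Extraction
    3.2.2 Building the Speech Recognition Model
    3.3 System Scoring
    3.3.1 Longest Common Subsequence (LCS)
    3.3.2 Hamming Distance
    3.3.3 Levenshtein Distance (LD)
    3.3.4 LD-Like (Levenshtein Distance-Like)
    3.3.5 Comparison of String Matching Methods
    Chapter 4 Experimental Results
    4.1 Experimental Corpora
    4.1.1 JTES (Junior Textbook by English Student)
    4.1.2 JTJS (Junior Textbook by Junior Student)
    4.2 Experimental Design
    4.2.1 Corpus Preprocessing
    4.2.2 Corpus Sizes Used in the Experiments
    4.3 Baseline Speech Model Experiment
    4.4 Validation Experiment on Junior High School Students' Speech Assessment
    4.4.1 Converting Similarity Scores to Levels
    4.4.2 Analysis of Assessment Weighting Ratios
    4.4.3 System-Human Correlation Coefficient Experiment
    Chapter 5 Conclusion and Future Work
    Appendix A iOS Experiment
    References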

