  • Degree thesis

Research and Implementation of Chinese-English Mixed Speech Recognition (中英混合語音辨識的研究與實作)

Advisor: 張智星 (Jyh-Shing Roger Jang)

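Abstract

The main goal of this thesis is to train a Chinese-English mixed speech recognizer that can handle the code-mixed Chinese-English dialogue common in daily life; the application scenario studied is everyday code-mixed conversation. The thesis uses both the traditional GMM-HMM approach and the hybrid deep neural network approach (DNN-HMM) to train acoustic models. Language models are trained with SRILM on several processed text sources (for example PTT, MATBN, and WSJ). The test corpora consist of test data segmented from EAT and Chinese-English mixed sentences recorded by the MIR Lab (MIRLAB) at National Taiwan University. The thesis first tries pairing TCC300 with WSJ and MATBN with WSJ, then examines the results under different phonetic transcription schemes, and then adds the English across Taiwan (EAT) corpus together with some text post-processing. The final word error rate is 30.27%, an improvement of 32.16% over the word error rate obtained without EAT.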

Parallel Abstract


The main purpose of this thesis is to address the Chinese-English mixed speech recognition problem that is common in daily conversation by constructing a recognition engine that can handle such code-mixed conversation. We use both the traditional GMM-HMM model and the hybrid deep neural network model (DNN-HMM) as acoustic models for both Chinese and English. We also use various text sources, including PTT, MATBN, and WSJ, to train the language model with SRILM. The test corpora in the experiments include the MIR Chinese/English mixed test dataset and EAT test data. First, we tried the mix of TCC300 and WSJ, and the mix of MATBN and WSJ, for constructing acoustic models and compared their performance. Second, we used two different phonetic alphabets and compared their recognition results. Finally, we found that the best performance is achieved by using the mix of MATBN, WSJ, and the English across Taiwan (EAT) corpus, together with text post-processing, which gives a 30.27% word error rate, an error reduction of about 32.16% compared with the result without EAT.
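
As an illustration of the metric quoted above, word error rate is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference tokens. The sketch below is a minimal Python version and is not the scoring code used in the thesis; the function name `word_error_rate`, the whitespace tokenization (one token per Chinese character or English word), and the example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between token sequences divided by the reference length.

    Assumption: both strings are already tokenized with spaces. How mixed
    Chinese-English text should be tokenized is a choice this sketch does
    not settle.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] is the edit distance between the reference tokens processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical code-mixed sentence pair, not taken from the thesis data.
    print(word_error_rate("我 想 要 check email", "我 想 check the email"))
```

In this hypothetical pair the hypothesis differs from the five-token reference by two edits, so the function returns 2/5. Real evaluations would also need a fixed convention for text normalization before scoring.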

