Research and Implementation of Chinese-English Code-Switching Speech Recognition

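The main goal of this thesis is to train a Chinese-English mixed speech recognizer that handles the Chinese-English code-switching common in everyday conversation, which is the application scenario studied here. Acoustic models are trained with both the traditional GMM-HMM approach and the hybrid deep neural network approach (DNN-HMM). Language models are trained with SRILM on several text sources (for example, PTT, MATBN, and WSJ). The test data consist of test utterances segmented from EAT and Chinese-English mixed sentences recorded by the MIR Lab (MIRLAB) at National Taiwan University. The thesis first tries pairing TCC300 with WSJ and pairing MATBN with WSJ, then compares the results obtained with different phonetic transcription schemes, and finally adds the English across Taiwan (EAT) corpus together with some text post-processing. The resulting word error rate is 30.27%, an improvement of 32.16% over the word error rate obtained without EAT.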

Keywords

multilingual code-switching recognition; time-delay neural networks; large-vocabulary speech recognition

Parallel Abstract

The main purpose of this thesis is to address the Chinese-English mixed recognition problem that is common in daily conversation by constructing a recognition engine that can handle such code-switched speech. We use both the traditional GMM-HMM model and the hybrid deep neural network model (DNN-HMM) as acoustic models for both Chinese and English. We also use various text sources, including PTT, MATBN, and WSJ, to train the language model with SRILM. The test corpora in the experiments include the MIR Chinese/English mixed test dataset and the EAT test data. First, we tried the mix of TCC300 and WSJ and the mix of MATBN and WSJ for constructing acoustic models, and compared their performance. Second, we used two different phonetic alphabets and compared their recognition results. Finally, we found that the best performance is achieved by using the mix of MATBN, WSJ, and the English across Taiwan (EAT) corpus, together with text post-processing, giving a word error rate of 30.27%, an error reduction of about 32.16% compared with the result without EAT.
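The results above are reported as word error rate (WER). As a reading aid only, the following is a minimal sketch of the standard WER definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not the scoring tool used in the thesis. The function names `word_error_rate` and `relative_reduction` are hypothetical, and the sketch assumes both transcripts are already whitespace-tokenized, which for mixed Chinese-English text means the Chinese part has been segmented beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both strings are already split into tokens by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction: (baseline - improved) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline
```

For instance, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. `relative_reduction` illustrates one common way to express an improvement between two error rates; the abstract does not state whether its 32.16% figure is relative or absolute, so the helper should not be read as the thesis's own calculation.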

Parallel Keywords

code-switching recognition; time-delay neural networks; LVCSR
