
以最小錯誤鑑別式為基礎之二維倒頻譜語音辨識研究

Research of MCE-Based Two-Dimension Cepstrum Speech Recognition

Advisor: 吳俊德

Abstract


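This thesis compares models trained with Minimum Classification Error (MCE) against models trained by other methods, and applies different robustness techniques to raise the recognition rate of a speech recognition system. In this study, we extract features directly from the speech corpus using the modified Two-Dimensional Cepstrum (TDC) and a Genetic Algorithm (GA), and use them as the feature parameters for speech analysis. In a speech system, a mismatch between the training conditions and the noisy environment in which the system is applied causes a severe drop in recognition rate. This thesis therefore uses Minimum Classification Error (MCE) to make the feature parameters of the corpus more robust, and then builds speech models with Gaussian Mixture Models (GMM) and other methods. We then use the system to recognize speech: 10 speakers (5 male, 5 female) provided 11,000 speech files in total, with each speaker reading the Chinese digits (0-9) 10 times. For each speaker, 1,040 files were used as reference files and the rest as test files. Tests were run under rapidly varying background noise, recognition rates were obtained for the different robustness and modeling configurations, and the results are compared and discussed.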
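As a rough illustration of the feature mentioned in the abstract, the sketch below computes a plain two-dimensional cepstrum: the log magnitude spectrum of each frame is stacked into a time-frequency matrix, a 2-D inverse DFT is applied, and only the low-order coefficients are kept as a fixed-size feature. The abstract does not describe the thesis's modifications to TDC or its GA step, so none of that is shown here. The function name, frame length, hop size, number of retained coefficients, and the choice to keep only the real part are all assumptions made for this example.

```python
import numpy as np

def two_dimensional_cepstrum(signal, frame_len=256, hop=128, n_quef=8, n_time=4):
    """Return an (n_quef, n_time) block of low-order 2-D cepstral coefficients.

    Baseline sketch only; all parameter values are hypothetical defaults.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.size - frame_len) // hop
    if n_frames < n_time:
        raise ValueError("too few frames for the requested n_time")

    # Slice the signal into overlapping, Hamming-windowed frames.
    window = np.hamming(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )  # shape: (frame_len, n_frames)

    # Log magnitude spectrum of every frame (small offset avoids log(0)).
    log_spec = np.log(np.abs(np.fft.fft(frames, axis=0)) + 1e-10)

    # 2-D inverse DFT over the frequency and time axes.
    # Keeping only the real part is a simplification assumed for this sketch.
    tdc = np.real(np.fft.ifft2(log_spec))

    # Low-order coefficients give a fixed-size feature for any utterance length.
    return tdc[:n_quef, :n_time]
```

Because only a fixed low-order block is retained, utterances of different lengths map to feature matrices of the same shape, which is convenient when the features are passed to a fixed-dimension model such as a GMM.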

Parallel Abstract


This thesis compares models trained with Minimum Classification Error (MCE) against other training approaches, and uses different enhancement methods to improve the performance of a speech recognition system. In the study, we used the Modified Two-Dimensional Cepstrum (MTDC) and a Genetic Algorithm (GA) to convert the speech data into features for speech recognition. A mismatch between the acoustic conditions of training and those of the application environment seriously degrades the performance of a speech recognition system. This thesis therefore employs the MCE-based Two-Dimensional Cepstrum (TDC) to enhance speaker features, and then uses Gaussian Mixture Models (GMM) to build the speech models. Next, we used the system to recognize speech. We collected Chinese digits (0-9) from 10 speakers (5 male and 5 female), each of whom spoke every digit 10 times (total files: 11,000). For each speaker, we selected 1,040 files as training files and used the remainder as testing files. Finally, we compared and discussed the results obtained under several varying background noises in different conditions.
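To make the MCE criterion more concrete, here is a minimal sketch of the smoothed misclassification loss in its commonly used form: the true-class score is compared with a soft maximum over the competing class scores, and the difference is passed through a sigmoid so the 0/1 error becomes differentiable. The abstract does not give the thesis's exact formulation, so the function name, the smoothing parameters `eta` and `gamma`, and the use of per-class scores such as GMM log-likelihoods are assumptions made for illustration.

```python
import numpy as np

def mce_loss(scores, label, eta=1.0, gamma=1.0):
    """Smoothed 0/1 loss for one utterance (illustrative sketch).

    scores: per-class discriminant values g_j(x), e.g. GMM log-likelihoods
            (an assumption of this example).
    label:  index of the correct class.
    """
    scores = np.asarray(scores, dtype=float)
    g_true = scores[label]
    rivals = np.delete(scores, label)

    # Soft maximum over competing classes:
    # (1/eta) * log(mean(exp(eta * g_j))), shifted by the max for stability.
    m = rivals.max()
    g_rival = m + np.log(np.mean(np.exp(eta * (rivals - m)))) / eta

    # Misclassification measure: positive when a rival outscores the true class.
    d = g_rival - g_true

    # Sigmoid turns the measure into a differentiable loss in (0, 1).
    return 1.0 / (1.0 + np.exp(-gamma * d))
```

A loss near 0 means the correct digit model clearly outscores its rivals, a loss near 1 means a competing model wins, and 0.5 marks the decision boundary. Training would lower the average of this loss over the training utterances by gradient descent on the model parameters.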

