A Study of Two-Dimensional Cepstrum Speech Recognition Based on Minimum Classification Error

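This thesis compares model training based on Minimum Classification Error (MCE) with other training approaches, and applies several robustness methods to raise the recognition rate of a speech recognition system. In this study, we extract the modified Two-Dimensional Cepstrum (TDC) directly from the speech data and use it, together with Genetic Algorithms (GA), to obtain the feature parameters for speech analysis. In a speech system, a mismatch between the training conditions and the noisy application environment severely degrades the recognition rate. This thesis therefore uses MCE to make the feature parameters more robust, and then builds the speech models with a Gaussian Mixture Model (GMM) and other methods. We then use the system to recognize speech: 10 speakers (5 male, 5 female) provided 11,000 speech files in total, each speaker reading the Chinese digits (0-9) 10 times; for each speaker, 1,040 files serve as reference files and the rest as test files. Testing under rapidly varying background noise yields the recognition rates for the different robustness and modeling configurations, which are then compared and discussed.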
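As an illustration of the MCE criterion mentioned above, the following is a minimal sketch of the commonly used smoothed MCE loss for a single sample. It assumes each class already provides a discriminant score (for example, a GMM log-likelihood). The function name `mce_loss` and the parameters `eta`, `gamma`, and `theta` are illustrative choices, not taken from the thesis, and the thesis may use a different variant.

```python
import numpy as np

def mce_loss(scores, label, eta=1.0, gamma=1.0, theta=0.0):
    """Smoothed MCE loss for one sample (illustrative sketch).

    scores: 1-D array of class discriminant values g_j(x),
            e.g. per-class log-likelihoods (an assumption here).
    label:  index of the correct class.
    """
    scores = np.asarray(scores, dtype=float)
    rivals = np.delete(scores, label)  # scores of the competing classes
    # Soft maximum over the rivals: (1/eta) * log(mean(exp(eta * g_j))),
    # computed with the largest rival factored out for numerical stability.
    m = rivals.max()
    anti = m + np.log(np.mean(np.exp(eta * (rivals - m)))) / eta
    # Misclassification measure: positive when rivals outscore the true class.
    d = -scores[label] + anti
    # Sigmoid gives a differentiable approximation of the 0/1 error count.
    return 1.0 / (1.0 + np.exp(-gamma * d + theta))
```

A loss near 0 means the correct class wins by a wide margin and a loss near 1 means a rival wins. Averaging this loss over the training set gives an objective that can be minimized by gradient descent.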

Keywords

Minimum Classification Error; speech recognition; Two-Dimensional Cepstrum; Genetic Algorithm; Gaussian Mixture Model

Parallel Abstract

This thesis compares model training based on Minimum Classification Error (MCE) with other training methods, and uses different enhancement methods to improve the performance of a speech recognition system. In this study, we used the Modified Two-Dimensional Cepstrum (MTDC) and a Genetic Algorithm (GA) to convert the speech data into features for speech recognition. A mismatch between the acoustic conditions of the training and application environments seriously degrades the performance of a speech recognition system. This thesis therefore employs an MCE-based Two-Dimensional Cepstrum (TDC) to enhance the features, and then uses Gaussian Mixture Models (GMM) to build the speech models. Next, we used the system to recognize speech. We collected Chinese digits (0-9) from 10 speakers (5 male and 5 female), each of whom read every digit 10 times (11,000 files in total). For each speaker, we selected 1,040 files as training files and used the remainder as testing files. Finally, we compared and discussed the results obtained under several varying background noises in different conditions.
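The GMM modeling step can be sketched as follows. This is a minimal illustration, assuming one diagonal-covariance GMM per digit and a set of feature vectors per utterance. The function names, the `models` dictionary layout, and the diagonal-covariance choice are assumptions made for the example and are not described in the abstract.

```python
import numpy as np

def gmm_log_likelihood(features, weights, means, variances):
    """Total log-likelihood of feature vectors under a diagonal GMM.

    features: (T, D) array; weights: (K,); means, variances: (K, D).
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = features[:, None, :] - means[None, :, :]               # (T, K, D)
    # Log of each Gaussian's normalizing constant, one value per component.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    # Weighted log-density of every vector under every component: (T, K).
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)
    log_comp = log_comp + np.log(weights)
    # Log-sum-exp over components, then sum over the feature vectors.
    m = log_comp.max(axis=1, keepdims=True)
    per_vector = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return float(per_vector.sum())

def recognize(features, models):
    """Return the label whose GMM scores the features highest.

    models: dict mapping label -> (weights, means, variances).
    """
    return max(models, key=lambda k: gmm_log_likelihood(features, *models[k]))
```

In this sketch the per-class log-likelihoods could also serve as the discriminant scores that an MCE-style training objective operates on.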

Parallel Keywords

Minimum Classification Error (MCE); speech recognition; Two-Dimensional Cepstrum (TDC); Genetic Algorithm (GA); Gaussian Mixture Models (GMM)
