Traditional acoustic model training uses the Maximum Likelihood Estimation (MLE) criterion, which does not take the mutual discriminability between models into account during training. To increase the discriminability between models, discriminative training criteria have been proposed. In view of the significant improvements that the Minimum Phone Error (MPE) criterion has shown in many recent experiments, this thesis applies it to a continuous phone recognition system based on the TIMIT corpus. In our experiments, the acoustic models are first trained with the MLE criterion and then re-trained with the MPE criterion. The experimental results show that, compared with MLE training, MPE training does further reduce the phone error rate. MPE uses a phone lattice to represent the set of all possible sentences; this thesis mainly uses the N-Best list approach to construct the phone lattice (the N-Best Synthesized Lattice), so that training can focus most efficiently on the most confusable parts of the lattice. In addition, to highlight the confusable phones and to filter out phones that appear repeatedly at the same time, this thesis also implements another kind of phone lattice, the sausage; with this condensed phone lattice, the effectiveness of MPE training is further improved.
Maximum Likelihood Estimation (MLE) is the traditional criterion for training acoustic models for speech recognition. Because it does not consider the discriminative relations among acoustic models, some models tend to become confusable with one another. To increase the discrimination between models, discriminative training criteria have been proposed. Since the Minimum Phone Error (MPE) criterion has shown substantial improvements in the literature, we apply MPE to a continuous phone recognition system on the TIMIT corpus in this thesis. The procedure is to first train the acoustic models with MLE and then refine them with MPE. According to the experimental results, MPE further reduces the phone error rate. In general, MPE uses a phone lattice to represent all possible sentences. To improve efficiency, we use an N-Best list to construct the phone lattice, which we call an N-Best Synthesized Lattice, so that training concentrates on the most confusable parts of the lattice. In addition, in order to highlight confusable phones and to remove phones that appear repeatedly at nearly the same time, we use another kind of phone lattice, called a sausage, which further improves the results of MPE training.
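For reference, a common formulation of the MPE objective as it is usually given in the literature is shown below; the notation is illustrative rather than taken from this thesis. Here λ denotes the acoustic model parameters, O_r the observation sequence of utterance r, s a candidate phone sequence with prior P(s), κ an acoustic scaling factor, and A(s, s_r) the raw phone accuracy of s measured against the reference transcription s_r. Maximizing this objective places more posterior probability mass, on average, on hypotheses with fewer phone errors.

\[
\mathcal{F}_{\mathrm{MPE}}(\lambda) \;=\; \sum_{r=1}^{R}
\frac{\displaystyle\sum_{s} P_{\lambda}(O_r \mid s)^{\kappa}\, P(s)\, A(s, s_r)}
     {\displaystyle\sum_{s'} P_{\lambda}(O_r \mid s')^{\kappa}\, P(s')}
\]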
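As a rough illustration of the sausage idea described above, the sketch below collapses the phone hypotheses of an N-best list into time-overlapping confusion bins, merging arcs that carry the same label at nearly the same time. This is only a minimal sketch under assumed data structures: the names PhoneArc, overlaps, and build_sausage are hypothetical, and the overlap heuristic is far simpler than a full confusion-network construction.

# Illustrative sketch only; names and the overlap heuristic are hypothetical,
# not taken from the thesis or from any speech recognition toolkit.
from dataclasses import dataclass

@dataclass
class PhoneArc:
    label: str      # phone symbol, e.g. "ae"
    start: int      # start frame
    end: int        # end frame
    score: float    # combined acoustic / phone-sequence score

def overlaps(a: PhoneArc, b: PhoneArc) -> bool:
    """True if two arcs occupy roughly the same time span."""
    return a.start < b.end and b.start < a.end

def build_sausage(nbest: list[list[PhoneArc]]) -> list[list[PhoneArc]]:
    """Group arcs from all N-best hypotheses into confusion bins.

    Arcs that overlap in time go into the same bin; within a bin, arcs with
    the same label are merged (keeping the best score), so phones repeated at
    nearly the same time are filtered out and only the confusable
    alternatives remain.
    """
    bins: list[list[PhoneArc]] = []
    for hypothesis in nbest:
        for arc in hypothesis:
            # Find the first bin this arc overlaps with (simplified heuristic).
            target = next((b for b in bins
                           if any(overlaps(arc, other) for other in b)), None)
            if target is None:
                bins.append([arc])
                continue
            duplicate = next((o for o in target if o.label == arc.label), None)
            if duplicate is None:
                target.append(arc)           # a new confusable alternative
            elif arc.score > duplicate.score:
                target.remove(duplicate)     # keep only the best-scoring copy
                target.append(arc)
    bins.sort(key=lambda b: min(a.start for a in b))
    return bins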