A Study on Applying the Minimum Phone Error Discriminative Training Criterion to Mandarin Speaker Adaptation

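In practical applications of speech recognition, speaker adaptation is commonly used to adjust a speaker-independent acoustic model so that its recognition rate for a specific speaker improves. A common speaker adaptation technique is Maximum Likelihood Linear Regression (MLLR). Its idea is to cluster the similar mixtures of the recognition model and then adjust each cluster, so that the recognition rate can be improved with only a small amount of data. Its drawback is that for models that are close in pronunciation (such as ㄓ and ㄗ), the mixtures are already very similar; if they are also assigned to the same cluster for adjustment, the speaker's pronunciation habits can easily bias the models toward ㄓ or ㄗ. As a result, although the overall recognition rate improves, the error rate on confusable sounds rises. This thesis applies the recently proposed Minimum Phone Error (MPE) discriminative training criterion: starting from the speaker-adapted model, the adaptation data is used for further MPE training. By adjusting the I-smoothing parameter, reducing or changing the weight of maximum likelihood estimation within I-smoothing, changing the phone lattice structure, and changing how phone accuracy is computed, we aim to reduce the error rate on confusable sounds and further improve the overall recognition rate. In addition, this thesis incorporates the concept of the regression tree: the I-smoothing weight of MPE is adjusted on the basis of the regression-tree clusters, with the goal of giving the adapted acoustic model a better recognition rate for the phones within each cluster.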
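The role of the I-smoothing weight can be illustrated with a minimal sketch. The update below follows the extended Baum-Welch mean update with I-smoothing as it is commonly written in the discriminative training literature; it is not taken from this thesis. The function name, the argument names, and the default values `tau=50.0` and `E=2.0` are assumptions chosen for illustration only.

```python
import numpy as np

def mpe_mean_update(mu_old, gamma_num, theta_num, gamma_den, theta_den,
                    gamma_ml, theta_ml, tau=50.0, E=2.0):
    """Sketch of an EBW mean update with I-smoothing for one Gaussian.

    gamma_* are occupation counts and theta_* are first-order statistics
    (sums of occupancy-weighted observations) for the numerator,
    denominator, and maximum likelihood (ML) accumulators.
    tau is the I-smoothing weight; E scales the smoothing constant D.
    """
    mu_old = np.asarray(mu_old, dtype=float)
    theta_num = np.asarray(theta_num, dtype=float)
    theta_den = np.asarray(theta_den, dtype=float)
    theta_ml = np.asarray(theta_ml, dtype=float)

    # Assumed heuristic: smoothing constant proportional to denominator count.
    D = E * gamma_den

    # I-smoothing adds tau "virtual" frames located at the ML mean
    # to the numerator statistics.
    ml_mean = theta_ml / gamma_ml
    numerator = theta_num - theta_den + D * mu_old + tau * ml_mean
    denominator = gamma_num - gamma_den + D + tau
    return numerator / denominator
```

A larger `tau` pulls the updated mean toward the ML estimate, while a smaller `tau` lets the discriminative statistics dominate. Under this sketch, a cluster-dependent weight of the kind the abstract describes would amount to passing a different `tau` for the Gaussians of each regression-tree cluster.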

Keywords

Minimum Phone Error; speaker adaptation; Maximum Likelihood Linear Regression

Parallel Abstract

To decrease the error rate of speech recognition, speaker adaptation techniques are often used to adjust speaker-independent acoustic models. MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum a Posteriori) are two of the most popular techniques in recent years. MLLR uses regression trees and calculates a transform matrix for each leaf node of the tree. This makes it possible to decrease the error rate of HMM-based speech recognition with fewer sentences. However, when we examined the recognition results, we found that although the overall error rate decreased, the error rate of certain confusable phones was higher. To solve this problem, we propose the use of MPE (Minimum Phone Error) discriminative training. We use the same corpus as in MLLR adaptation, and use MPE to further adjust the acoustic models that have been adapted by MLLR. We also tested several methods, such as adjusting the I-smoothing factors or the phone lattices, to obtain better results. In addition, we introduce a new approach that reduces the computation time of both the lattice construction and the MPE weight calculation, based on a better use of n-best recognition (Section 3.3.3). Furthermore, we propose a new method that combines the statistics of the regression trees with the I-smoothing factor, based on the observations in Section 2.1.3. Experimental results show that it can further reduce the error rate.
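As a rough illustration of the per-leaf transform described above, the sketch below applies an MLLR-style mean transform, in which each Gaussian mean is mapped through the matrix of the regression class it belongs to. It only shows how a given transform is applied, not how it is estimated, and it is not the thesis implementation; the function name and the data layout (`means`, `class_of`, `transforms`) are assumptions made for this example.

```python
import numpy as np

def apply_mllr_means(means, class_of, transforms):
    """Apply one affine mean transform per regression class.

    means:      array of shape (G, d), one row per Gaussian mean
    class_of:   sequence of length G giving each Gaussian's class id
    transforms: dict mapping class id to a matrix W of shape (d, d + 1),
                where W = [b | A] acts on the extended mean [1, mu]
    """
    means = np.asarray(means, dtype=float)
    adapted = np.empty_like(means)
    for g, mu in enumerate(means):
        W = np.asarray(transforms[class_of[g]], dtype=float)
        xi = np.concatenate(([1.0], mu))  # extended mean vector
        adapted[g] = W @ xi               # equals A @ mu + b
    return adapted
```

Because every Gaussian in a class shares one matrix, two acoustically close phones that fall into the same class are moved by the same transform, which is the situation the abstract associates with increased confusion between them.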

Parallel Keywords

MPE; speaker adaptation; MLLR; MAP; regression tree

