A Study of Hidden Conditional Random Field Acoustic Models for Speech Recognition

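This thesis examines the use of Hidden Conditional Random Fields (HCRF) as acoustic models for speech recognition, compares them analytically with the conventional Hidden Markov Model (HMM), and proposes a novel HCRF training method that incorporates a discriminative criterion. In continuous syllable recognition experiments on the TEST500 speech database, HCRF achieved a higher recognition rate and a much shorter recognition response time than HMM, which makes it better suited to real-time recognition. For HCRF training, we compared the discriminative criterion with the conventional maximum likelihood criterion and found that HCRF models trained with the discriminative criterion are more discriminative. We train an HMM to convergence with the discriminative criterion, convert its parameters into initial HCRF parameters, and then continue training the HCRF with the discriminative criterion. This yields our best HCRF acoustic model, which achieves a 10.7% relative improvement in syllable accuracy over an HMM trained with the maximum likelihood criterion. The thesis also examines fixed-point feature parameters and acoustic models; under these conditions, HCRF outperforms HMM in both response time and syllable accuracy. In a personal-name recognition experiment combined with beam search, HCRF also gave good results.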
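The abstract does not spell out how HMM parameters are converted into initial HCRF parameters. The sketch below shows one commonly used mapping, under the assumption that each HMM state has a single diagonal-covariance Gaussian and that the HCRF uses occupancy, first-moment, and second-moment features per state; the thesis may use a different feature set. All function names here are hypothetical and chosen for illustration only.

```python
import math


def gaussian_state_to_hcrf(mean, var):
    """Map one diagonal-Gaussian HMM state to HCRF weights.

    Assumed feature set (not confirmed by the thesis):
      occ : weight of the state-occupancy feature
      m1  : weights of the first-moment features (o_d)
      m2  : weights of the second-moment features (o_d ** 2)
    """
    occ = sum(
        -0.5 * math.log(2.0 * math.pi * v) - m * m / (2.0 * v)
        for m, v in zip(mean, var)
    )
    m1 = [m / v for m, v in zip(mean, var)]
    m2 = [-1.0 / (2.0 * v) for v in var]
    return occ, m1, m2


def transition_to_hcrf(prob):
    """An HMM transition probability becomes a log-domain transition weight."""
    return math.log(prob)


def hcrf_state_score(obs, occ, m1, m2):
    """Linear HCRF score of one observation vector for one state."""
    return occ + sum(a * o + b * o * o for a, b, o in zip(m1, m2, obs))


def gaussian_log_density(obs, mean, var):
    """Log density of a diagonal Gaussian, used only to check the mapping."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - (o - m) ** 2 / (2.0 * v)
        for o, m, v in zip(obs, mean, var)
    )
```

With this initialization, the HCRF state score equals the HMM state log-likelihood for every observation, so the HCRF starts from the same decisions as the converged HMM and further discriminative training only has to refine the weights.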

Keywords

Speech recognition; hidden conditional random field; continuous syllable recognition

Parallel Abstract

In this thesis, we adopt a Hidden Conditional Random Field (HCRF)-based approach to acoustic modeling for speech recognition and compare its performance with that of a traditional Hidden Markov Model (HMM) of the same structure. We propose a novel HCRF training algorithm that incorporates a discriminative training criterion. In continuous Mandarin syllable recognition on the TEST500 database, the HCRF-based approach outperforms the HMM in both accuracy and response time. Based on a series of related experiments, we conclude that HCRF is more suitable for real-time speech recognition systems. Next, we compare two methods for training HCRF: one based on the maximum likelihood criterion and the other based on a discriminative criterion. The results indicate that the discriminative approach outperforms maximum likelihood training. Finally, we investigate the HCRF-based system under fixed-point and limited beam-size conditions. The experimental results again show the advantages of the HCRF-based approach.

Parallel Keywords

HCRF; ASR


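Cited By

曾家宏 (2012). A study of thousand-speaker recognition based on hidden conditional random field models [Master's thesis, Yuan Ze University]. Airiti Library. https://doi.org/10.6838/YZU.2012.00244

許順翔 (2009). A voice command system for resource-constrained devices based on hidden conditional random field acoustic models [Master's thesis, Yuan Ze University]. Airiti Library. https://doi.org/10.6838/YZU.2009.00304

李秋芬 (2009). A robust training algorithm for hidden conditional random field acoustic models [Master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-2807200913023300

邢凱婷 (2009). A speaker identification algorithm based on hidden conditional random field speaker models [Master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-2807200914245700

劉維宸 (2011). A speaker identification algorithm based on hidden conditional random field model adaptation [Master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-2801201414583635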
