A Study of Dynamic-Programming-Based Machine Learning Methods for Small-Vocabulary DTW Speech Recognition Systems

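This thesis proposes dynamic-programming-based machine learning methods for small-vocabulary DTW speech recognition systems. The core of early speech recognition technology was dynamic time warping (DTW), which derives from the principle of dynamic programming (DP). A conventional DTW recognizer, however, relies on template matching, so the number of reference templates affects both the matching speed and the recognition performance of the whole system. In addition, in small-vocabulary isolated-word recognition, accuracy also suffers when an isolated word contains too many characters. To address these weaknesses, this thesis first proposes three machine learning methods for DTW speech recognition: two supervised methods, incremental learning and priority rejection learning, and one unsupervised method, most matching learning. Experiments confirm that these learning methods effectively improve the recognition accuracy of conventional DTW.

The template matching used by conventional DTW is a non-modeling approach, and it still has many weaknesses, both in recognition matching and in system learning, that need to be overcome. To resolve this and strengthen conventional DTW recognition, this thesis extends the work above with a hidden Markov model (HMM)-like approach for small-vocabulary DTW speech recognition. An HMM is a probabilistic model built on state transitions and statistical theory. Drawing on this modeling concept, the thesis designs a simplified version of the HMM and embeds it in the conventional DTW recognizer. The simplified HMM developed within DTW is called SHMM (Simplified HMM). Under the SHMM framework, the Viterbi algorithm customarily used for HMM recognition is combined with the dynamic programming of DTW to form an improved Viterbi algorithm, which effectively improves template recognition performance. When making a recognition decision, the system first obtains the result of the improved Viterbi algorithm and the result of DTW dynamic programming matching, and then uses a fuzzy controller with fuzzy logic inference to fuse the two values into a precise final recognition output. This method is called FuzzySHMMDTW. Experimental results show that, for small-vocabulary isolated-word recognition, FuzzySHMMDTW achieves higher recognition accuracy than conventional DTW speech recognition.

With the SHMM modeling method designed for small-vocabulary DTW speech recognition, the recognition system contains models with a notion of state statistics, so DTW matching is no longer pure template matching. To let this model-based DTW recognizer learn further from the pronunciation characteristics of different speakers, and thereby keep the system's average recognition rate at a certain level, this thesis designs two model learning methods: one that takes the quantity of the learning corpus as its main consideration, and one that emphasizes the quality of the learning corpus. Experimental results show that, for speakers whose recognition results are poor, model learning effectively raises recognition accuracy to an acceptable level.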
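As an illustration of the template-matching baseline described in the abstract, the following is a minimal sketch of conventional DTW recognition, not the thesis's implementation. It assumes one-dimensional feature sequences and an absolute-difference local cost; the names `dtw_distance`, `recognize`, and `templates` are hypothetical. A real recognizer would compare frames of spectral feature vectors instead.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences (assumed local cost: |x - y|)."""
    n, m = len(a), len(b)
    # D[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Dynamic-programming recurrence over the three allowed predecessor cells
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(utterance, templates):
    """Return (word, distance) for the reference template closest to the utterance.

    templates: dict mapping each word to a list of reference sequences.
    """
    best_word, best_dist = None, math.inf
    for word, refs in templates.items():
        for ref in refs:  # every stored template is compared once
            d = dtw_distance(utterance, ref)
            if d < best_dist:
                best_word, best_dist = word, d
    return best_word, best_dist
```

Because `recognize` compares the input against every stored reference, its running time grows linearly with the number of templates, which is the speed concern the abstract raises about template matching.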

Keywords

Speech recognition; dynamic time warping; machine learning; improved Viterbi; SHMM

Parallel Abstract

This thesis presents a new machine learning framework based on dynamic programming for small-vocabulary DTW speech recognition. Two categories of learning strategies for DTW are developed first: supervised learning and unsupervised learning. The supervised category comprises incremental learning and priority rejection learning. For unsupervised learning, an approach called most matching learning is developed. Experiments confirm that all three methods improve the recognition performance of DTW. In addition, we present a hidden Markov model (HMM)-like approach for DTW speech recognition, called simplified HMM (SHMM). SHMM is a simplified HMM modeling technique for conventional DTW. Under the SHMM framework, an improved Viterbi algorithm, called iViterbi, is proposed. iViterbi combines the dynamic programming of DTW with the optimal-path calculation of the conventional Viterbi algorithm for pattern recognition. Finally, we design a fuzzy controller that the recognition system uses when deciding on a recognition result. The fuzzy scheme performs model fusion, combining the outcomes of the DTW and iViterbi recognition calculations. The overall recognition system with fuzzy control is therefore called FuzzySHMMDTW. Experimental results on small-vocabulary speech recognition show that the recognition rate of the proposed FuzzySHMMDTW is higher than that of traditional DTW. To keep the recognition performance of FuzzySHMMDTW at an acceptable level even when the system encounters an unfamiliar speaker, we propose two model-based machine learning methods for FuzzySHMMDTW. Experimental results demonstrate the effectiveness of both learning methods: through learning, the recognition performance of the system continues to improve.
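For readers unfamiliar with the conventional Viterbi algorithm that iViterbi builds on, the sketch below shows the standard log-domain version for a discrete HMM. It is not the thesis's iViterbi, whose details the abstract does not give; the function name `viterbi` and the dictionary-based parameter layout are assumptions made for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Standard Viterbi decoding in the log domain.

    obs:       list of observation symbols
    states:    list of state names
    log_start: log_start[s]    = log P(first state is s)
    log_trans: log_trans[p][s] = log P(next state is s | current state is p)
    log_emit:  log_emit[s][o]  = log P(observing o | state s)
    Returns (best state path, log probability of that path).
    """
    # V[t][s] = best log probability of any state path ending in s at time t
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Choose the predecessor state that maximizes the path score
            prev, score = max(
                ((p, V[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            V[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Backtrack from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, V[-1][last]
```

Like DTW, this is a dynamic-programming recurrence over a time-by-state grid, which suggests why the two techniques can be combined; how iViterbi actually merges them is specified in the thesis itself, not here.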

Parallel Keywords

speech recognition; dynamic time warping; machine learning; iViterbi; SHMM


