
KINECT Microphone Array-Based Speaker Localization for Speech Pattern Recognition (結合KINECT麥克風陣列之語者定位的語音模樣辨識研究)

Advisor: 丁英智

Abstract


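This dissertation combines the KINECT microphone array with speaker localization to develop a speech pattern recognition system. It uses the microphone array of the KINECT sensor developed by Microsoft to capture speech signals and, building on speaker localization, develops speech pattern recognition that takes the speaker's position into account.

Speaker localization with the KINECT microphone array is carried out in two ways. The first uses the Software Development Kit (SDK) released by Microsoft; this method is restricted to a limited range of detectable azimuth angles and cannot compute distance. We therefore propose a second method based on the Time Difference of Arrival (TDOA) of the signal, which covers a wider range of azimuth angles and also provides distance information.

The speech pattern recognition research built on KINECT microphone array speaker localization has two parts: the first is speaker verification and the second is speech recognition. To improve recognition performance, a Type-1 Fuzzy Model and a Type-2 Fuzzy Model are added to further improve both recognition systems.

For speaker verification, this research uses the Support Vector Machine (SVM). The methods are developed with a decision fusion approach, and three speaker verification methods are proposed. Method 1 introduces a Type-1 fuzzy system into KINECT microphone array SDK speaker verification; the angle computed by the KINECT SDK is the input parameter that drives the fuzzy system. Method 2 introduces a Type-1 fuzzy system into KINECT microphone array TDOA speaker verification; the angle computed by KINECT TDOA and the set distance are the input parameters. Method 3 introduces a Type-2 fuzzy system into KINECT microphone array TDOA speaker verification; the angle computed by KINECT TDOA is the input parameter.

For speech recognition, this research uses Dynamic Time Warping (DTW). The methods are again developed with a decision fusion approach, and three speech recognition methods are proposed. The first introduces a Type-1 fuzzy system into KINECT microphone array SDK speech recognition, with the angle computed by the KINECT SDK as the fuzzy system input. The second introduces a Type-1 fuzzy system into KINECT microphone array TDOA speech recognition, with the angle computed by KINECT TDOA and the set distance as the fuzzy system inputs. The third introduces a Type-2 fuzzy system into KINECT microphone array TDOA speech recognition, with the angle computed by KINECT TDOA as the fuzzy system input.

The dissertation further uses TDOA speaker localization to adjust the SVM free parameters C and γ simultaneously in SVM speaker verification. We propose feeding the TDOA angle and distance information into a Type-1 fuzzy system that decides both C and γ at the same time.

As a baseline, a conventional single microphone is compared with the fusion approach provided by the KINECT SDK. In SVM speaker verification, the recognition rate is 52.5% with a single microphone and 65% with KINECT SDK fusion. In DTW speech recognition, the recognition rate is 57.6% with a single microphone and 79.2% with KINECT SDK fusion.

Among the three proposed methods that combine speaker localization with SVM speaker verification, Method 1 (Type-1 fuzzy, KINECT microphone array SDK fusion) achieves a recognition rate of 88.99%, Method 2 (Type-1 fuzzy, KINECT microphone array TDOA fusion) raises it to 90.99%, and Method 3 (Type-2 fuzzy, KINECT microphone array TDOA fusion) raises it to 93.99%. Among the three methods that combine speaker localization with DTW speech recognition, Method 1 (Type-1 fuzzy, SDK fusion) achieves 84.62%, Method 2 (Type-1 fuzzy, TDOA fusion) raises it to 91.4%, and Method 3 (Type-2 fuzzy, TDOA fusion) raises it to 92.52%. Using TDOA speaker localization to adjust the SVM free parameters C and γ simultaneously yields an average recognition rate of 83.16%, which is higher than the average recognition rate of the conventional list method. The experimental results confirm that the three proposed methods outperform conventional recognition with a single microphone.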
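The abstract does not spell out the TDOA computation, so the following is only a minimal sketch of the general idea, under assumptions that are not taken from the dissertation: a single pair of microphones, a far-field source, plain cross-correlation for the delay estimate, and a hypothetical microphone spacing. It recovers only an angle; obtaining distance as well, as the dissertation describes, would need more than this two-microphone far-field model.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the source


def estimate_tdoa(sig_a, sig_b, sample_rate):
    """Estimate how many seconds sig_b lags behind sig_a using cross-correlation."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Index 0 of the "full" output corresponds to a lag of -(len(sig_a) - 1) samples.
    lag_samples = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag_samples / sample_rate


def tdoa_to_angle(tdoa, mic_spacing):
    """Convert a delay to an arrival angle in degrees (far-field assumption).

    mic_spacing is the distance between the two microphones in metres
    (a hypothetical parameter chosen for illustration).
    """
    ratio = np.clip(SPEED_OF_SOUND * tdoa / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal(1024)
    # Simulate the second microphone hearing the same signal 3 samples later.
    delayed = np.concatenate([np.zeros(3), reference[:-3]])
    tdoa = estimate_tdoa(reference, delayed, sample_rate=16000)
    print(tdoa, tdoa_to_angle(tdoa, mic_spacing=0.2))
```

In practice a more robust delay estimator (for example, a weighted generalized cross-correlation) is often preferred in reverberant rooms; plain cross-correlation is used here only to keep the sketch short.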

Parallel Abstract


This dissertation develops a speech recognition system that incorporates speaker localization, using the KINECT microphone array device developed by Microsoft. Using the audio signals captured by the microphone array and the estimated speaker location, the speech recognition system is developed to take the speaker's position into account.

The dissertation uses the KINECT microphone array for speaker localization in two ways. The first uses Microsoft's publicly released Software Development Kit (SDK); this method limits the range of detectable angles and cannot calculate distance. We therefore propose a second method based on Time Difference of Arrival (TDOA), which covers a wider range of detection angles than the KINECT SDK and can calculate distance.

The dissertation studies KINECT microphone array-based speaker localization for speech pattern recognition in two parts: the first is speaker verification and the second is speech recognition. To improve the recognition rate, a Type-1 Fuzzy Model and a Type-2 Fuzzy Model are added to further improve the performance of the recognition systems.

For speaker verification, this research uses the Support Vector Machine (SVM) with a decision fusion approach and proposes three methods. Method one introduces a Type-1 fuzzy system into KINECT SDK microphone array speaker verification; the fuzzy system input parameter is the angle calculated by the KINECT SDK. Method two introduces a Type-1 fuzzy system into KINECT TDOA microphone array speaker verification; the fuzzy system input parameters are the angle calculated by KINECT TDOA and the distance. Method three introduces a Type-2 fuzzy system into KINECT TDOA microphone array speaker verification; the fuzzy system input parameter is the angle calculated by KINECT TDOA.

For speech recognition, this research uses Dynamic Time Warping (DTW) with a decision fusion approach and proposes three methods. Method one introduces a Type-1 fuzzy system into KINECT SDK microphone array speech recognition; the fuzzy system input parameter is the angle calculated by the KINECT SDK. Method two introduces a Type-1 fuzzy system into KINECT TDOA microphone array speech recognition; the fuzzy system input parameters are the angle calculated by KINECT TDOA and the distance. Method three introduces a Type-2 fuzzy system into KINECT TDOA microphone array speech recognition; the fuzzy system input parameter is the angle calculated by KINECT TDOA.

The dissertation further uses TDOA speaker localization to adjust the SVM parameters C and γ simultaneously in SVM speaker verification: the angle and distance calculated by KINECT TDOA are the inputs to a fuzzy system that decides both C and γ.

The SVM speaker verification rate rose from 52.5% with a single microphone to 62.5% with the KINECT SDK fusion method. The DTW speech recognition rate rose from 57.6% with a single microphone to 79.2% with the KINECT SDK fusion method.

Of the three proposed speaker verification methods, method one (Type-1 fuzzy model with KINECT SDK fusion) achieved an SVM speaker verification rate of 88.99%, method two (Type-1 fuzzy model with KINECT TDOA fusion) achieved 90.99%, and method three (Type-2 fuzzy model with KINECT TDOA fusion) achieved 93.99%. Of the three proposed speech recognition methods, method one (Type-1 fuzzy model with KINECT SDK fusion) achieved a DTW speech recognition rate of 84.62%, method two (Type-1 fuzzy model with KINECT TDOA fusion) achieved 91.4%, and method three (Type-2 fuzzy model with KINECT TDOA fusion) achieved 92.52%. Using TDOA speaker localization to adjust the SVM parameters C and γ in SVM speaker verification gave an average recognition rate of 83.16%, which is higher than that of the traditional list approach. The experimental results show that the proposed methods improve recognition and outperform the original single-microphone approach.
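For readers unfamiliar with DTW-based recognition, the sketch below shows the textbook form of the technique: a dynamic-programming alignment cost between two feature sequences, followed by nearest-template classification. It is an illustration only. The function names, the Euclidean frame distance, and the toy templates are assumptions, and the fuzzy decision fusion described in the abstract is not modelled.

```python
import numpy as np


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW alignment cost between two sequences of feature frames.

    Each sequence is either 1-D (one scalar per frame) or 2-D (frames x features).
    The frame-to-frame cost is the Euclidean distance (an assumed choice).
    """
    a = np.asarray(seq_a, dtype=float)
    b = np.asarray(seq_b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Allow a match, an insertion, or a deletion at each step.
            cost[i, j] = frame_cost + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    return float(cost[n, m])


def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW cost.

    templates is a hypothetical dict mapping a word label to a reference sequence.
    """
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


if __name__ == "__main__":
    templates = {"up": [1, 2, 3], "down": [3, 2, 1]}
    print(recognize([1, 2, 3, 3], templates))
```

In a real system the sequences would be per-frame acoustic features (for example, cepstral coefficients) rather than the scalar toy values used here.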

