Improvements to a 32-bit Embedded Speech Recognition System

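This thesis analyzes and improves the MFCC (Mel-frequency cepstral coefficient) feature extraction used by our lab's integer-arithmetic recognition engine, which is adapted from HTK (Hidden Markov Model Toolkit). We propose three improvements. First, we replace the FFT (fast Fourier transform) algorithm. Second, we represent the power spectrum produced by the FFT and the Mel filter bank with logarithmic values. Third, we improve multiplication by using double-length integers to increase precision. Experimental results show that these methods improve both the precision of the integer arithmetic and the recognition rate (by 2 to 3%) compared with the original integer system. Further tests of overflow during the Viterbi stage and of background noise show that neither is directly correlated with the recognition rate.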
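To illustrate why double-length integers can help, the C sketch below contrasts two ways of multiplying fixed-point numbers on a 32-bit target. The Q16.16 format and the names `mul_q16_narrow`, `mul_q16_wide` and `FRAC_BITS` are assumptions for illustration; the abstract does not state the engine's number format or how the original code multiplied. The narrow version pre-shifts each operand so the product stays within 32 bits, while the wide version keeps the full 64-bit product and shifts only once at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 format: 16 integer bits, 16 fractional bits. */
#define FRAC_BITS 16

/* 32-bit-only multiply: drop the low 8 bits of each operand so that the
 * product fits in int32 (Q8 * Q8 = Q16). The dropped bits are lost.
 * Note: right-shifting negative values is implementation-defined in C;
 * mainstream compilers shift arithmetically. */
static int32_t mul_q16_narrow(int32_t a, int32_t b)
{
    /* Precondition: |a|, |b| < 2^23 raw (128.0); otherwise the
     * int32 product can overflow. */
    assert(a > -(1 << 23) && a < (1 << 23));
    assert(b > -(1 << 23) && b < (1 << 23));
    return (a >> 8) * (b >> 8);
}

/* Double-length multiply: form the full 64-bit product (Q32), then
 * shift back to Q16.16, so no operand bits are discarded beforehand.
 * The caller must still ensure the final result fits in int32. */
static int32_t mul_q16_wide(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;
    return (int32_t)(prod >> FRAC_BITS);
}
```

With a = 3.0 (`3 << 16`) and b = 255/65536 (about 0.0039), `mul_q16_narrow` returns 0 because all of b's significant bits fall below the pre-shift, whereas `mul_q16_wide` returns 765, the exact Q16.16 value of 3 × 255/65536.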

Keywords

Embedded; Feature Extraction; Speech Recognition

Parallel Abstract (English)

This thesis analyzes and improves the accuracy of the MFCC (Mel-frequency cepstral coefficient) feature extraction currently used in our lab's fixed-point ASR (automatic speech recognition) system, which is based on HTK (Hidden Markov Model Toolkit). We propose three improvements: first, changing the FFT (fast Fourier transform) algorithm; second, using a logarithmic representation for the power spectrum produced by the FFT and for the Mel filter bank; and third, improving multiplication by using double-length integers for higher precision. Experimental results show that each of these methods improves both fixed-point computation precision and the recognition rate (by 2 to 3%) over the original fixed-point system. Further experiments on overflow at the Viterbi stage and on background noise show no direct correlation between these effects and the recognition rate.
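The abstract does not say how the logarithm is computed, so the following is only a minimal sketch of one common integer-only approach, assuming a Q8 output format; `log2_q8` and `LOG_FRAC_BITS` are hypothetical names. The integer part of log2 comes from the position of the most significant set bit, and the fraction comes from linearly interpolating the bits below it.

```c
#include <stdint.h>

/* Hypothetical output format: Q8, i.e. the result is about 256 * log2(x). */
#define LOG_FRAC_BITS 8

/* Approximate 256 * log2(x) for a 32-bit power value x.
 * Integer part: position of the most significant set bit.
 * Fraction: the bits below the MSB, linearly interpolated
 * (log2(1 + m) is approximated by m for 0 <= m < 1). */
static int32_t log2_q8(uint32_t x)
{
    if (x == 0)
        return INT32_MIN;  /* log2(0) is -infinity; hypothetical sentinel */

    int msb = 31;
    while (!(x & (UINT32_C(1) << msb)))
        msb--;  /* msb = floor(log2(x)) */

    /* Mantissa remainder below the MSB, rescaled to LOG_FRAC_BITS bits. */
    uint32_t rem = x - (UINT32_C(1) << msb);
    uint32_t frac = (msb >= LOG_FRAC_BITS)
                        ? rem >> (msb - LOG_FRAC_BITS)
                        : rem << (LOG_FRAC_BITS - msb);

    return (int32_t)(msb << LOG_FRAC_BITS) + (int32_t)frac;
}
```

Every nonzero 32-bit power value maps into the range 0 to 8191 (for example, `log2_q8(256)` is 2048 and `log2_q8(0xFFFFFFFFu)` is 8191), and scaling a power by a filter weight becomes an addition of two log values, although summing energies within a filter then needs a log-add approximation. Linear interpolation is the crudest choice, with an error of up to roughly 0.086 in log2 units; a small lookup table is a common refinement.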

Parallel Keywords (English)

MFCC; Embedded; Feature Extraction; Voice Recognition; ASRA; FFT


Cited By

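劉承泰 (2013). 嵌入式語音命令系統的設計與改進 [Design and Improvement of an Embedded Voice Command System] (Master's thesis, National Tsing Hua University). Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0016-2511201311364897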
