This thesis analyzes and improves the accuracy of the MFCC (Mel-frequency cepstral coefficient) feature extraction used in our lab's fixed-point ASR (automatic speech recognition) engine, which is adapted from HTK (Hidden Markov Model Toolkit). We propose three improvements: first, we replace the FFT (fast Fourier transform) algorithm; second, we represent the power spectrum produced by the FFT and the Mel filter bank in logarithmic form; third, we improve the multiplication routine by using double-length integers to gain precision. Experimental results show that the proposed methods improve both fixed-point computation precision and recognition rate (by 2 to 3%) over the original fixed-point system. Further experiments on overflow at the Viterbi stage and on background noise show that neither has a direct correlation with the recognition rate.
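To make the third improvement concrete, the following is a minimal sketch of why a double-length intermediate helps fixed-point multiplication. It is an illustration only, not the engine's actual routine: the Q15 format, the 32-bit word size, and the helper names (`wrap`, `to_fixed`, `to_float`, `mul_single`, `mul_double`) are all assumptions made for this example.

```python
# Illustrative sketch only: Q15 format and 32-bit words are assumptions,
# not details taken from the recognition engine.
Q = 15  # assumed number of fractional bits


def wrap(x, bits):
    """Wrap an integer to signed two's complement of the given width."""
    mask = (1 << bits) - 1
    x &= mask
    return x - (1 << bits) if x >> (bits - 1) else x


def to_fixed(value):
    """Convert a real number to its Q15 integer representation."""
    return int(round(value * (1 << Q)))


def to_float(n):
    """Convert a Q15 integer back to a real number."""
    return n / (1 << Q)


def mul_single(a, b):
    """Multiply with the product held in a single 32-bit word.

    The raw product can exceed 32 bits, so it wraps before the shift.
    """
    return wrap(a * b, 32) >> Q


def mul_double(a, b):
    """Multiply with the product held in a double-length (64-bit) word.

    The full product survives until it is shifted back to Q15.
    """
    return wrap(wrap(a * b, 64) >> Q, 32)


a = to_fixed(3.5)
b = to_fixed(2.25)
single_result = to_float(mul_single(a, b))
double_result = to_float(mul_double(a, b))
```

With these operands the raw product is 7.875 * 2^30, which does not fit in a signed 32-bit word, so `mul_single` returns a wrapped, meaningless value while `mul_double` recovers the exact product 7.875. The cost is one wider multiply and shift per operation.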