Variable Frame Rate Frame Selection for Distributed Speech Recognition

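Equipping handheld mobile devices with speech recognition for ease of use is a natural evolution. Reducing the amount of speech features transmitted from the client has given rise to an interesting research topic in the distributed speech recognition (DSR) architecture, called variable frame rate (VFR) speech recognition. In this thesis, we investigate five methods for selecting or discarding frames: the frame decimation method (FD), the minimum distance method (MD), the combined decimation and minimum distance method (CDAMD), the threshold-based frame selection algorithm (TBFS), and the analysis-by-synthesis frame selection algorithm (ABSFS). At the server, the lost frames are concealed by either feature interpolation or model adaptation. Experimental results show that, among all VFR methods, CDAMD gives the best recognition rate at the 1/2 frame rate, FD gives the best recognition rate at the 1/3 and 1/4 frame rates, and ABSFS achieves the best recognition rate at the 1/5 and 1/10 frame rates. When model adaptation is used to conceal the lost frames, recognition time also decreases in proportion to the number of selected frames.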
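The abstract names the frame selection methods but does not give their exact rules. As a rough illustration only, the Python sketch below shows three of them on a list of per-frame feature vectors. The rules are our assumptions, not the thesis's definitions: FD keeps every k-th frame, MD repeatedly discards the frame closest (in Euclidean distance) to its preceding kept frame, and TBFS keeps a frame only when it has moved more than a threshold away from the last kept frame. The function names (`frame_decimation`, `minimum_distance`, `threshold_selection`) are hypothetical. CDAMD and ABSFS are not covered here.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector (e.g. MFCCs) per frame


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frame_decimation(num_frames: int, k: int) -> List[int]:
    """FD (assumed form): keep every k-th frame, i.e. a fixed 1/k frame rate."""
    return list(range(0, num_frames, k))


def minimum_distance(frames: List[Frame], target: int) -> List[int]:
    """MD (assumed form): repeatedly drop the frame closest to its preceding
    kept frame until `target` frames remain (the first frame is always kept)."""
    kept = list(range(len(frames)))
    target = max(1, target)
    while len(kept) > target:
        # Position 0 is never a candidate, so the first frame survives.
        drop = min(range(1, len(kept)),
                   key=lambda i: euclidean(frames[kept[i]], frames[kept[i - 1]]))
        del kept[drop]
    return kept


def threshold_selection(frames: List[Frame], threshold: float) -> List[int]:
    """TBFS (assumed form): keep a frame only if it is more than `threshold`
    away from the last kept frame, so the frame rate follows the signal."""
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if euclidean(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Slowly changing start, fast changes at the end (toy data).
    frames = [[0.0, 0.0], [0.1, 0.0], [0.3, 0.0],
              [1.0, 0.0], [2.5, 0.0], [4.5, 0.0]]
    print(frame_decimation(len(frames), 2))   # [0, 2, 4]
    print(minimum_distance(frames, 3))        # [0, 4, 5]
    print(threshold_selection(frames, 0.5))   # [0, 3, 4, 5]
```

On this toy input, FD keeps frames 0, 2, and 4 whatever the content is. MD and TBFS keep more frames at the end, where the features change quickly. Spending the frame budget where the features change is the motivation for variable frame rate transmission.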

Keywords

distributed speech recognition; variable frame rate; frame reconstruction; hidden Markov model

Parallel Abstract

Equipping mobile devices with speech recognition functions for ease of use is a natural evolution. In the distributed speech recognition (DSR) architecture, the need to reduce the amount of speech features transmitted from end users to the server has given rise to an interesting research topic called variable frame rate (VFR) speech recognition. In this thesis, five VFR methods are investigated: the frame decimation method (FD), the minimum distance method (MD), the combined decimation and minimum distance method (CDAMD), the threshold-based frame selection algorithm (TBFS), and the analysis-by-synthesis frame selection algorithm (ABSFS). At the server end, either feature interpolation (FE) or model adaptation (MA) is applied before decoding to compensate for the lost frames. Experimental results show that, among all VFR methods, CDAMD performs best at the 1/2 frame rate, FD performs best at the 1/3 and 1/4 frame rates, and ABSFS achieves relatively good recognition rates at the 1/5 and 1/10 frame rates. Meanwhile, if model adaptation is adopted, recognition time is reduced in proportion to the ratio of the variable frame rate to the full frame rate.
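At the server, feature interpolation restores the dropped frames before decoding. The abstract does not specify the interpolation rule, so the sketch below assumes plain linear interpolation between received frames, with the first and last received frames held at the edges. It also shows one possible reading of analysis-by-synthesis selection, in which the client greedily adds the frame that the interpolated reconstruction matches worst. Both `interpolate_frames` and `absfs_select` are hypothetical illustrations, not the thesis's code. Model adaptation is not shown.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector per frame


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interpolate_frames(kept_idx: List[int], kept_frames: List[Frame],
                       num_frames: int) -> List[List[float]]:
    """Rebuild a full-rate sequence from received frames (assumed form).

    Frames between two received frames are linearly interpolated; frames
    before the first or after the last received frame copy that frame.
    `kept_idx` must be sorted and non-empty.
    """
    out: List[List[float]] = []
    j = 0  # position in kept_idx of the nearest received frame at or before t
    for t in range(num_frames):
        while j + 1 < len(kept_idx) and kept_idx[j + 1] <= t:
            j += 1
        left = kept_idx[j]
        if t <= left or j + 1 == len(kept_idx):
            out.append(list(kept_frames[j]))  # received frame or edge: copy
        else:
            right = kept_idx[j + 1]
            w = (t - left) / (right - left)  # weight of the right neighbour
            out.append([(1 - w) * a + w * b
                        for a, b in zip(kept_frames[j], kept_frames[j + 1])])
    return out


def absfs_select(frames: List[Frame], target: int) -> List[int]:
    """Greedy analysis-by-synthesis selection (hypothetical variant).

    Start from the first and last frames, reconstruct the full sequence by
    interpolation, and add the frame with the largest reconstruction error
    until `target` frames are kept.
    """
    n = len(frames)
    if n == 0:
        return []
    kept = sorted({0, n - 1})
    while len(kept) < min(target, n):
        recon = interpolate_frames(kept, [frames[i] for i in kept], n)
        worst = max((t for t in range(n) if t not in kept),
                    key=lambda t: euclidean(frames[t], recon[t]))
        kept = sorted(kept + [worst])
    return kept


if __name__ == "__main__":
    frames = [[0.0], [1.0], [2.0], [6.0], [4.0]]
    print(interpolate_frames([0, 4], [frames[0], frames[4]], 5))
    # [[0.0], [1.0], [2.0], [3.0], [4.0]]
    print(absfs_select(frames, 3))  # [0, 3, 4]
```

In the example, a straight line between the end frames reproduces frames 1 and 2 exactly but misses frame 3 by 3.0. The greedy step therefore transmits frame 3. Because selection is judged by the reconstruction the server will actually build, the selection criterion and the server-side concealment stay consistent.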

Parallel Keywords

distributed speech recognition; variable frame rate; frame reconstruction; hidden Markov model
