
在分散式語音辨識中用於辨識遺失音框語音之錯誤隱藏解碼方法

An Error Concealment Decoding Method for Recognizing Speech with Missing Frames in Distributed Speech Recognition

Advisor: 簡福榮
Co-advisor: 李立民

Abstract


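Automatic speech recognition (ASR) has become a very common feature of mobile devices. In a client-server distributed speech recognition (DSR) architecture, speech features are extracted and quantized at the client and sent to the server for recognition. Transmission errors often occur when the features are sent over a network: on an error-prone channel, unavoidable problems such as network delay or transmission errors produce missing frames. To reduce the performance degradation caused by missing frames, this thesis proposes an error concealment decoding method (RFR-MA) that combines reliable reduced frame rate data with an adapted hidden Markov model, and compares it with a full frame rate system (FFR-INT) that reconstructs the data sequence by linear interpolation. Experimental results show that, in a DSR system, the proposed RFR-MA method reaches the same level of recognition accuracy as the FFR-INT system while saving a considerable amount of recognition computation time.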

Parallel Abstract


Automatic speech recognition (ASR) is now a common core component of mobile device interfaces. In a client-server distributed speech recognition (DSR) architecture, speech features are extracted and quantized at the client and sent to a remote server for recognition. Transmitting feature data across a network exposes it to transmission errors: over error-prone channels, packets may be lost, or discarded because of corruption or delay, so frame loss is inevitable. To reduce the performance degradation caused by missing frames, this thesis proposes an error concealment decoding method (RFR-MA) based on the most reliable reduced frame rate (RFR) data and an adapted hidden Markov model (HMM). The proposed method is compared with a baseline system (FFR-INT) in which a linearly interpolated full frame rate (FFR) data sequence is used for back-end decoding. Experimental results show that a DSR system using RFR-MA achieves the same level of accuracy as FFR-INT while significantly reducing computation time.
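The linear interpolation used by the FFR-INT baseline can be illustrated with a minimal sketch. It assumes each frame is a plain list of feature values and that a lost frame is marked as `None`; the function name `interpolate_missing_frames` and the edge handling (copying the nearest received frame at the start or end of the sequence) are assumptions made for illustration, not details taken from the thesis.

```python
from typing import List, Optional, Sequence


def interpolate_missing_frames(
    frames: Sequence[Optional[Sequence[float]]],
) -> List[List[float]]:
    """Rebuild a full frame rate sequence by linear interpolation.

    Assumption: a lost frame is represented by None. Gaps at the start
    or end of the sequence are filled by copying the nearest received
    frame, since there is nothing to interpolate towards.
    """
    received = [i for i, f in enumerate(frames) if f is not None]
    if not received:
        raise ValueError("no received frames to interpolate from")

    rebuilt: List[List[float]] = []
    for t, frame in enumerate(frames):
        if frame is not None:
            rebuilt.append(list(frame))
            continue
        # Nearest received frames on each side of the gap, if any.
        left = max((i for i in received if i < t), default=None)
        right = min((i for i in received if i > t), default=None)
        if left is None:
            rebuilt.append(list(frames[right]))
        elif right is None:
            rebuilt.append(list(frames[left]))
        else:
            w = (t - left) / (right - left)  # 0 at left frame, 1 at right
            rebuilt.append(
                [(1 - w) * a + w * b for a, b in zip(frames[left], frames[right])]
            )
    return rebuilt


if __name__ == "__main__":
    # One lost frame between two received two-dimensional frames.
    print(interpolate_missing_frames([[0.0, 2.0], None, [2.0, 6.0]]))
```

A decoder fed with this output processes every frame position, which is the cost that a reduced frame rate approach such as RFR-MA aims to avoid.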

