倒頻譜域麥克風陣列波束成形之語音辨認研究

Research of A Cepstral Domain Array Beamformer for Speech Recognition

Advisor: 洪維廷

Abstract


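Traditionally, when microphone array beamforming is combined with adaptive filtering to handle reverberation, the filters are optimized for the signal waveform rather than for the speech recognizer. This thesis proposes a cepstral-domain microphone array beamforming algorithm. Instead of adjusting the adaptive filters to optimize the signal waveform, the algorithm adjusts them to maximize the likelihood of the input signal under the speech model, using the speech recognition criterion to adapt the filter parameters. The speech models used in the adaptation phase and the testing phase are unified in the cepstral domain, which minimizes the mismatch between the array filters and the speech recognizer, with the aim of obtaining the best recognition performance when the speech signal is corrupted by reverberation. In the experiments, under the same convergence stopping criterion, the error rate of HCRF in the cepstral domain is up to 8.16% lower than that of HMM. In terms of average time per iteration, HCRF in the cepstral domain takes up to 3.89% less time than HMM.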

Parallel Abstract


Generally, microphone arrays and adaptive beamforming are used to remove reverberation by optimizing the signal waveform, rather than by optimizing the parameters used for recognition. In this thesis, we propose a cepstral-domain array beamformer for speech recognition that changes the system goal from optimizing the signal waveform to maximizing the likelihood of the signal under the speech model. We use the speech recognition criterion to adapt the filter parameters, and we use a single speech model in the cepstral domain for both the training phase and the testing phase. The purpose of this thesis is to investigate the improvement obtained by using cepstral-domain models in both phases.
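The objective described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a plain filter-and-sum beamformer, a real cepstrum in place of whatever cepstral features the thesis uses, and a single diagonal Gaussian standing in for the HMM or HCRF speech model. The function names and the parameters `frame_len`, `hop`, and `n_ceps` are hypothetical.

```python
import numpy as np

def filter_and_sum(channels, filters):
    """Filter each microphone channel with its own FIR filter and sum the results.

    channels: array of shape (M, N), one row per microphone.
    filters:  array of shape (M, P), one FIR filter per microphone.
    """
    n = channels.shape[1]
    return sum(np.convolve(x, h)[:n] for x, h in zip(channels, filters))

def real_cepstrum(frame, n_ceps=13):
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    return np.fft.irfft(np.log(spectrum + 1e-10))[:n_ceps]

def cepstral_log_likelihood(filters, channels, mean, var, frame_len=256, hop=128):
    """Log-likelihood of the beamformed signal's cepstra.

    Assumption: one diagonal Gaussian (mean, var) replaces the speech model.
    """
    y = filter_and_sum(channels, filters)
    starts = range(0, len(y) - frame_len + 1, hop)
    ceps = np.array([real_cepstrum(y[s:s + frame_len], len(mean)) for s in starts])
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (ceps - mean) ** 2 / var)))
```

In this sketch the filter taps are the free parameters: an optimizer would adjust `filters` to increase the value returned by `cepstral_log_likelihood`, rather than to minimize a waveform error.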

