Microphone Array Beamforming Algorithm Based on Hidden Conditional Random Field Acoustic Models

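This paper proposes an adaptive microphone array beamforming algorithm based on Hidden Conditional Random Field (HCRF) acoustic models to reduce the speech recognition error rate in reverberant environments. In the adaptation phase, the algorithm adapts the array filter parameters under the Maximum Likelihood criterion using adaptation utterances with known transcriptions, and the HCRF models compute the likelihood of the speech feature sequence. In the testing phase, the error rate is used as the performance measure. Reverberation breaks the assumption that speech frames are mutually independent, which sharply raises the error rate of speech recognition systems. Conventional de-reverberation beamforming algorithms use the Signal-to-Reverberation Ratio (SRR) as their performance measure, whereas speech recognition systems are judged by error rate. To close this gap, Seltzer coupled the array beamformer with the speech recognizer to form a speech recognition-driven array beamforming algorithm. Seltzer's beamformer computes the likelihood of the speech features with Hidden Markov Models (HMMs); we go a step further and use HCRFs. The experiments use a Mandarin Chinese name speech corpus of 29 speakers with 120 utterances each. Multiple sets of microphone array data with different reverberation lengths were simulated from the single-channel Chinese name corpus and the RWCP (Real World Computing Partnership) room impulse response database. For each speaker, 5 or 20 utterances are used to adapt the array filter and the remaining utterances are used to test system performance; adaptation and testing are carried out independently for each speaker. The experimental results show that, for the same number of adaptation iterations, the error rate of the HCRF-based beamformer is up to 25 percent lower than that of the HMM-based beamformer. To reach the same convergence criterion, the HCRF-based beamformer needs up to 89 percent fewer iterations than the HMM-based beamformer.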
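The likelihood computation used during adaptation can be illustrated with a small HCRF. The sketch below is a minimal, hypothetical example, not the model from this work: it assumes one linear chain of hidden states per class, per-frame features [1, x, x²] (first- and second-order moments), and that the quantity of interest is the conditional log-probability of the known transcription. The names `class_score`, `hcrf_log_posterior` and the parameter layout are illustrative only.

```python
import numpy as np


def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along `axis` (all elements if None)."""
    a = np.asarray(a, dtype=float)
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out.item() if axis is None else np.squeeze(out, axis=axis)


def class_score(X, emit_w, trans_w, init_w):
    """Log of the unnormalised HCRF score of one class, summed over all hidden-state paths.

    X:       (T, D) feature sequence, e.g. one utterance's cepstral frames
    emit_w:  (S, 2*D + 1) per-state weights on the frame features [1, x, x**2]
    trans_w: (S, S) transition weights; trans_w[i, j] scores moving from state i to j
    init_w:  (S,) initial-state weights
    """
    phi = np.hstack([np.ones((len(X), 1)), X, X ** 2])  # (T, 2D+1) frame features
    emit = phi @ emit_w.T                                # (T, S) frame-state scores
    alpha = init_w + emit[0]                             # log-domain forward variables
    for t in range(1, len(X)):
        alpha = logsumexp(alpha[:, None] + trans_w, axis=0) + emit[t]
    return logsumexp(alpha)


def hcrf_log_posterior(X, params, label):
    """log P(label | X): the class score normalised over every class in `params`.

    params: list with one (emit_w, trans_w, init_w) tuple per class (hypothetical layout)
    """
    scores = np.array([class_score(X, *p) for p in params])
    return scores[label] - logsumexp(scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, S, n_classes = 3, 2, 4
    params = [(rng.normal(size=(S, 2 * D + 1)), rng.normal(size=(S, S)), rng.normal(size=S))
              for _ in range(n_classes)]
    X = rng.normal(size=(5, D))
    log_post = [hcrf_log_posterior(X, params, w) for w in range(n_classes)]
    assert np.isclose(np.exp(log_post).sum(), 1.0)  # posteriors over classes sum to 1
    print(log_post)
```

The forward recursion replaces an exponential sum over state paths with O(T·S²) work. In a likelihood-maximizing beamformer, the gradient of this score with respect to the array filter taps would additionally have to be propagated through the feature extraction; that step is not shown here.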

Keywords

Hidden Conditional Random Field; Microphone Array; Beamforming

Parallel Abstract

In this paper, we propose an adaptive microphone array beamformer based on Hidden Conditional Random Fields (HCRFs) to lower the speech recognition error rate in reverberant environments. In the adaptation phase, the proposed beamformer adapts the array filter using utterances with known transcriptions under the Maximum Likelihood Estimation criterion, with a set of HCRF speech models computing the likelihood of the acoustic feature sequence. In the testing phase, we use the error rate as the performance measure. The speech recognition error rate rises dramatically in reverberant environments because reverberation breaks the assumption that short-term speech frames are independent. Traditional de-reverberation array beamformers use the SRR (Signal-to-Reverberation Ratio) as their performance measure, whereas speech recognition is evaluated by error rate. Seltzer therefore combined the array beamformer and the speech recognizer into a speech recognition-driven array beamformer, in which the feature sequence of the reverberant speech is made to yield the largest likelihood for the correct hypothesis, in other words, a lower error rate. Seltzer's beamformer uses HMMs (Hidden Markov Models) to compute the likelihood of the acoustic feature sequence; we improve on it by using HCRFs instead. The experiments use a Chinese name speech corpus of 29 speakers with 120 utterances each. The single-channel corpus was convolved with impulse responses from the RWCP (Real World Computing Partnership) room impulse response database to simulate multiple sets of reverberant array corpora with different reverberation times. For each speaker, 5 or 20 utterances are used to adapt the array filter, and the remaining utterances are used to test system performance. Adaptation and testing are performed independently for each speaker.
According to the experimental results, for the same number of iterations, the recognition error rate of the HCRF-based array beamformer is up to 25 percent lower than that of the HMM-based array beamformer. Moreover, the HCRF-based array beamformer needs up to 89 percent fewer iterations than the HMM-based array beamformer to reach the same convergence criterion.
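The reverberant array data described above are produced by convolving single-channel speech with one room impulse response per microphone, and the adapted parameters are the array filter taps. The sketch below shows both steps under stated assumptions: it assumes a filter-and-sum beamformer structure, and it replaces the measured RWCP impulse responses with synthetic exponentially decaying noise. The helpers `synthetic_rir`, `simulate_array` and `filter_and_sum` are hypothetical names for illustration.

```python
import numpy as np


def synthetic_rir(length, fs, rt60, rng):
    """Hypothetical stand-in for a measured room impulse response.

    Gaussian noise under an exponential envelope whose energy drops by 60 dB
    after `rt60` seconds (amplitude factor 1e-3), plus a unit direct-path tap.
    """
    t = np.arange(length) / fs
    h = rng.normal(size=length) * np.exp(-3 * np.log(10) * t / rt60)
    h[0] = 1.0  # direct path
    return h


def simulate_array(clean, rirs):
    """Convolve one clean channel with each microphone's impulse response.

    clean: (N,) single-channel waveform
    rirs:  (M, L) one impulse response per microphone
    returns (M, N) reverberant signals, truncated to the input length
    """
    return np.stack([np.convolve(clean, h)[: len(clean)] for h in rirs])


def filter_and_sum(x, taps):
    """Filter-and-sum beamformer: y[n] = sum_m sum_k taps[m, k] * x[m, n - k].

    x:    (M, N) microphone signals
    taps: (M, K) one FIR filter per channel; these are the adaptable parameters
    """
    n = x.shape[1]
    return np.sum([np.convolve(xm, hm)[:n] for xm, hm in zip(x, taps)], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, mics = 16000, 4
    clean = rng.normal(size=fs)  # 1 s placeholder signal standing in for speech
    rirs = np.stack([synthetic_rir(fs // 2, fs, 0.3, rng) for _ in range(mics)])
    x = simulate_array(clean, rirs)
    taps = np.zeros((mics, 64))
    taps[:, 0] = 1 / mics  # start from a plain channel average
    y = filter_and_sum(x, taps)
    print(x.shape, y.shape)  # (4, 16000) (16000,)
```

Varying `rt60` is one simple way to obtain data sets with different reverberation times; with measured responses, `rirs` would instead be loaded from the database. During adaptation, `taps` would be updated to raise the recognizer's likelihood for the known transcription rather than to improve SRR.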

Parallel Keywords

Hidden Conditional Random Field; Microphone Array; Beamformer

References

[1] M. Wölfel and J. McDonough, Distant Speech Recognition. Wiley, 2009.

[2] M. L. Seltzer, Microphone Array Processing for Robust Speech Recognition. PhD thesis, Carnegie Mellon University, July 2003.

[3] S. Ganapathy, J. Pelecanos, and M. K. Omar, "Feature normalization for speaker verification in room reverberation," Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), pp. 4836-4839, 2011.

[4] N. R. Shabtai, B. Rafaely, and Y. Zigel, "The effect of reverberation on the performance of cepstral mean subtraction in speaker verification," Applied Acoustics, vol. 72, pp. 124-126, 2011.

[5] L. Wang, A Study on Hands-Free Speech/Speaker Recognition. PhD thesis, Toyohashi University of Technology, 2008.

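Cited By

魏輔辰 (2012). A Study of Speech Recognition Using Cepstral-Domain Microphone Array Beamforming [Master's thesis, Yuan Ze University]. Airiti Library. https://doi.org/10.6838/YZU.2012.00278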
