
語音調變頻譜強化之研究

A study of modulation spectrum enhancement for speech signals

Advisor: 洪志偉

Abstract


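A speech processing and application system is often affected by interference from its surrounding environment, such as additive noise, variation in speaker characteristics, and transmission-channel mismatch. These factors severely distort the received speech signal or the speech features derived from it, which in turn degrades system performance, for example through lower speech recognition accuracy or poorer output speech quality. This dissertation focuses on reducing such interference and proposes a series of noise-robust speech feature extraction techniques. The main lines of research are as follows: (1) Using techniques such as statistics normalization (histogram equalization) and component analysis (factor analysis), the frame-wise time series of the complex-valued acoustic spectrum of speech is compensated and updated. (2) Band-pass filtering is applied to the speech spectrum along the linear acoustic frequency axis, the temporal modulation spectrum of each sub-band spectrogram is then compensated, and the importance of each sub-band for speech recognition is examined. (3) Going beyond the basic assumption of statistics normalization, an algorithm is proposed that simultaneously normalizes the structural statistics of speech features in both the temporal and spatial domains. All speech recognition experiments used to evaluate these methods are carried out on the internationally used Aurora-2 connected-digit corpus and the Aurora-4 large-vocabulary corpus. Compared with the baseline experiments using Mel-frequency cepstral coefficients (MFCC), all of the proposed methods significantly reduce the word error rate, and their recognition performance matches or exceeds that of widely used robustness techniques, including the well-known advanced front-end (AFE) feature extraction scheme.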

Parallel abstract


Environmental mismatches caused by background noise, acoustic variation among speakers, and channel distortion often seriously impair the performance of state-of-the-art speech recognition systems. In this dissertation, we propose a family of novel noise-robustness algorithms to remedy this problem, and the research is divided into three main aspects. First, the temporal series of the complex-valued acoustic spectra are compensated via statistics normalization and factor analysis algorithms, including histogram equalization (HEQ) and nonnegative matrix factorization (NMF). Second, we apply HEQ to the modulation domain of the sub-band acoustic spectrum. Band-pass filters divide the acoustic spectrogram into several sub-bands, and the temporal modulation spectrum of each sub-band spectrogram is then compensated via HEQ. This lets us evaluate the relative importance of the different sub-band spectrograms in speech recognition. Finally, we explore an HEQ-based feature normalization framework with sub-band division along the cepstral and/or temporal axes, which normalizes not only the overall histograms of the feature vector components but also their local contextual (or structural) statistics, both spatially and temporally. All evaluation experiments are carried out on the Aurora-2 database and task, and are further validated on the Aurora-4 database and task. The experimental results suggest that the proposed methods offer substantial improvements over the baseline system and achieve performance competitive with or better than some existing noise-robustness methods, including the well-known advanced front-end (AFE) feature extraction scheme.
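
As a rough illustration of the second aspect, the following is a minimal sketch of HEQ applied to the magnitude modulation spectrum of a single sub-band trajectory. It is not the dissertation's implementation: the function names `heq_to_reference` and `equalize_modulation_spectrum` are hypothetical, the rank-based mapping is one common way to realize HEQ, and the reference magnitudes are assumed to come from clean training data, which the abstract does not specify.

```python
import numpy as np


def heq_to_reference(values, reference):
    """Rank-based HEQ: map `values` so their distribution follows `reference`."""
    values = np.asarray(values, dtype=float)
    ref_sorted = np.sort(np.asarray(reference, dtype=float))
    # Empirical CDF of each value from its rank, kept strictly inside (0, 1).
    ranks = np.argsort(np.argsort(values))
    cdf = (ranks + 0.5) / values.size
    # Inverse reference CDF, linearly interpolated over the reference quantiles.
    ref_cdf = (np.arange(ref_sorted.size) + 0.5) / ref_sorted.size
    return np.interp(cdf, ref_cdf, ref_sorted)


def equalize_modulation_spectrum(trajectory, reference_magnitudes):
    """Equalize the magnitude modulation spectrum of one trajectory, keeping phase."""
    trajectory = np.asarray(trajectory, dtype=float)
    # Modulation spectrum: Fourier transform of the trajectory along time.
    spectrum = np.fft.rfft(trajectory)
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    new_magnitude = heq_to_reference(magnitude, reference_magnitudes)
    # Recombine the equalized magnitude with the original phase.
    return np.fft.irfft(new_magnitude * np.exp(1j * phase), n=trajectory.size)
```

In this sketch the phase of the modulation spectrum is left untouched and only the magnitudes are remapped, so passing a trajectory's own magnitudes as the reference returns the trajectory essentially unchanged.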

