Noise Reduction on Temporal Envelopes

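This thesis proposes a single-channel method that removes noise from the temporal modulation envelope of each subband by applying two layers of masking to the input speech signal. The first layer derives a first threshold for each band from the difference in energy between speech and noise, and masks the frames judged to be non-speech. The second layer applies a fast Fourier transform (FFT) to the speech envelope and derives a second threshold for each band from the difference in modulation amplitude between speech and noise, further suppressing the segments that the first layer misclassified as speech but that are actually non-speech. To make the effect more pronounced, we add a Wiener filter, which performs spectral-domain denoising, at the front end of the system as a pre-enhancement filter. The overall system therefore processes the spectral and temporal domains separately and then combines the results. Its computational complexity is low and its running time is short, so it may be applicable to hearing aids in the future. In the experimental evaluation, we compare it against three methods: the spectral-only Wiener filter, the proposed temporal-only envelope modulation denoising method, and the Joint spectro-temporal subband Wiener filter, which processes the spectral and temporal domains jointly. We use the objective measures PESQ and IS distance, and we also compare the computation time of all four methods. In the objective scores, the Joint spectro-temporal subband Wiener filter performs best, followed by the proposed system that combines envelope modulation denoising with the Wiener filter. Comparing the Wiener filter with the envelope modulation denoising method alone under white Gaussian noise, the Wiener filter scores higher when the signal-to-noise ratio (SNR) is high; as the SNR worsens, however, the score of the envelope modulation denoising method approaches that of the Wiener filter and even surpasses it at 0 dB.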
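The two masking layers described above can be illustrated for a single subband with the following minimal sketch. It is not the thesis implementation: the function name `two_stage_envelope_mask`, its parameters, the use of mean squared amplitude as the frame energy, the use of the largest non-DC FFT magnitude as the modulation amplitude, and the choice to zero out rejected frames are all assumptions made for illustration. The abstract does not state how the per-band thresholds are derived, so they are passed in as given values, and the filter bank and the Wiener pre-enhancement filter are omitted.

```python
import numpy as np


def two_stage_envelope_mask(envelope, frame_len, energy_thresh, mod_thresh):
    """Two-layer frame masking of one subband envelope (illustrative sketch).

    All names and both thresholds are hypothetical; the thesis abstract does
    not specify how the per-band thresholds are computed.
    """
    env = np.asarray(envelope, dtype=float)
    n_frames = len(env) // frame_len
    # Drop any trailing samples that do not fill a whole frame.
    frames = env[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Layer 1: keep frames whose mean energy exceeds the first threshold.
    energy = np.mean(frames ** 2, axis=1)
    stage1 = energy > energy_thresh

    # Layer 2: FFT of each envelope frame; keep frames whose largest
    # non-DC modulation magnitude exceeds the second threshold.
    mod_spectrum = np.abs(np.fft.rfft(frames, axis=1))
    mod_amplitude = mod_spectrum[:, 1:].max(axis=1)
    stage2 = mod_amplitude > mod_thresh

    # A frame survives only if both layers classify it as speech.
    # Assumption: rejected frames are set to zero rather than attenuated.
    mask = stage1 & stage2
    masked = (frames * mask[:, None]).reshape(-1)
    return masked, mask
```

In this sketch a loud but unmodulated frame passes the energy test and is then rejected by the modulation test, which mirrors the role the abstract gives to the second layer. In a full system the function would be called once per subband with that band's own pair of thresholds.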

Keywords

Speech enhancement; auditory perception; noise reduction; temporal envelope

Parallel Abstract

We propose a single-channel, two-stage masking algorithm based on temporal modulations for noise reduction. To distinguish speech from non-speech segments, the first masking stage uses the temporal modulation energy and the second stage uses the amplitude modulation of the input signal. The algorithm is developed under a filter bank structure with a frame-by-frame analysis paradigm. The purely temporal noise reduction algorithm is then combined with a conventional Wiener filter to further enhance the speech. The whole system performs noise reduction in the spectral and temporal domains separately, and it may be applied to digital hearing aids in the future because its computational complexity is low compared with that of the joint spectro-temporal subband Wiener filter. For the performance comparison, we evaluate four systems in this thesis: (1) the proposed purely temporal noise reduction algorithm, (2) a conventional Wiener filter, (3) a joint spectro-temporal subband Wiener filter, and (4) the proposed temporal algorithm combined with the conventional Wiener filter. The objective measures PESQ and IS distance are used in our evaluations. System (4) outperforms systems (1) and (2) and performs slightly worse than system (3). However, in contrast to the joint spectro-temporal subband Wiener filter, system (4) can meet the requirement of real-time processing.

Parallel Keywords

Speech Enhancement; Auditory Model; Noise Reduction; Temporal Modulation

