  • Degree thesis

Speech Enhancement Based on the Integration of Fully Convolutional Network, Temporal Lowpass Filtering and Spectrogram Masking

Advisor: 洪志偉

Abstract


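In this study, we address the distortion that noise introduces into speech signals and develop two novel unsupervised speech enhancement algorithms: temporal lowpass filtering (TLF) and time-frequency masking (RMM). Both methods operate on the frequency-domain magnitude of the speech signal. TLF uses a simple moving-average filter to emphasize the low modulation-frequency region of the speech signal, which gives the signal richer linguistic information and a higher signal-to-noise ratio (SNR). In RMM, by contrast, mask values are multiplied point-wise with the speech spectrum, and each mask value is proportional to the magnitude at the corresponding point. The study was carried out on the TIMIT database. Preliminary experiments show that both methods effectively improve the quality of noise-corrupted speech, and that both combine well with the fully convolutional network (FCN), a well-known recent supervised speech enhancement method, yielding stronger noise reduction and clearer speech.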

Parallel Abstract


In this study, we focus on the distortion that noise introduces into speech signals and develop two novel unsupervised speech enhancement algorithms: temporal lowpass filtering (TLF) and relative-to-maximum masking (RMM). Both algorithms operate on the magnitude spectrogram of the speech signal. TLF uses a simple moving-average filter to emphasize the low modulation frequencies of speech, which are believed to carry richer linguistic information and to exhibit higher signal-to-noise ratios (SNR). In RMM, by contrast, a mask is multiplied point-wise with the speech spectrogram, and each mask value is directly proportional to the magnitude of the corresponding time-frequency (T-F) point. Preliminary experiments on a subset of the TIMIT database show that the two methods significantly improve the quality of noise-corrupted speech, and that both can be integrated with a well-known supervised speech enhancement method, the fully convolutional network (FCN), to achieve even better perceptual speech quality scores.
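
The two unsupervised steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the thesis's implementation: the function names `temporal_lowpass` and `relative_to_maximum_mask`, the `window` parameter, the causal alignment of the moving average, and the choice of the utterance-wide maximum as the normalizer for the mask are all hypothetical, since the abstract does not fix these details. The spectrogram is represented as a list of frequency bins, each holding that bin's magnitudes over time.

```python
from typing import List

# spec[f][t] is the magnitude at frequency bin f and time frame t.
Spectrogram = List[List[float]]


def temporal_lowpass(spec: Spectrogram, window: int = 3) -> Spectrogram:
    """Smooth each frequency bin's magnitude trajectory over time.

    A simple moving average attenuates fast frame-to-frame fluctuations,
    which emphasizes the low modulation frequencies.
    Assumption: causal window (current frame plus earlier frames only).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for row in spec:
        out_row = []
        for t in range(len(row)):
            # Near the start, average over the frames that are available.
            start = max(0, t - window + 1)
            frames = row[start:t + 1]
            out_row.append(sum(frames) / len(frames))
        smoothed.append(out_row)
    return smoothed


def relative_to_maximum_mask(spec: Spectrogram) -> Spectrogram:
    """Scale every T-F point by a mask proportional to its own magnitude.

    Assumption: the mask is magnitude / (largest magnitude in the whole
    spectrogram), so it lies in [0, 1] and weak points are suppressed most.
    """
    peak = max((value for row in spec for value in row), default=0.0)
    if peak <= 0.0:
        # Empty or all-zero input: nothing to scale.
        return [list(row) for row in spec]
    return [[(value / peak) * value for value in row] for row in spec]


if __name__ == "__main__":
    noisy = [[1.0, 3.0, 5.0], [2.0, 2.0, 4.0]]
    print(temporal_lowpass(noisy, window=2))
    print(relative_to_maximum_mask(noisy))
```

Both functions take and return magnitudes only. How the enhanced magnitudes are turned back into a waveform, and in what order these steps are combined with the FCN, is not specified in the abstract and is left out of the sketch.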
