
端點偵測技術在強健語音參數擷取之研究

Study on the Voice Activity Detection Techniques for Robust Speech Feature Extraction

Advisor: 洪志偉

Abstract


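The performance of a speech recognition system often degrades because of a mismatch between the development environment and the application environment, and additive noise is one of the main causes of this mismatch. Methods for handling additive noise fall into three classes: speech enhancement, robust speech features, and speech model adaptation. The methods discussed in this thesis belong mainly to the robust speech feature class.

This thesis focuses on how different speech features affect voice activity (endpoint) detection. The features used are low-frequency spectral magnitude, full-band spectral magnitude, cumulative quantized spectrum, and high-pass log-energy. Endpoint detection with these features yields the locations of the noise-only portions, which supply the estimates of noise spectrum or noise energy required by spectral subtraction and silence log-energy normalization.

The experiments use the Aurora2 corpus, with eight types of background noise at signal-to-noise ratios of 0 to 20 dB. The experimental data and analysis presented in Chapter 5 show that the features above effectively discriminate between the noise-only and speech portions of an utterance, so that the subsequent robust feature techniques, spectral subtraction and silence log-energy normalization, clearly improve recognition accuracy in noisy environments and increase the robustness of the speech recognition system.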

Parallel Abstract


The performance of a speech recognition system is often degraded by a mismatch between the development and application environments. One of the major sources of this mismatch is additive noise. Approaches for handling additive noise can be divided into three classes: speech enhancement, robust speech feature extraction, and compensation of speech models. This thesis focuses on the second class, robust speech feature extraction. Robust feature extraction approaches are often combined with voice activity detection in order to estimate the noise characteristics. A voice activity detector (VAD) discriminates between the speech and noise-only portions of an utterance. This thesis primarily investigates the effectiveness of various features for the VAD. These features include low-frequency spectral magnitude (LFSM), full-band spectral magnitude (FBSM), cumulative quantized spectrum (CQS), and high-pass log-energy. The resulting VAD supplies noise information to two noise-robustness techniques, spectral subtraction (SS) and silence log-energy normalization (SLEN), in order to reduce the influence of additive noise on speech recognition. The recognition experiments are conducted on the Aurora-2 database. Experimental results show that the proposed VAD provides accurate noise information, with which the subsequent processes, SS and SLEN, significantly improve speech recognition performance in various noise-corrupted environments. We therefore conclude that an appropriate selection of features for the VAD indirectly improves the noise robustness of a speech recognition system.
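The pipeline described above (a spectral-magnitude feature drives the VAD, and the frames marked noise-only supply the noise estimate for spectral subtraction) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the decision rule, so the threshold rule (`margin` times the mean feature of the first `n_init` frames, which are assumed to be noise-only), the over-subtraction factor `alpha`, the spectral `floor`, and all function names are assumptions made for illustration. Input frames are taken to be precomputed magnitude spectra.

```python
from typing import List, Optional, Sequence

Frame = Sequence[float]  # magnitude spectrum |X[k]| of one frame


def band_magnitude(frame: Frame, n_bins: Optional[int] = None) -> float:
    """Sum of magnitudes over the lowest n_bins bins (all bins if None).

    n_bins=None corresponds to a full-band feature; a small n_bins
    gives a low-frequency feature.
    """
    return sum(frame[:n_bins])


def detect_speech(frames: Sequence[Frame], n_bins: Optional[int] = None,
                  n_init: int = 5, margin: float = 2.0) -> List[bool]:
    """Flag frames whose feature exceeds margin * initial noise level.

    Assumption: the first n_init frames contain noise only.
    """
    feats = [band_magnitude(f, n_bins) for f in frames]
    init = feats[:n_init]
    noise_level = sum(init) / len(init)
    return [x > margin * noise_level for x in feats]


def estimate_noise(frames: Sequence[Frame], is_speech: Sequence[bool]) -> List[float]:
    """Average the magnitude spectrum over frames flagged as noise-only."""
    noise_frames = [f for f, s in zip(frames, is_speech) if not s]
    if not noise_frames:
        raise ValueError("no noise-only frames detected")
    n = len(noise_frames)
    return [sum(col) / n for col in zip(*noise_frames)]


def spectral_subtraction(frames: Sequence[Frame], noise: Sequence[float],
                         alpha: float = 1.0, floor: float = 0.01) -> List[List[float]]:
    """Subtract alpha * noise from each bin, never going below floor * |X[k]|."""
    return [[max(x - alpha * n, floor * x) for x, n in zip(f, noise)]
            for f in frames]


# Toy input: flat noise frames surrounding three louder "speech" frames.
frames = [[1.0] * 4] * 5 + [[5.0, 4.0, 1.0, 1.0]] * 3 + [[1.0] * 4] * 2
is_speech = detect_speech(frames)            # full-band feature
is_speech_lf = detect_speech(frames, n_bins=2)  # low-frequency feature
noise = estimate_noise(frames, is_speech)
cleaned = spectral_subtraction(frames, noise)
```

On this toy input both features flag the same three frames as speech. On real noisy speech the choice of feature changes which frames are flagged, and therefore the noise estimate that spectral subtraction receives, which is the effect the thesis studies.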

