A Study of Threshold Denoising on the Modulation Spectrum for Robust Speech Recognition

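This paper proposes a new noise-robustness technique that improves the recognition accuracy of speech features in noisy environments. In the proposed algorithm, the temporal feature sequence is transformed into the corresponding frequency domain by the DCT or the DFT, a threshold is then used to remove the smaller components, and finally the features are transformed from the modulation spectrum back to the temporal domain to obtain the new features. The method has two advantages. First, the whole compensation process is unsupervised and needs no additional information about the noise. Second, the threshold setting is flexible, with more than one choice available. Experiments on the internationally used Aurora-2 connected-digit corpus show that the proposed method brings significant gains in recognition accuracy on top of features pre-processed by any of the statistical normalization methods, such as CMVN, MVA, CGN, and HEQ. The DFT results are generally better than the DCT results, but we further find that, with the DCT-based method, compensating only the low-frequency portion achieves performance similar to, or even better than, compensating the full band. Both the DCT-based and the DFT-based methods are therefore of considerable practical value.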

Keywords

robust speech recognition; threshold denoising; discrete cosine transform; discrete Fourier transform; modulation spectrum

Parallel Abstract

This paper presents a novel noise-robustness algorithm that enhances speech features for recognition in noisy conditions. In the presented algorithm, the temporal speech feature sequence is first converted to its spectrum via the discrete cosine transform (DCT) or the discrete Fourier transform (DFT), and the DCT- or DFT-based spectrum is then compensated by a thresholding function that further shrinks its smaller components. Finally, the updated spectrum is converted back to the temporal domain to obtain the new feature sequence. The method has two advantages. First, the overall compensation process is unsupervised, so no information about the noise in the speech signals is required. Second, the threshold can be chosen flexibly under various optimization criteria. Experiments on the Aurora-2 connected-digit database and task show that the presented methods provide significant improvements in recognition accuracy for speech features pre-processed by any of the statistics normalization algorithms considered, including cepstral mean and variance normalization (CMVN), CMVN plus ARMA filtering (MVA), cepstral gain normalization (CGN), and histogram equalization (HEQ). The DFT-based thresholding methods achieve better performance than the DCT-based ones, but we further show that, with the DCT-based methods, compensating only the low-frequency portion gives performance on a par with that achieved by compensating the entire frequency band. As a result, both the DCT- and DFT-based compensation methods are effective in enhancing the noise robustness of speech features.
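The transform, threshold, and inverse-transform pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the thresholding function or how the threshold is chosen, so the soft-thresholding rule, the function names (`dct_matrix`, `soft_threshold`, `denoise_dft`, `denoise_dct`), and the parameters `thr` and `max_bin` are all hypothetical. Each function operates on one feature trajectory, that is, a single feature dimension (for example one cepstral coefficient after CMVN) tracked over the frames of an utterance.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x is the DCT of x, and C.T @ y inverts it."""
    k = np.arange(n)[:, None]  # modulation-frequency index
    t = np.arange(n)[None, :]  # frame (time) index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)    # rescale the DC row so the rows are orthonormal
    return c


def soft_threshold(mag, thr):
    """Assumed rule: shrink magnitudes by thr and zero out anything below thr."""
    return np.maximum(mag - thr, 0.0)


def denoise_dft(x, thr):
    """Threshold the DFT magnitude of one feature trajectory, keeping the phase."""
    spec = np.fft.rfft(x)
    mag, phase = np.abs(spec), np.angle(spec)
    new_spec = soft_threshold(mag, thr) * np.exp(1j * phase)
    return np.fft.irfft(new_spec, n=len(x))


def denoise_dct(x, thr, max_bin=None):
    """Threshold DCT coefficients; if max_bin is given, only bins below it change."""
    c = dct_matrix(len(x))
    coef = c @ x
    end = len(x) if max_bin is None else max_bin
    coef[:end] = np.sign(coef[:end]) * soft_threshold(np.abs(coef[:end]), thr)
    return c.T @ coef


# Usage on a synthetic trajectory (illustrative values only).
rng = np.random.default_rng(0)
trajectory = np.sin(np.linspace(0.0, 4.0 * np.pi, 100)) + 0.3 * rng.standard_normal(100)
full_band = denoise_dft(trajectory, thr=2.0)
low_band_only = denoise_dct(trajectory, thr=0.5, max_bin=20)
```

Setting `max_bin` restricts the DCT compensation to the low-frequency bins and leaves the remaining coefficients untouched, which mirrors the low-frequency-only variant mentioned in the abstract. The threshold values above are arbitrary placeholders.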

Parallel Keywords

robust speech recognition; threshold denoising; discrete cosine transform; discrete Fourier transform; modulation spectrum
