  • Thesis

加成性雜訊環境下運用特徵參數統計補償法於強健性語音辨識

Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments

Advisor: 洪志偉

Abstract


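In automatic speech recognition research, effectively reducing the influence of background noise has long been a central concern, and many methods have been proposed for it over the history of the field. Utterance-based cepstral mean and variance normalization (U-CMVN) and segmental cepstral mean and variance normalization (S-CMVN) belong to this category. Both are feature normalization methods commonly used in conventional robust speech recognition to reduce the effect of noise; they are feature equalization techniques that normalize statistics estimated over a whole utterance or over segments of an utterance. However, their estimates of these statistics are not very accurate, and they cannot be run online.

In this thesis, we build two codebooks, one representing the training speech and one representing the test speech, to replace the utterance-based and segment-based estimation of statistics used in U-CMVN and S-CMVN. We call them pseudo stereo codebooks. On the basis of the pseudo stereo codebooks, we develop three feature compensation methods: cepstral statistics compensation (CSC), linear least squares regression (LLS), and quadratic least squares regression (QLS). We describe how to obtain the statistics representing the training speech and the test speech from the codebooks, and then how to apply the three compensation methods to make the speech features more robust and improve recognition. These methods are simple, yield better experimental results, and can be run online.

We apply the three methods to four types of cepstral features: mel-frequency cepstral coefficients (MFCC), autocorrelation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC), and perceptual linear prediction cepstral coefficients (PLPCC). The experiments use the AURORA-2 corpus. The results show that, for every type of speech feature, the three proposed methods further improve performance under the various noise environments. In addition, compared with conventional U-CMVN and S-CMVN, the three methods give better recognition rates.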
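
As a point of reference for the two baselines named above, here is a minimal Python sketch of utterance-based and segmental CMVN. It is not taken from the thesis: the function names `u_cmvn` and `s_cmvn`, the `half_window` parameter, and the centered window are assumptions made for illustration, and only the standard library is used.

```python
from statistics import fmean, pstdev

def u_cmvn(frames):
    """Utterance-based CMVN: normalize every cepstral dimension to zero mean
    and unit variance using statistics of the whole utterance."""
    dims = list(zip(*frames))                  # one tuple per cepstral dimension
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) or 1.0 for d in dims]    # fall back to 1.0 if variance is zero
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in frames]

def s_cmvn(frames, half_window=1):
    """Segmental CMVN: the same normalization, with statistics taken from a
    window of frames centered on the current frame (assumed window shape)."""
    out = []
    for t, f in enumerate(frames):
        seg = frames[max(0, t - half_window): t + half_window + 1]
        dims = list(zip(*seg))
        means = [fmean(d) for d in dims]
        stds = [pstdev(d) or 1.0 for d in dims]
        out.append([(x - m) / s for x, m, s in zip(f, means, stds)])
    return out

# Toy utterance: three frames of two cepstral coefficients each.
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = u_cmvn(utterance)
```

Both functions need frames that lie after the current one (the whole utterance, or the right half of the window), which is one way to see why such estimates are hard to produce online.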

Parallel Abstract


Improving the accuracy of a speech recognition system in a mismatched noisy environment has always been a major research issue in speech processing. Many approaches have been proposed to reduce this environmental mismatch, and one class of them focuses on normalizing the statistics of speech features under different noise conditions. The well-known utterance-based cepstral mean and variance normalization (U-CMVN) and segmental cepstral mean and variance normalization (S-CMVN) both belong to this class. Both use the whole utterance or segments of an utterance to estimate the statistics, which may not be accurate enough, and neither can be implemented in an online manner. In this thesis, instead of estimating the statistics utterance by utterance as in U-CMVN and S-CMVN, we construct two sets of codebooks, called pseudo stereo codebooks, which represent the speech features in clean and noisy environments, respectively. Based on the pseudo stereo codebooks, we develop three feature compensation approaches: cepstral statistics compensation (CSC), linear least squares (LLS) regression, and quadratic least squares (QLS) regression. These new approaches are simple yet very effective, and they can be implemented online. We apply the three proposed approaches to four types of cepstral features: mel-frequency cepstral coefficients (MFCC), autocorrelation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC), and perceptual linear prediction cepstral coefficients (PLPCC). Experiments conducted on the Aurora-2 database show that, for each type of speech feature, the three proposed approaches bring very encouraging performance improvements under various noise environments. Moreover, compared with conventional utterance-based CMVN and segmental CMVN, the three approaches provide further improved recognition accuracy.
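
The abstract does not state the formulas behind CSC, LLS, and QLS, so the following Python sketch is only one plausible reading. It assumes that the clean and noisy codebooks are paired codeword by codeword, that each cepstral dimension is handled separately, that CSC matches the mean and standard deviation of the noisy codewords to those of the clean codewords, and that LLS and QLS fit a first-degree and a second-degree polynomial from noisy to clean codewords. The names `csc`, `fit_poly`, and `apply_poly` and the toy codebooks are hypothetical.

```python
from statistics import fmean, pstdev

def csc(y, clean_cw, noisy_cw):
    """Assumed CSC: shift and scale y so that the noisy-codebook mean and
    standard deviation map onto the clean-codebook ones."""
    mu_x, sd_x = fmean(clean_cw), pstdev(clean_cw)
    mu_y, sd_y = fmean(noisy_cw), pstdev(noisy_cw)
    return (sd_x / sd_y) * (y - mu_y) + mu_x

def fit_poly(noisy_cw, clean_cw, degree):
    """Least-squares polynomial x ~ c0 + c1*y + ... over paired codewords.
    degree=1 plays the role of LLS, degree=2 the role of QLS (assumed)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(y^(i+j)), b[i] = sum(x * y^i).
    A = [[sum(y ** (i + j) for y in noisy_cw) for j in range(n)] for i in range(n)]
    b = [sum(x * y ** i for x, y in zip(clean_cw, noisy_cw)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][k] * coef[k] for k in range(i + 1, n))
        coef[i] = (b[i] - tail) / A[i][i]
    return coef

def apply_poly(coef, y):
    """Evaluate the fitted polynomial at a noisy feature value y."""
    return sum(c * y ** i for i, c in enumerate(coef))

# Toy paired codebooks for a single cepstral dimension (made-up values).
noisy_codewords = [0.0, 1.0, 2.0, 3.0]
clean_codewords = [1.0, 3.5, 7.0, 11.5]
lls = fit_poly(noisy_codewords, clean_codewords, 1)
qls = fit_poly(noisy_codewords, clean_codewords, 2)
compensated = apply_poly(qls, 1.5)
```

Under these assumptions the statistics and regression coefficients depend only on the codebooks and not on the utterance being recognized, so a frame can be compensated as soon as it arrives, which is consistent with the online implementation the abstract describes.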


Cited By


張智傑 (2014). 多種語音特徵的合併及其在智慧型手機上之應用 [Master's thesis, National Central University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0031-0412201511582064
唐曲亮 (2015). 改良式梅爾倒頻譜係數混合多種語音特徵之研究 [Master's thesis, National Central University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0031-0412201512055340
