Histogram Equalization Methods for Speech Recognition

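Histogram equalization (HEQ) is a conceptually simple and effective speech feature processing technique that has been widely studied and applied in robust speech recognition in recent years. In this paper, we continue this line of research and propose a series of feature robustness methods that use the spatial-temporal contextual statistics of speech features. The approach applies simple differencing and averaging operations to the cepstral features of speech to obtain their contextual statistics, which are then normalized and combined. Unlike conventional HEQ, which normalizes each feature dimension independently (dimension-wise), the new methods further normalize the feature distribution information across different spatial and temporal contexts. They can therefore reduce the mismatch caused by different acoustic environments, and they attempt to address a problem that conventional HEQ cannot compensate for, namely the effect of random noise on speech. All speech recognition experiments in this paper are conducted on Aurora-2, an internationally used continuous speech corpus; the results show that the proposed methods perform well compared with many well-known feature enhancement methods.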
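As a rough illustration of the idea summarized above, the following Python sketch equalizes one cepstral coefficient trajectory after splitting it into temporal difference and average streams. It is not the paper's implementation. The function names (`heq_1d`, `temporal_context`, `contextual_heq`), the standard normal reference distribution, the rank-based CDF estimate, the one-frame context, and the recombination by summation are all assumptions made for this example; the abstract does not specify these details.

```python
from statistics import NormalDist


def heq_1d(values):
    """Rank-based HEQ of one feature trajectory onto a reference distribution.

    Assumption: the reference is a standard normal; the abstract does not
    say which reference distribution is used.
    """
    n = len(values)
    ref = NormalDist()
    # Indices of the frames, sorted by feature value (ascending).
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        # Empirical CDF estimate, offset by 0.5 so it stays strictly in (0, 1).
        out[i] = ref.inv_cdf((rank + 0.5) / n)
    return out


def temporal_context(values):
    """Half-difference and average of each frame with the previous frame.

    Assumption: a one-frame context; the first frame is paired with itself.
    """
    prev = [values[0]] + values[:-1]
    diff = [(x - p) / 2 for x, p in zip(values, prev)]
    avg = [(x + p) / 2 for x, p in zip(values, prev)]
    return diff, avg


def contextual_heq(values):
    """Equalize the difference and average streams separately, then sum them.

    Summation is chosen here only because diff + avg reproduces the
    original value before equalization; it is a hypothetical combination rule.
    """
    diff, avg = temporal_context(values)
    return [d + a for d, a in zip(heq_1d(diff), heq_1d(avg))]


if __name__ == "__main__":
    trajectory = [1.0, 3.0, 2.0, 5.0, 4.0]  # toy values, not real cepstra
    print(contextual_heq(trajectory))
```

The same pattern could be applied along the spatial axis by pairing each feature dimension with its neighboring dimension instead of the previous frame.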

Keywords

Speech Recognition; Noise Robustness; Histogram Equalization; Feature Contextual Statistics

Parallel Abstract

Histogram equalization (HEQ) of speech features has received considerable attention in the field of robust speech recognition due to its simplicity and excellent performance. This paper continues this line of research and presents a novel HEQ-based feature normalization framework that jointly equalizes the spatial-temporal contextual statistics of speech features. To this end, we explore the use of simple differencing and averaging operations to capture the contextual statistics of feature vector components for speech feature normalization. All experiments are conducted on the Aurora-2 database and task. Experimental results show that, for clean-condition training, the methods instantiated from this framework achieve considerable word error rate reductions over the baseline system, and that these reductions are comparable to those of other conventional methods.

Parallel Keywords

Speech Recognition; Noise Robustness; Histogram Equalization; Feature Contextual Statistics
