Regularization and Acoustic Model Construction for Deep Recurrent Neural Networks

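Deep learning has been shown to achieve very high classification accuracy across a range of classification tasks, and deep neural networks have become a popular research topic in speech recognition. This thesis develops a novel regularization method for recurrent neural networks and builds deep acoustic models for noisy speech recognition. Our approach adds Tikhonov regularization to the pre-training stage of the deep neural network. The idea is to compensate for the effect that variations in the input speech data have on the network, making system performance more robust. In particular, during pre-training with the restricted Boltzmann machine (RBM), we perform feature learning and deep acoustic model training and use Tikhonov regularization to build certain invariance properties into the model. Within the RBM, we further combine it with weight-decay regularization; this combined regularization effectively increases the mixing rate of the alternating Gibbs Markov chain, so that contrastive divergence more closely approximates maximum likelihood learning. In addition, we extend backpropagation through time (BPTT) to train both the recurrent weights and the weights between the hidden layer and the recurrent layer of the recurrent neural network. In the experiments, we implement the proposed algorithms with the Kaldi deep neural network speech recognition toolkit. Results on the Resource Management and Aurora4 speech corpora show that hybrid regularization and BPTT do improve both the robustness and the recognition accuracy of the deep neural network acoustic model.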
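The combined regularization described above can be sketched as a single contrastive divergence (CD-1) update for a binary RBM. This is a minimal illustration under stated assumptions, not the thesis's Kaldi implementation: the Tikhonov term is assumed to be the squared norm of the Jacobian of the hidden activations with respect to the visible input (a sensitivity penalty that encourages invariance to small input perturbations), weight decay is assumed to be a plain L2 penalty, binary visible units are used for simplicity, and all function names and hyperparameters (`lr`, `weight_decay`, `tikhonov`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b):
    """p(h_j = 1 | v) for a binary RBM; W[i][j] links visible unit i to hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(b))]


def visible_probs(h, W, c):
    """p(v_i = 1 | h)."""
    return [sigmoid(c[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(c))]


def tikhonov_penalty(v, W, b):
    """Assumed Tikhonov term: ||dh/dv||_F^2 = sum_j s_j^2 * sum_i W_ij^2, with s_j = h_j (1 - h_j)."""
    total = 0.0
    for j, hj in enumerate(hidden_probs(v, W, b)):
        s = hj * (1.0 - hj)
        total += s * s * sum(W[i][j] ** 2 for i in range(len(W)))
    return total


def tikhonov_grad(v, W, b):
    """Analytic gradient of tikhonov_penalty with respect to W and the hidden biases b."""
    gW = [[0.0] * len(b) for _ in W]
    gb = [0.0] * len(b)
    for j, hj in enumerate(hidden_probs(v, W, b)):
        s = hj * (1.0 - hj)
        S = sum(W[i][j] ** 2 for i in range(len(W)))
        common = 2.0 * S * s * s * (1.0 - 2.0 * hj)  # derivative through the pre-activation of h_j
        gb[j] = common
        for i in range(len(W)):
            gW[i][j] = 2.0 * s * s * W[i][j] + common * v[i]
    return gW, gb


def cd1_step(v0, W, b, c, lr=0.1, weight_decay=1e-4, tikhonov=1e-3, rng=random):
    """One CD-1 update with L2 weight decay plus the assumed Tikhonov penalty (in place)."""
    h0 = hidden_probs(v0, W, b)
    h0_sample = [1.0 if rng.random() < p else 0.0 for p in h0]  # one Gibbs half-step, sampled
    v1 = visible_probs(h0_sample, W, c)  # reconstruction kept as probabilities (mean-field)
    h1 = hidden_probs(v1, W, b)
    gW_t, gb_t = tikhonov_grad(v0, W, b)  # computed before any parameter is modified
    for i in range(len(c)):
        for j in range(len(b)):
            cd = v0[i] * h0[j] - v1[i] * h1[j]  # positive phase minus negative phase
            W[i][j] += lr * (cd - weight_decay * W[i][j] - tikhonov * gW_t[i][j])
        c[i] += lr * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += lr * (h0[j] - h1[j] - tikhonov * gb_t[j])
```

A finite-difference check (perturbing one weight by ±ε and comparing the change in `tikhonov_penalty` with `tikhonov_grad`) is a quick way to confirm the penalty's derivative before using it in training; the relative sizes of `weight_decay` and `tikhonov` would have to be tuned on held-out data.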

Keywords

Model regularization; deep learning; recurrent neural network; acoustic model; speech recognition

Parallel Abstract (English)

Deep learning has been widely demonstrated to achieve high performance in many classification tasks, and deep neural networks are now a major trend in automatic speech recognition. In this dissertation, we address model regularization in deep recurrent neural networks and develop deep acoustic models for speech recognition in noisy environments. Our idea is to compensate for the variations of input speech data in the restricted Boltzmann machine (RBM), which is applied as a pre-training stage for feature learning and acoustic modeling. We implement Tikhonov regularization in the pre-training procedure to build invariance properties into the acoustic neural network model. Regularization based on weight decay is further combined with Tikhonov regularization to increase the mixing rate of the alternating Gibbs Markov chain, so that contrastive divergence training more closely approximates maximum likelihood learning. In addition, the backpropagation through time (BPTT) algorithm is developed for modified truncated minibatch training of the recurrent neural network. The algorithm is applied not only to the recurrent weights but also to the weights between the previous layer and the recurrent layer. In the experiments, we implement the proposed methods with the open-source Kaldi toolkit. Experimental results on the Resource Management (RM) and Aurora4 speech corpora show that hybrid regularization and BPTT training do improve the performance of the deep neural network acoustic model for robust speech recognition.
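Applying BPTT to both weight matrices can be illustrated with a small Elman-style recurrent layer. The sketch below is a hedged illustration, not the thesis's Kaldi code: it assumes a tanh recurrent layer, a squared-error loss placed directly on the recurrent layer's output (a real acoustic model would add a softmax output layer over HMM states), and a truncation length `k`. `W_in` stands for the weights from the previous layer to the recurrent layer and `W_rec` for the recurrent weights; all names are hypothetical.

```python
import math


def forward(xs, W_in, W_rec):
    """Recurrent layer h_t = tanh(W_in x_t + W_rec h_{t-1}) with h_0 = 0; hs[t] is the state after t inputs."""
    n_h = len(W_rec)
    hs = [[0.0] * n_h]
    for x in xs:
        prev = hs[-1]
        hs.append([math.tanh(sum(W_in[j][i] * x[i] for i in range(len(x)))
                             + sum(W_rec[j][m] * prev[m] for m in range(n_h)))
                   for j in range(n_h)])
    return hs


def loss(xs, ys, W_in, W_rec):
    """Assumed loss: 0.5 * sum over time of the squared error between h_t and the target y_t."""
    hs = forward(xs, W_in, W_rec)
    return 0.5 * sum((hs[t + 1][j] - ys[t][j]) ** 2
                     for t in range(len(xs)) for j in range(len(W_rec)))


def truncated_bptt(xs, ys, W_in, W_rec, k):
    """Gradients of loss() for W_in and W_rec; each step's error is propagated at most k steps back."""
    n_h, n_x = len(W_rec), len(W_in[0])
    hs = forward(xs, W_in, W_rec)
    g_in = [[0.0] * n_x for _ in range(n_h)]
    g_rec = [[0.0] * n_h for _ in range(n_h)]
    for t in range(1, len(xs) + 1):
        # error at the pre-activation of step t (tanh'(a) = 1 - h^2)
        delta = [(hs[t][j] - ys[t - 1][j]) * (1.0 - hs[t][j] ** 2) for j in range(n_h)]
        for tau in range(t, max(0, t - k), -1):
            for j in range(n_h):
                for i in range(n_x):
                    g_in[j][i] += delta[j] * xs[tau - 1][i]   # previous layer -> recurrent layer
                for m in range(n_h):
                    g_rec[j][m] += delta[j] * hs[tau - 1][m]  # recurrent -> recurrent
            # carry the error one step further back through W_rec and the tanh nonlinearity
            delta = [sum(W_rec[j][m] * delta[j] for j in range(n_h)) * (1.0 - hs[tau - 1][m] ** 2)
                     for m in range(n_h)]
    return g_in, g_rec
```

When `k` is at least the sequence length, the returned gradients equal the exact gradient of `loss`; a smaller `k` cuts the cost per update at the price of ignoring longer-range dependencies, which is the usual reason truncation is paired with minibatch training.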

Parallel Keywords (English)

Tikhonov regularization; deep learning; recurrent neural network; acoustic model; speech recognition
