融合多任務學習類神經網路聲學模型訓練於會議語音辨識之研究

Leveraging Multi-Task Learning with Neural Network Based Acoustic Modeling for Improved Meeting Speech Recognition

Abstracts


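This paper investigates how to incorporate multi-task learning (MTL) into the parameter estimation of acoustic models in order to improve the accuracy of meeting speech recognition. Our contributions are two-fold: 1) we conduct an empirical study that makes full use of various auxiliary tasks to strengthen the performance of multi-task learning in meeting speech recognition. In addition, we study the synergy of combining multi-task learning with different acoustic models, such as deep neural network (DNN) and convolutional neural network (CNN) acoustic models, with the aim of improving the generalization capability of acoustic modeling; 2) because the way the contributions (weights) of the different auxiliary tasks are adjusted during multi-task acoustic model training is not optimal, we propose a re-adaptation method to alleviate this problem. We set up a series of experiments on the Mandarin Meeting Recording Corpus (MMRC), a Mandarin Chinese meeting corpus recorded in Taiwan. Compared with several existing baselines, the experimental results show the effectiveness of the proposed methods.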

Parallel abstracts


This paper explores the use of multi-task learning (MTL) techniques for more accurate estimation of the parameters of neural network based acoustic models, so as to improve the accuracy of meeting speech recognition. Our main contributions are two-fold. First, we conduct an empirical study that leverages various auxiliary tasks to enhance the performance of multi-task learning on meeting speech recognition. We also study the synergistic effect of combining multi-task learning with different acoustic models, such as deep neural network (DNN) and convolutional neural network (CNN) based acoustic models, with the aim of improving the generalization ability of acoustic modeling. Second, since the way the contributions (weights) of the different auxiliary tasks are modulated during acoustic model training is far from optimal and is largely a matter of heuristic judgment, we propose a simple model adaptation method to alleviate this problem. A series of experiments carried out on the Mandarin Meeting Recording Corpus (MMRC) indicates the effectiveness of the proposed methods relative to several existing baselines.
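The auxiliary-task "contribution (weights)" mentioned in the abstract can be illustrated with a common multi-task formulation, in which the training objective is the primary-task loss plus a weighted sum of auxiliary-task losses. The sketch below is only an illustration under that assumption: the abstract does not state the paper's exact objective, its auxiliary tasks, or its weight values, so the function names, the probabilities, and the weight 0.3 are all hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability that a softmax output assigns to the target class."""
    return -math.log(probs[target])


def multitask_loss(main_loss, aux_losses, aux_weights):
    """Primary-task loss plus a weighted sum of auxiliary-task losses.

    Assumed (generic) formulation: L = L_main + sum_k w_k * L_aux_k.
    """
    if len(aux_losses) != len(aux_weights):
        raise ValueError("one weight per auxiliary task is required")
    return main_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))


# Hypothetical per-frame softmax outputs from two output layers
# that would sit on top of shared hidden layers.
main_probs = [0.7, 0.2, 0.1]  # primary task output
aux_probs = [0.5, 0.5]        # one auxiliary task output

main = cross_entropy(main_probs, 0)
aux = cross_entropy(aux_probs, 1)

# The auxiliary weight (0.3 here) is a hand-picked, illustrative value.
total = multitask_loss(main, [aux], [0.3])
print(f"main={main:.4f} aux={aux:.4f} total={total:.4f}")
```

In this formulation the auxiliary weights are hyperparameters chosen by hand, which matches the "heuristic judgment" the abstract describes as the motivation for its adaptation method.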

