使用深層類神經網路之中文語音辨認

A Study on DNN-based Mandarin-speech Recognition

Advisor: 陳信宏

Abstract


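Deep neural networks (DNNs) have become a popular research topic in speech recognition. In this thesis, we build a real-time large-vocabulary Mandarin speech recognition system with the Kaldi speech recognition toolkit. A DNN replaces the Gaussian mixture model (GMM) of the conventional acoustic model, the recognizer is implemented with weighted finite-state transducers (WFSTs), and we analyze the factors that affect recognition rate and recognition time. The choice between a GMM and a DNN acoustic model and the size of the language model both affect the size of the overall recognition system, and many decoding parameters affect its performance. By tuning these parameters, we can find the best operating point and obtain a system that is both real-time and accurate. The experiments use the TCC300 corpus as the source of both training and test data, together with an 80,000-word pronunciation lexicon built for the system.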

Parallel Abstract


Deep neural networks (DNNs) have become a popular research area in automatic speech recognition. In this dissertation, we focus on implementing a real-time large-vocabulary Mandarin speech recognition system using the Kaldi speech recognition toolkit. In the proposed system, we develop a DNN acoustic model and compare it with the conventional Gaussian mixture model (GMM). We also use weighted finite-state transducers (WFSTs) to realize the decoder. The main goal is to find the best operating point of the proposed system by tuning the parameters that affect recognition speed and recognition rate. The experiments use the TCC300 speech corpus and an 80k-word lexicon.
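The search for an operating point described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the thesis: it assumes each decoding run is summarized by a beam width, an error rate, and a real-time factor (RTF, decoding time divided by audio duration), and it assumes "real-time" means RTF at or below 1.0. The names `DecodeRun` and `best_operating_point` are hypothetical, and the numbers are placeholders rather than reported results.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DecodeRun:
    """One decoding run (hypothetical summary record)."""
    beam: float        # decoder beam width used for this run
    error_rate: float  # recognition error rate, in percent
    rtf: float         # real-time factor: decoding time / audio duration


def best_operating_point(runs: Iterable[DecodeRun],
                         max_rtf: float = 1.0) -> Optional[DecodeRun]:
    """Return the most accurate run that still meets the speed budget.

    Assumption: a run counts as real-time when rtf <= max_rtf.
    Ties on error rate are broken in favor of the faster run.
    """
    realtime = [r for r in runs if r.rtf <= max_rtf]
    if not realtime:
        return None
    return min(realtime, key=lambda r: (r.error_rate, r.rtf))


# Placeholder measurements for illustration only; not results from the thesis.
runs = [
    DecodeRun(beam=8.0, error_rate=20.0, rtf=0.3),
    DecodeRun(beam=12.0, error_rate=15.0, rtf=0.7),
    DecodeRun(beam=16.0, error_rate=14.0, rtf=1.4),
]

best = best_operating_point(runs)
print(best)
```

With these placeholder values the widest beam is the most accurate but exceeds the speed budget, so the middle setting is selected. Relaxing `max_rtf` shifts the choice toward accuracy, and tightening it shifts the choice toward speed.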

