
使用自注意力機制的端到端音轉字語言模型

End-to-end Pinyin to Character Language Model using Self-Attention Mechanism

Advisor: 陳信宏

Abstract


With the rapid development of deep learning, neural networks applied within the conventional pipelined speech recognition architecture have achieved considerable success. In the past two years, end-to-end speech recognition architectures have reached comparable performance, but they require very large training corpora and computing resources. This study instead approaches the problem from the perspective of an end-to-end language model. Using a 440-million-word text corpus accumulated by our laboratory, converted into syllable sequences by a text-to-syllable system, as training data, we train syllable-to-character language models with two methods common in natural language processing: a sequence labeling model and a self-attention sequence-to-sequence model (Transformer). We find that syllable sequences also carry semantic information, and that deep neural networks help convert a syllable sequence into the correct character (word) sequence. On an external test set, the Transformer syllable-to-character model achieves a lower error rate than the baseline trigram model.
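The thesis page itself carries no code, but the Transformer formulation above can be summarized in a short sketch. The following is a minimal, hypothetical PyTorch version; the vocabulary sizes, depths, and dimensions are illustrative assumptions, not the author's actual configuration. Syllable IDs enter the encoder, and the decoder predicts the character sequence under a causal mask.

```python
# Minimal sketch of a Transformer syllable-to-character model.
# All hyperparameters below are placeholders, not the thesis's settings.
import torch
import torch.nn as nn

class PinyinToCharTransformer(nn.Module):
    def __init__(self, n_syllables=1500, n_chars=6000, d_model=256, max_len=512):
        super().__init__()
        self.src_emb = nn.Embedding(n_syllables, d_model)  # syllable tokens
        self.tgt_emb = nn.Embedding(n_chars, d_model)      # character tokens
        self.pos = nn.Embedding(max_len, d_model)          # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=3, num_decoder_layers=3,
            dim_feedforward=512, batch_first=True)
        self.out = nn.Linear(d_model, n_chars)

    def forward(self, syllables, chars):
        # syllables: (B, S) syllable ids; chars: (B, T) shifted character ids
        s_pos = torch.arange(syllables.size(1), device=syllables.device)
        t_pos = torch.arange(chars.size(1), device=chars.device)
        src = self.src_emb(syllables) + self.pos(s_pos)
        tgt = self.tgt_emb(chars) + self.pos(t_pos)
        # causal mask: each output character attends only to its left context
        mask = self.transformer.generate_square_subsequent_mask(
            chars.size(1)).to(chars.device)
        h = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(h)  # (B, T, n_chars) logits over characters
```

At inference time, decoding would start from a begin-of-sequence token and feed each predicted character back into the decoder, with greedy or beam search.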

Parallel Abstract


Deep neural networks within the conventional automatic speech recognition pipeline have achieved substantial improvements. Similarly, end-to-end speech recognition architectures have reached comparable performance in the past two years, but at the cost of huge amounts of data and computing resources. This study focuses on the end-to-end language model instead: we train syllable-to-character language models with a sequence labeling method and a self-attention seq2seq model (Transformer), both common in NLP tasks, on syllable sequences converted from a 440-million-word Chinese corpus through a proposed G2P system. The Transformer syllable-to-character model achieves a lower character error rate than the baseline trigram model on our external test set.
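The sequence labeling formulation mentioned in both abstracts exploits the fact that one Mandarin syllable corresponds to exactly one character, so conversion can be cast as per-position classification rather than full seq2seq. Below is a minimal sketch assuming a BiLSTM tagger; the architecture and sizes are placeholders, not the thesis's model.

```python
# Sketch of the sequence-labeling view of syllable-to-character conversion:
# since the syllable and character sequences are aligned one-to-one, the
# model tags each input syllable with an output character directly.
import torch
import torch.nn as nn

class SyllableTagger(nn.Module):
    def __init__(self, n_syllables=1500, n_chars=6000, d=256):
        super().__init__()
        self.emb = nn.Embedding(n_syllables, d)
        # bidirectional context: homophone disambiguation needs both sides
        self.rnn = nn.LSTM(d, d, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * d, n_chars)  # one character label per syllable

    def forward(self, syllables):
        h, _ = self.rnn(self.emb(syllables))
        return self.out(h)  # (B, S, n_chars) logits, one per input position

# Training would minimize cross-entropy against the aligned character
# sequence; the output length always equals the input length.
```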

Parallel Keywords

Language model; P2G; G2P; Attention
