Statistical and Neural Machine Translation Applied to English Grammatical Error Correction

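This study investigates hybrid models for grammatical error correction. We implement statistical machine translation and neural machine translation correction models and develop a series of hybrid methods to integrate the two. Building the statistical translation system involves preprocessing an annotated learner corpus, training a language model, building an error translation model, and using a decoder to produce corrected sentences. We then process the original annotated data, converting it into parallel pairs of pre-correction and post-correction sentences, and use these pairs to train the neural correction model. We integrate the statistical and neural models through re-scoring, voting, and pipeline methods. Experiments on public datasets show that our hybrid models effectively exploit the strengths of both the statistical and the neural models and achieve the best results. Finally, we discuss the experimental results and point out the challenges facing grammatical error correction research.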

Keywords

Automatic grammatical error correction; machine translation; hybrid system

Parallel Abstract

This paper investigates hybrid approaches to grammatical error correction (GEC). We develop statistical machine translation (SMT) and neural machine translation (NMT) models and build a series of hybrid systems that incorporate them. The SMT method involves preprocessing annotated learner corpora, constructing a translation model, training a language model, and finally generating corrections with a decoder. Annotated sentences are then converted into parallel sentence pairs to train the NMT models. We use re-scoring, voting, and pipeline techniques to integrate the SMT and NMT models. Experiments on public test sets indicate that our hybrid systems effectively exploit the strengths of both SMT and NMT models and achieve the best performance. Finally, we discuss the results and address the challenges facing the GEC field.

Parallel Keywords

Automatic Grammatical Error Correction; Machine Translation; Hybrid System
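
The abstract mentions converting annotated sentences into parallel sentence pairs for NMT training. The following is a minimal sketch of what such a conversion might look like, not the authors' implementation. The edit representation `(start, end, replacement)` over token indices, the function name `to_parallel_pair`, and the example sentence are all assumptions made for illustration; the annotation format actually used in the thesis is not stated here.

```python
from typing import List, Tuple

# Hypothetical edit format (an assumption, not the thesis's scheme):
# (start, end, replacement), token indices with `end` exclusive.
# An empty replacement deletes the span; start == end inserts.
Edit = Tuple[int, int, str]


def to_parallel_pair(tokens: List[str], edits: List[Edit]) -> Tuple[str, str]:
    """Return (source, target), where target has every edit applied.

    Edits are assumed not to overlap.
    """
    corrected = list(tokens)
    # Apply edits from right to left so earlier indices remain valid.
    for start, end, replacement in sorted(
        edits, key=lambda e: (e[0], e[1]), reverse=True
    ):
        corrected[start:end] = replacement.split()
    return " ".join(tokens), " ".join(corrected)


# Illustrative learner sentence with two annotated errors.
source_tokens = "I am agree with the this opinion .".split()
annotated_edits = [(1, 3, "agree"), (4, 5, "")]
source, target = to_parallel_pair(source_tokens, annotated_edits)
print(source)  # I am agree with the this opinion .
print(target)  # I agree with this opinion .
```

Applying the edits in reverse order of position avoids recomputing offsets after each insertion or deletion, which keeps the sketch short; a real pipeline would also need to handle overlapping or alternative annotations.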
