A Chinese Spelling Check System Based on Neural Machine Translation

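This paper proposes a method for Chinese spelling correction that automatically learns to correct potential spelling errors in a sentence. We apply a neural machine translation (NMT) model to the task; that is, a sentence that may contain spelling errors is translated into a correct sentence. The NMT spelling correction model is trained on right-and-wrong sentence pairs drawn from newspaper edit logs and from artificially generated error data. In the training stage, we first extract from the newspaper edit logs the sentences whose revisions involve spelling errors. To expand the training data, we then use a confusion set to generate sentences containing spelling errors, and train the model on the combined data. Experimental results show that the model trained on edit logs plus artificial error data performs better.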
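The confusion-set step described above can be illustrated with a minimal sketch. It assumes a character-level confusion set and an independent per-character substitution probability; the `CONFUSION_SET` entries, the `make_error_pair` helper, and the `error_rate` parameter are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical confusion set: correct character -> commonly confused characters.
# The real confusion set used by the authors is not reproduced here.
CONFUSION_SET = {
    "在": ["再"],
    "的": ["得", "地"],
}

def make_error_pair(sentence, confusion_set, rng, error_rate=0.1):
    """Return a (wrong_sentence, correct_sentence) training pair.

    Each character that has an entry in the confusion set is replaced by one
    of its confusable characters with probability `error_rate` (an assumption).
    """
    chars = list(sentence)
    for i, ch in enumerate(chars):
        candidates = confusion_set.get(ch)
        if candidates and rng.random() < error_rate:
            chars[i] = rng.choice(candidates)
    return "".join(chars), sentence
```

Calling `make_error_pair("我在家", CONFUSION_SET, random.Random(0), error_rate=1.0)` forces every confusable character to be replaced, which makes the behavior easy to inspect; a much lower rate would be the more plausible choice when generating training data.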

Keywords

Chinese spelling correction; artificial error generation; neural machine translation; revision logs; edit logs

Parallel Abstract

We present a method for Chinese spelling check that automatically learns to correct a sentence with potential spelling errors. In our approach, a character-based neural machine translation (NMT) model is trained to translate a potentially misspelled sentence into a correct one, using right-and-wrong sentence pairs from newspaper edit logs and artificially generated data. The method involves extracting sentences that contain spelling-correction edits from the edit logs, using commonly confused right-and-wrong word pairs to generate artificial right-and-wrong sentence pairs that expand the training data, and training the NMT model. Evaluation on the United Daily News (UDN) Edit Logs and the SIGHAN-7 Shared Task shows that adding artificial error data can significantly improve the performance of a Chinese spelling check system.
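As an illustration of the edit-log extraction step, the sketch below keeps only those before/after sentence pairs that look like spelling corrections. The selection rule (equal length, at most `max_changes` differing characters) is an assumed heuristic, not the criterion reported by the authors, and the function names are hypothetical.

```python
def is_spelling_edit(before, after, max_changes=2):
    """Heuristic (assumption): a spelling edit substitutes a few characters
    without changing the sentence length."""
    if len(before) != len(after) or before == after:
        return False
    changes = sum(1 for a, b in zip(before, after) if a != b)
    return changes <= max_changes

def extract_pairs(edit_log, max_changes=2):
    """Filter (before, after) sentence pairs down to likely spelling edits."""
    return [(b, a) for b, a in edit_log if is_spelling_edit(b, a, max_changes)]
```

Under this heuristic, a rewrite that inserts or deletes characters is treated as a content edit and discarded, so only substitution-style corrections become wrong-to-right training pairs.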

Parallel Keywords

Chinese Spelling Check; Chinese Error Correction; Artificial Error Generation; Neural Machine Translation; Edit Log

