  • Journal
  • OpenAccess

Chinese Spelling Check based on Neural Machine Translation


We present a method for Chinese spelling check that automatically learns to correct sentences containing potential spelling errors. In our approach, a character-based neural machine translation (NMT) model is trained to translate a potentially misspelled sentence into a correct one, using right-and-wrong sentence pairs drawn from newspaper edit logs and from artificially generated data. The method involves extracting sentences that contain spelling-correction edits from the edit logs, using commonly confused right-and-wrong word pairs to generate artificial right-and-wrong sentence pairs that expand the training data, and training the NMT model. Evaluation on the United Daily News (UDN) Edit Logs and the SIGHAN-7 Shared Task shows that adding artificial error data significantly improves the performance of the Chinese spelling check system.
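The data-preparation steps described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation. The word pairs in `CONFUSION_PAIRS` are made-up examples. The rule that a spelling-correction edit is a same-length, in-place substitution of at most `max_diff` characters is our own heuristic. All function names are invented for illustration. Misspelled sentences act as NMT source text and corrected sentences as target text, and each is split into space-separated characters to suit a character-based model.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical confusion set of (correct word, commonly confused misspelling).
# Illustrative only; these are not the word pairs used in the paper.
CONFUSION_PAIRS: List[Tuple[str, str]] = [
    ("再見", "在見"),  # phonologically similar: 再 / 在
    ("已經", "以經"),  # phonologically similar: 已 / 以
    ("辦法", "辨法"),  # visually similar: 辦 / 辨
]


def extract_spelling_pairs(
    edits: Sequence[Tuple[str, str]], max_diff: int = 2
) -> List[Tuple[str, str]]:
    """Keep (before, after) edit-log pairs that look like spelling corrections.

    Heuristic assumption: a spelling correction replaces characters in place,
    so both versions have equal length and differ in at most `max_diff` positions.
    """
    pairs = []
    for before, after in edits:
        if before == after or len(before) != len(after):
            continue  # unchanged, or an insertion/deletion rather than a substitution
        n_changed = sum(1 for a, b in zip(before, after) if a != b)
        if n_changed <= max_diff:
            pairs.append((before, after))  # (misspelled source, corrected target)
    return pairs


def generate_artificial_pairs(
    sentences: Sequence[str],
    confusion_pairs: Sequence[Tuple[str, str]],
    rng: random.Random,
) -> List[Tuple[str, str]]:
    """Corrupt correct sentences with one confusable word to create training pairs."""
    pairs = []
    for sentence in sentences:
        candidates = [(right, wrong) for right, wrong in confusion_pairs if right in sentence]
        if not candidates:
            continue  # nothing in this sentence can be corrupted
        right, wrong = rng.choice(candidates)
        corrupted = sentence.replace(right, wrong, 1)  # corrupt the first occurrence only
        pairs.append((corrupted, sentence))  # (misspelled source, correct target)
    return pairs


def to_char_tokens(sentence: str) -> str:
    """Split a sentence into space-separated characters for a character-based model."""
    return " ".join(sentence)


if __name__ == "__main__":
    log_edits = [
        ("我以經吃飯了", "我已經吃飯了"),    # spelling fix: kept
        ("今天天氣很好", "今天天氣很好。"),  # punctuation added, length differs: dropped
    ]
    real = extract_spelling_pairs(log_edits)
    fake = generate_artificial_pairs(["我們明天再見", "沒有錯字"], CONFUSION_PAIRS, random.Random(0))
    for src, tgt in real + fake:
        print(f"{to_char_tokens(src)}\t{to_char_tokens(tgt)}")
```

The resulting character-token pairs could then be written to parallel source and target files for a sequence-to-sequence NMT toolkit; the NMT training step itself is outside the scope of this sketch.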


