
Integrating Dictionary and Web N-grams for Chinese Spell Checking


Chinese spell checking is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. Compared to spell checkers for alphabetic languages (e.g., English or French), however, Chinese spell checkers are more difficult to develop, because the Chinese writing system does not mark word boundaries and because different Chinese input methods introduce different kinds of errors. In this paper, we propose a novel method for detecting and correcting Chinese typographical errors. Our approach combines word segmentation, detection rules, and phrase-based machine translation. The error detection module segments the input into words and checks word and phrase frequencies against compiled and Web corpora. The phonological or morphological typographical errors it finds are then corrected by running a decoder based on a statistical machine translation (SMT) model. The results show that the proposed system achieves significantly better accuracy in error detection, and more satisfactory performance in error correction, than state-of-the-art systems.
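The detection idea described above (segment, then check frequencies) can be illustrated with a minimal sketch. It is not the system from the paper: the dictionary, the bigram counts, the confusion set, and all function names below are hypothetical toy stand-ins, and a simple bigram-count lookup replaces the phrase-based SMT decoder that the paper uses for correction.

```python
from typing import Dict, List, Set, Tuple

# Toy resources, invented for illustration only (not data from the paper).
DICTIONARY: Set[str] = {"今天", "天氣", "很", "好"}
BIGRAM_COUNTS: Dict[str, int] = {"天氣": 1000, "氣很": 50}
CONFUSION: Dict[str, List[str]] = {"汽": ["氣", "器"]}  # similar-sounding characters


def segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching; unmatched characters become 1-char tokens."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in dictionary:
                tokens.append(cand)
                i += n
                break
    return tokens


def suspicious_positions(tokens: List[str], dictionary: Set[str]) -> List[int]:
    """Character offsets of single-character tokens that are not dictionary words."""
    positions: List[int] = []
    offset = 0
    for tok in tokens:
        if len(tok) == 1 and tok not in dictionary:
            positions.append(offset)
        offset += len(tok)
    return positions


def context_score(chars: List[str], i: int, counts: Dict[str, int]) -> int:
    """Sum the counts of the (up to two) bigrams that cover position i."""
    score = 0
    if i > 0:
        score += counts.get(chars[i - 1] + chars[i], 0)
    if i + 1 < len(chars):
        score += counts.get(chars[i] + chars[i + 1], 0)
    return score


def propose_corrections(text: str) -> List[Tuple[int, str, str]]:
    """Return (offset, original_char, suggested_char) for each suspicious character
    whose confusable replacement has a strictly higher bigram context score."""
    chars = list(text)
    edits: List[Tuple[int, str, str]] = []
    for i in suspicious_positions(segment(text, DICTIONARY), DICTIONARY):
        best_char, best_score = chars[i], context_score(chars, i, BIGRAM_COUNTS)
        for cand in CONFUSION.get(chars[i], []):
            trial = chars[:i] + [cand] + chars[i + 1:]
            score = context_score(trial, i, BIGRAM_COUNTS)
            if score > best_score:
                best_char, best_score = cand, score
        if best_char != chars[i]:
            edits.append((i, chars[i], best_char))
    return edits
```

Calling `propose_corrections("今天天汽很好")` on this toy data flags the character at offset 3 and suggests 氣 in place of 汽. A real system would need far larger resources and a principled ranking model; the sketch only shows how segmentation leftovers and frequency evidence can be combined.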


