
Exploiting Pinyin Constraints in Pinyin-to-Character Conversion Task: A Class-Based Maximum Entropy Markov Model Approach

Abstract


Pinyin-to-character conversion is the core process of Chinese pinyin-based input methods. Statistical language models, especially n-gram models, are the most widely adopted approach to this task. However, an n-gram model captures only the constraints between characters and ignores the pinyin constraints in the input pinyin sequence. This paper improves the performance of a pinyin-to-character conversion system by exploiting those pinyin constraints. The maximum entropy Markov model (MEMM) framework is used to describe both the pinyin constraints and the character constraints. A class-based MEMM (C-MEMM) is proposed to address the efficiency problem of MEMM in this task. The C-MEMM probability functions are derived rigorously from Bayes' rule and the Markov property, and both the hard-class and soft-class cases are discussed. In the experiments, C-MEMM significantly outperforms the traditional n-gram model by exploiting the pinyin constraints. In addition, C-MEMM can make use of the syntactic and semantic information in word classes to further improve system performance.
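
As a rough illustration of the n-gram baseline that the abstract contrasts with C-MEMM, the following minimal sketch decodes a pinyin sequence with a character bigram model and Viterbi search. It is not the paper's implementation: the candidate table, the probabilities, the `FLOOR` constant, and the function name `convert` are all invented for this example. Note that the pinyin input is used only to select candidate characters and never enters the score, which is the limitation the abstract describes.

```python
import math

# Toy candidate table (hypothetical): pinyin syllable -> possible characters.
CANDIDATES = {
    "zhong": ["中", "种"],
    "guo": ["国", "过"],
}

# Hypothetical bigram probabilities P(char | previous char); "<s>" marks the start.
BIGRAM = {
    ("<s>", "中"): 0.6,
    ("<s>", "种"): 0.4,
    ("中", "国"): 0.9,
    ("中", "过"): 0.1,
    ("种", "国"): 0.2,
    ("种", "过"): 0.8,
}

FLOOR = 1e-6  # crude stand-in for smoothing of unseen bigrams (an assumption)


def convert(pinyin_seq):
    """Return the most probable character string under the toy bigram model."""
    # best[c] = (log-probability, path) of the best sequence ending in character c
    best = {"<s>": (0.0, [])}
    for syllable in pinyin_seq:
        nxt = {}
        # The pinyin syllable only restricts the candidate set here.
        for cand in CANDIDATES[syllable]:
            scored = [
                (lp + math.log(BIGRAM.get((prev, cand), FLOOR)), path + [cand])
                for prev, (lp, path) in best.items()
            ]
            nxt[cand] = max(scored, key=lambda s: s[0])
        best = nxt
    return "".join(max(best.values(), key=lambda s: s[0])[1])


print(convert(["zhong", "guo"]))  # prints 中国
```

An MEMM-style model would instead condition each character's probability on features of the pinyin sequence as well as on the preceding characters; the sketch above deliberately omits that part.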


Cited by


Jiang, T. J. (2012). Syllable Word Segmentation for Mandarin Chinese via Double Ranking of the Left and Right Context [doctoral dissertation, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2012.00487
