Entailment Analysis for Improving a Chinese Recognizing Textual Entailment System

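Textual entailment is a recently emerging research topic in natural language processing. Recognizing Textual Entailment (RTE) can be applied in many other areas of natural language processing research. In this paper, we describe the shortcomings of previous systems that we found by examining the NTCIR-10 RITE-2 dataset, and we propose a way to improve Chinese textual entailment systems. Previous systems treat textual entailment as a machine learning classification problem: all input sentences are handled by the same classifier, which often leads to misjudgments on certain special kinds of input. We argue that specific types of problems should be handled specifically, extending the range of problem types the system can deal with. Experimental results show that, combined with our previously proposed machine learning approach, adding four special-type categories so that sentences of those types are processed individually effectively improves the system: its accuracy on binary-class entailment recognition for Simplified Chinese rises from 67.86% to 72.62%.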

Keywords

Entailment analysis; Chinese textual entailment recognition

Parallel Abstract

Recognizing Textual Entailment (RTE) is a recently emerging research topic in natural language processing (NLP), and it can serve as a useful component in many NLP applications. In this paper, we present our findings from an entailment analysis of the NTCIR-10 RITE-2 dataset and use these observations to improve our system. In previous work, all input pairs are treated equally in a standard classification architecture. We find that this is not suitable for some special cases. We believe that by isolating the special cases and building separate classifiers for them, an RTE system can perform better. After adding modules for four special cases to our system, accuracy on the binary-class classification task improves from 67.86% to 72.62%.

Parallel Keywords

Entailment Analysis; Chinese Recognizing Textual Entailment
