A Study of Chinese Textual Entailment Using a Semantic Lexical Network and Syntactic Dependency Analysis

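Recognizing Inference in TExt (RITE) is an evaluation task that assesses how well systems automatically detect inference relations between sentences, such as entailment, paraphrase, and contradiction. This study proposes features based on a semantic lexical network (WordNet) and dependency syntactic analysis for textual entailment recognition in the NTCIR-10 RITE-2 subtasks. WordNet is typically used to recognize entailment at the lexical level. The dependency approach converts the two texts into dependency trees and computes the edit distance between the resulting subtrees. Our experiments build on the semantic features added to our system, classify the features with machine learning, and apply feature selection to find the best feature combination. With this setup, overall accuracy for Chinese textual entailment recognition on NTCIR-10 RITE-2 reaches 73.28% on the Traditional Chinese BC subtask and 74.57% on the Simplified Chinese BC subtask. The main contribution of this study is to show that adding semantic features substantially improves the accuracy of Chinese textual entailment recognition.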

Keywords

Textual entailment; semantic features; dependency analysis; WordNet; syntactic features; machine learning; support vector machine (SVM)

Parallel Abstract

Recognizing Inference in TExt (RITE) is a task for automatically detecting entailment, paraphrase, and contradiction in texts, which addresses a major text understanding need in information access research. In this paper, we propose a Chinese textual entailment system that applies WordNet-based semantic features and a dependency syntactic approach to the NTCIR-10 RITE-2 subtasks. WordNet is used to recognize entailment at the lexical level. The dependency syntactic approach is a tree edit distance algorithm applied to the dependency trees of the text and the hypothesis. We evaluated the approach on the NTCIR-10 RITE-2 development datasets, where our system achieved 73.28% accuracy on the Traditional Chinese Binary-Class (BC) subtask and 74.57% on the Simplified Chinese Binary-Class subtask. Experiments with the text fragments provided by the NTCIR-10 RITE-2 subtasks showed that the proposed approach can improve the system's overall accuracy.
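To make the tree edit distance feature concrete, the following is a minimal sketch of the standard ordered-tree edit distance with unit costs for insertion, deletion, and relabeling. The abstract does not state which algorithm variant or cost scheme the system used, so the recursion, the tuple-based tree encoding, and the hand-built toy dependency trees here are all assumptions made for illustration. In a real pipeline the trees would come from a dependency parser.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a
# tuple of trees. A forest is a tuple of trees. Tuples are hashable,
# so intermediate results can be memoized.


def size(forest):
    """Count the nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    v_label, v_children = f[-1]  # rightmost root of f
    w_label, w_children = g[-1]  # rightmost root of g
    # Delete v: its children take its place in the forest.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert w: symmetric to deletion.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match v with w: solve the remaining forests and the two child
    # forests independently, paying 1 if the labels differ.
    match = (forest_distance(f[:-1], g[:-1])
             + forest_distance(v_children, w_children)
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)


def tree_edit_distance(text_tree, hypothesis_tree):
    """Edit distance between two single-rooted ordered trees."""
    return forest_distance((text_tree,), (hypothesis_tree,))


# Hypothetical hand-built dependency trees (head word with dependents).
text_tree = ("吃", (("貓", ()), ("魚", (("新鮮", ()),))))
hypothesis_tree = ("吃", (("貓", ()), ("魚", ())))

print(tree_edit_distance(text_tree, hypothesis_tree))  # 1 (delete 新鮮)
```

A small distance relative to the tree sizes suggests that the hypothesis is structurally close to the text. A value like this could serve as one numeric feature alongside lexical features when training a classifier such as an SVM.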

Parallel Keywords

Textual Entailment; Semantic Features; Dependency Analysis; WordNet; Syntactic Features; Machine Learning; Support Vector Machine (SVM)

