Title

以正規邏輯方法解決中文文本蘊含辨識問題

Translated Titles

A Formal Logic Approach to Chinese Recognizing Textual Entailment

DOI

10.6342/NTU.2014.01460

Authors

張富傑

Key Words

形式語意學 ; 計算語意學 ; 自然語言理解 ; 一階邏輯 ; 中文文本蘊含辨識 ; Formal Semantics ; Computational Semantics ; Natural Language Understanding ; First Order Logic ; Chinese Recognizing Textual Entailment

Publication Name

Theses of the Graduate Institute of Electronics Engineering, National Taiwan University

Volume or Term/Year and Month of Publication

2014

Academic Degree Category

Master's

Advisor

黃鐘揚

Content Language

English

Chinese Abstract

在自然語言處理的應用中,理解自然語言,一直是個很有挑戰的問題。傳統的自然語言處理研究,著重在理解語言的語意與邏輯。而目前自然語言處理的研究方向,則是著重在用巨量資料和機器學習的方式。雖然這兩種方法各有優缺點,但現今在自然語言處理的研究,傳統的語意學模型則極少被拿出來討論。而目前的機器學習方法,也有其解決問題的極限。若能整合傳統的語意學,和機器學習的方法,是一個值得研究的方向。 我們建構一個系統可以用正規邏輯方法解決中文文本蘊含辨識問題。基於形式語意學和計算語意學的理論,我們先用機器學習的方式,將中文文句轉成剖析樹,再用我們提出的演算法,把剖析樹轉成語意表達式。並且,我們提出可以整合外部的知識和語意表達式的方法,並用定理證明的方式,解決中文文本蘊含辨識的問題。再來我們示範,我們的系統可以解決句型較簡單的問題。以及解決現實世界應用問題的可能性與挑戰。最後,我們得出這個系統的優缺點,以及未來可行的研究方向,來改進此系統。

English Abstract

Understanding natural language remains a challenging problem in natural language processing (NLP). Traditional NLP research focused on the semantics and logic of natural language, whereas current research concentrates on big data and machine learning techniques. Each approach has its own strengths and weaknesses, yet traditional semantic and logical models are seldom discussed in recent work, and existing machine learning techniques have their own limitations. Combining traditional semantics with machine learning is therefore a direction worth investigating. We build a system that solves the Chinese recognizing textual entailment (RTE) problem with a formal logic method. Drawing on the theory of formal semantics and computational semantics, we first use a machine learning technique to convert Chinese sentences into syntax trees. We then propose an algorithm that converts the syntax trees into semantic representations, together with a method for integrating external knowledge resources with these representations. Given the combined representations, we apply theorem proving techniques to solve the Chinese RTE problem. We demonstrate that our approach can solve some simple cases of Chinese RTE, and we discuss the possibilities and difficulties of solving real-world cases. Finally, we point out the strengths and weaknesses of our system and suggest directions for future research to improve it.
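
The pipeline described in the abstract ends with a theorem-proving step: the text T, combined with background knowledge, should logically imply the hypothesis H. The following is a minimal sketch of that idea and not the thesis's implementation. It assumes the sentences have already been converted into ground facts, that T and H share the same constants (a real system would treat the hypothesis variables existentially), and that background knowledge is limited to unary implications such as hypernymy. All names here (`closure`, `entails`, the predicates, and the example sentences) are hypothetical.

```python
# Illustrative sketch only: a toy forward-chaining entailment check.
# The fact encoding and the rule format are assumptions, not the thesis's design.
from typing import Iterable, Set, Tuple

Fact = Tuple[str, ...]   # e.g. ("dog", "x1") or ("agent", "e1", "x1")
Rule = Tuple[str, str]   # (p, q) stands for: forall x. p(x) -> q(x)


def closure(facts: Iterable[Fact], rules: Iterable[Rule]) -> Set[Fact]:
    """Apply the unary implication rules until no new fact can be derived."""
    derived = set(facts)
    rule_list = list(rules)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rule_list:
            for fact in list(derived):
                # Only unary facts p(x) can trigger a rule p -> q.
                if len(fact) == 2 and fact[0] == premise:
                    new_fact = (conclusion, fact[1])
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived


def entails(text: Iterable[Fact], hypothesis: Iterable[Fact],
            rules: Iterable[Rule]) -> bool:
    """T entails H if every hypothesis fact lies in the closure of T."""
    return set(hypothesis) <= closure(text, rules)


# T: "A dog is running."  H: "An animal is moving."
T = [("dog", "x1"), ("run", "e1"), ("agent", "e1", "x1")]
H = [("animal", "x1"), ("move", "e1"), ("agent", "e1", "x1")]
# Hypothetical background knowledge, e.g. taken from a lexical resource.
KNOWLEDGE = [("dog", "animal"), ("run", "move")]

with_knowledge = entails(T, H, KNOWLEDGE)
without_knowledge = entails(T, H, [])
```

In this toy setting the entailment only goes through once the background rules are supplied, which mirrors the abstract's point that external knowledge must be integrated with the semantic representations before theorem proving.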

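Topic Category

College of Electrical Engineering and Computer Science > Graduate Institute of Electronics Engineering
Engineering > Electrical Engineering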