Intelligent systems require commonsense knowledge to handle the wide range of situations their users present. Template-based knowledge acquisition is one of the most common approaches to constructing commonsense knowledge bases, and numerous studies have collected commonsense data from the general public through sentence templates. To lower the barrier to contribution, the designers of such acquisition systems typically build their templates from common words in natural language, with each template mapped to a specific knowledge relation. Common words, however, are often ambiguous: a word may carry multiple senses, or appear in informal usages such as borrowing, generalization, or ellipsis. In these cases the system may receive unexpected content and misinterpret it. This study therefore proposes a method that trains LSTM relation classifiers with the help of word embeddings; the classifiers re-assign data that the original system cannot interpret correctly to the proper relation category, without changing the architecture of the original knowledge acquisition system. Taking Chinese ConceptNet as an example, this study shows that the proposed method reduces the system's average error rate from 21.50% to 12.66% on the experimental dataset, and for a single template it further reduces the error rate from 22.0% to 5.0%. The method effectively improves the quality of the commonsense knowledge base.