Intelligent systems require commonsense knowledge to handle the wide range of situations their users present. Template-based knowledge acquisition is one of the most common approaches to constructing commonsense knowledge bases, and numerous studies have collected commonsense data from the general public through sentence templates. To lower the barrier to contribution, the designers of such acquisition systems typically build their templates from common words in natural language, with each template mapped to a specific knowledge relation. Common words, however, are often ambiguous: a word may carry multiple senses, or appear in informal usages such as borrowing, generalization, or ellipsis. In these cases the system may receive unexpected content and misinterpret it. This study therefore proposes a method that trains LSTM relation classifiers with the help of word embeddings; the classifiers re-assign data that the original system cannot interpret correctly to the proper relation category, without changing the architecture of the original knowledge acquisition system. Taking Chinese ConceptNet as an example, this study shows that the proposed method reduces the system's average error rate from 21.50% to 12.66% on the experimental dataset, and for a single template it further reduces the error rate from 22.0% to 5.0%. The method effectively improves the quality of the commonsense knowledge base.