Knowledge graph verification, also known as knowledge graph cleaning, is the task of determining whether the relations between entities in a knowledge graph are correct. Beyond improving the quality of the graph itself, a cleaned knowledge graph can also improve the performance of downstream applications. In the biomedical domain, for example, identifying and removing incorrect relations can support important applications such as drug repurposing.

Several prior studies have addressed knowledge graph verification. For example, Wang et al. (2019) propose using rules such as functional dependencies as measures: they compute the change in these measures before and after removing specific data to decide whether that data should be removed. Ge et al. (2020) instead use a graph neural network (GNN) to extract an embedding vector for each entity and relation from a clean knowledge graph, and then use these embeddings to train a classifier that identifies whether the relations in a target knowledge graph are correct. Although these methods perform well, they have several limitations. First, formulating rules as measures is time-consuming and requires domain knowledge, while embedding-based methods suffer from the out-of-vocabulary (OOV) problem. Moreover, a method that depends on an external, clean knowledge graph is harder to apply in practice.

In this research, we posit that the information from all relations between two entities can be used to judge whether a given relation between them is correct. Based on this idea, we propose a feature engineering method, a deep learning method, and a hybrid of the two for verifying biomedical knowledge graphs. Our experimental results show that the other relations between two entities do help identify whether a relation is correct.
In addition, our proposed hybrid method achieves the best results in accuracy, precision, recall, and F1 score, indicating that well-designed features can effectively improve the performance of our deep learning model.
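The central idea, that the other relations linking the same two entities carry evidence about whether a given relation is correct, can be illustrated with a minimal feature-extraction sketch. This is not the feature set used in this work. The function name `pair_relation_features`, the choice to treat entity pairs as unordered, and the count-based encoding are all assumptions made for illustration, and the entities and relation names in the example are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def pair_relation_features(
    triples: List[Triple], relation_types: List[str]
) -> Dict[Triple, List[int]]:
    """Hypothetical sketch: for each triple, count the OTHER relations that
    link the same two entities (in either direction), one slot per relation type."""
    # Assumption: entity pairs are unordered, so (a, r, b) and (b, r', a) share a pair.
    by_pair: Dict[frozenset, Counter] = defaultdict(Counter)
    for head, rel, tail in triples:
        by_pair[frozenset((head, tail))][rel] += 1

    index = {rel: i for i, rel in enumerate(relation_types)}
    features: Dict[Triple, List[int]] = {}
    for head, rel, tail in triples:
        counts = by_pair[frozenset((head, tail))].copy()
        counts[rel] -= 1  # exclude the relation being verified
        vec = [0] * len(relation_types)
        for other_rel, c in counts.items():
            if c > 0:
                vec[index[other_rel]] = c
        features[(head, rel, tail)] = vec
    return features


if __name__ == "__main__":
    kg = [
        ("aspirin", "treats", "pain"),
        ("aspirin", "associated_with", "pain"),
        ("aspirin", "causes", "pain"),
        ("ibuprofen", "treats", "pain"),
    ]
    feats = pair_relation_features(kg, ["treats", "causes", "associated_with"])
    print(feats[("aspirin", "treats", "pain")])    # [0, 1, 1]
    print(feats[("ibuprofen", "treats", "pain")])  # [0, 0, 0]
```

Vectors like these could then be passed to any binary classifier, or combined with a learned representation of the entity pair, which is one plausible reading of a hybrid of feature engineering and deep learning. Duplicate triples are not deduplicated in this sketch, so a repeated triple counts its own copy as an "other" relation.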