Biomedical Relation Extraction Supporting by Knowledge Graph Embedding Model

Advisor: Chih-Ping Wei (魏志平)

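Abstract


Relation extraction is a subtask of information extraction with many interesting downstream applications, such as knowledge base construction and novel relation prediction. Many recent studies apply neural network techniques to relation extraction. For example, Zeng et al. (2015) proposed PCNN, a neural network architecture designed for relation extraction. Given a sentence in which two named entities are marked, PCNN outputs a probability distribution over all relation classes. Kuo (2019) extended PCNN into PCNN-GT, which adds the semantic types and semantic groups of the two given named entities as additional features and achieves better performance in the biomedical domain.

Although PCNN-GT reports good results in the biomedical domain, two aspects can still be improved. First, relation predicates and named entities should carry global knowledge that is independent of the semantics of any individual sentence. This knowledge can be expressed as global features and considered together with the features originally extracted by PCNN-GT to support relation extraction. Second, samples in the "OTHERS" class do not share a consistent semantic meaning. Handling this relation class better would benefit relation extraction tasks in more domains.

In this study, we focus on extracting global features by constructing a knowledge graph model, and we propose the KG-PCNN-GT model along with two structural modifications. Our experimental results show that global features improve relation extraction performance. We also observe that the two proposed modifications make the model predict relations more conservatively, improving precision and F1 score compared with KG-PCNN-GT.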

Parallel Abstract


Relation extraction is a subtask of information extraction and supports various interesting applications, including knowledge base construction and novel relation prediction. Many recent studies apply neural networks to relation extraction. For example, Zeng et al. (2015) proposed PCNN, a neural network architecture for relation extraction. It takes a context and two entities, e1 and e2, as input and outputs a probability distribution over all possible relation classes. Kuo (2019) extended PCNN into PCNN-GT by adding the semantic types and semantic groups of the two given entities as additional input features, which yields better performance in the biomedical domain.

Although PCNN-GT reported good performance in the biomedical domain, two areas can still be improved. First, a set of relation predications from the focal domain should contain global knowledge about entities and relations, which can be used to extract global features that support relation extraction. These global features should be taken into account together with the features originally extracted by PCNN-GT. Second, the semantic meaning of the “OTHERS” class is not internally coherent. If we can handle the OTHERS relation type better, our proposed method will be applicable to, and benefit, many relation extraction applications in other domains.

In this research, we focus on extracting global features by constructing a knowledge graph embedding model, and we propose KG-PCNN-GT and two variants. According to our evaluation results, global features improve the effectiveness of relation extraction. We also observe that the two proposed variants of KG-PCNN-GT lead the models to predict relations more conservatively, improving macro precision and macro F1 score compared with KG-PCNN-GT.
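To make the evaluation measures concrete, the following is a minimal sketch of how macro precision, macro recall, and macro F1 can be computed. It is not taken from the thesis. The relation names `TREATS` and `CAUSES`, the toy label sequences, and the choice to average only over non-OTHERS classes are all assumptions made for illustration. Macro F1 is computed here as the mean of per-class F1 scores, which is one common convention; the abstract does not state which convention the thesis uses.

```python
from collections import Counter


def macro_scores(gold, pred, labels):
    """Return (macro precision, macro recall, macro F1) averaged over `labels`.

    Macro F1 is the unweighted mean of per-class F1 scores (an assumed convention).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy data: relation names and predictions are illustrative only.
gold = ["TREATS", "TREATS", "CAUSES", "OTHERS", "OTHERS", "CAUSES"]
aggressive = ["TREATS", "TREATS", "CAUSES", "TREATS", "CAUSES", "CAUSES"]
conservative = ["TREATS", "OTHERS", "CAUSES", "OTHERS", "OTHERS", "CAUSES"]

# Assumption: scores are averaged over the non-OTHERS relation classes.
target_labels = ["TREATS", "CAUSES"]

print(macro_scores(gold, aggressive, target_labels))
print(macro_scores(gold, conservative, target_labels))
```

In this toy case the conservative predictor assigns OTHERS more often, so it loses some recall but gains precision, and its macro F1 is higher. This is the kind of trade-off the abstract describes, though the actual numbers in the thesis are not reproduced here.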
