
生醫文獻探勘-基於遠程監督的圖核以提取基因-基因相互作用

Biomedical literature mining - graph kernel based on distant supervision for extracting gene-gene interactions

Advisor: 謝璦如
The full text will be available for download on 2026/07/12.

Abstract

Supervised machine learning methods are widely used for biomedical relation extraction. Their drawback is that they require annotated training datasets, which are usually created manually at considerable cost in time and money. Distant supervision automates the annotation of a training corpus by combining a knowledge base with a text corpus. This approach is well suited to biomedicine, where many databases already provide knowledge bases for research but few annotated corpora are available. Gene-gene interactions can help explain the missing heritability of complex human diseases, so the main purpose of this study is to develop a method for extracting gene-gene interactions. Gene-gene interaction information from the KEGG pathway knowledge base was used to generate training samples from PubMed abstracts, and a graph-kernel-based method was used to extract the interactions. The best evaluation result was an F-score of 0.79. The distant supervision method developed here automates the creation of a corpus for gene-gene interaction extraction and reduces the time and cost of manual annotation, and the graph-kernel-based relation extraction method was successfully applied to gene-gene interaction extraction. These results are expected to help realize precision medicine.
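The distant supervision step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the gene lexicon, the toy knowledge base, and the function names `find_genes` and `label_sentence` are all hypothetical, and the naive dictionary matching stands in for whatever gene recognition the study actually used. It only shows the general idea of labeling co-occurring gene pairs by looking them up in a knowledge base.

```python
import re
from itertools import combinations

# Hypothetical toy knowledge base of unordered gene pairs assumed to interact.
# A real pipeline would derive such pairs from a resource like KEGG pathway;
# these entries are illustrative only.
KNOWN_INTERACTIONS = {
    frozenset({"TP53", "MDM2"}),
    frozenset({"BRCA1", "BARD1"}),
}

# Hypothetical gene lexicon used for naive dictionary matching.
GENE_LEXICON = {"TP53", "MDM2", "BRCA1", "BARD1", "EGFR"}


def find_genes(sentence):
    """Return the lexicon genes mentioned in the sentence, sorted by name."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
    return sorted(tokens & GENE_LEXICON)


def label_sentence(sentence):
    """Return (gene_a, gene_b, label) for each co-occurring gene pair.

    label is 1 if the pair appears in the knowledge base, else 0.
    """
    results = []
    for a, b in combinations(find_genes(sentence), 2):
        label = 1 if frozenset({a, b}) in KNOWN_INTERACTIONS else 0
        results.append((a, b, label))
    return results
```

Calling `label_sentence` on each sentence of a set of abstracts would yield automatically labeled positive and negative pairs. Labels produced this way are noisy, because a sentence can mention two interacting genes without describing their interaction.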

