  • Degree thesis

生物資訊文獻中人類遺傳疾病與基因關聯度之研究

The Study of Gene-Disease Associations from the Bioinformatics Literature

Advisor: 侯文娟

Abstract


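This thesis investigates the degree of association between human genetic diseases and genes in the literature, with the aim of deriving relationships between the two. The goal is that, for future bioinformatics literature, a reader can quickly determine whether a human genetic disease that appears in a document is associated with a gene that appears in the same document.

The data used include the Medical Literature Analysis and Retrieval System Online (Medline), from which we extract the required fields: PMID, TI, and AB. PMID is the record's ID number, TI is the title, and AB is the abstract text. Next, Geniatagger is used to tag the genes that appear in AB. Then, from the Online Mendelian Inheritance in Man (OMIM) website, we download data on human genetic diseases and their related genes, and use both resources to tag the diseases and genes that appear in AB.

Two classes of scoring methods are proposed, and the second class is further varied to derive new methods. The first class comprises five methods: one uses a density formula, and the other four are variations of a gravity formula. The second class is based on Dice, a measure commonly used in natural language processing. Taking this formula as the basic framework, we adjust and extend it, and we also consider a general ratio formula and extensions of that ratio formula.

In the final results, the density and gravity methods reach a highest accuracy of only about 10%, which is low. The reason is that they compute their scores using only position and TFIDFT (Term Frequency Inverse Document Frequency (Term)) variables and ignore certain characteristics of diseases and genes, so the scores are not discriminative. The Dice-based variant formulas, by contrast, take Gene Ontology into account. The factors they consider fit the spirit of the experiment, so higher scores lie closer to the correct pairs, and once the score passes a threshold the accuracy reaches 100%.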
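The field-extraction step described above can be sketched as follows. This is a minimal illustration, not the thesis's code: it assumes the records are in the plain-text MEDLINE export format (a four-character tag, a hyphen, then the value, with indented continuation lines and a blank line between records). The function name `parse_medline` and the sample record are hypothetical.

```python
from typing import Dict, List

# Only the three fields named in the abstract are kept.
FIELDS = {"PMID", "TI", "AB"}


def parse_medline(text: str) -> List[Dict[str, str]]:
    """Extract PMID, TI and AB from MEDLINE-format text (assumed layout)."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    tag = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
            current, tag = {}, None
            continue
        if len(line) > 4 and line[4] == "-":
            # New field line, e.g. "TI  - Some title".
            tag = line[:4].strip()
            value = line[5:].strip()
            if tag in FIELDS:
                current[tag] = (current[tag] + " " + value) if tag in current else value
        elif line.startswith(" ") and tag in FIELDS:
            # Indented continuation of the previous kept field.
            current[tag] += " " + line.strip()
    if current:
        records.append(current)
    return records


# Hypothetical sample record, for illustration only.
sample = (
    "PMID- 1\n"
    "TI  - A title\n"
    "      continued\n"
    "AB  - BRCA1 is linked\n"
    "      to breast cancer.\n"
    "AU  - Someone A\n"
    "\n"
    "PMID- 2\n"
    "TI  - Other\n"
    "AB  - Text\n"
)
records = parse_medline(sample)
```

Each resulting dictionary holds one record's ID, title, and abstract text, which is the input a tagger such as Geniatagger would then process.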

Parallel Abstract


In this study, we explore the relationships between human genetic diseases and genes in the literature, in the hope that our approach can help clarify how the two are related. The aim is to let readers determine more efficiently whether a genetic disease mentioned in a bioinformatics document is related to a gene mentioned in the same document. The data come from the Medical Literature Analysis and Retrieval System Online (Medline), from which we use the fields PMID, TI, and AB: PMID is the ID number, TI is the title, and AB is the abstract text. We use Geniatagger to tag the genes that appear in AB. We then download data on human genetic diseases and their related genes from the Online Mendelian Inheritance in Man (OMIM) website, which lets us tag the genes and diseases that appear in AB. We propose two types of scoring methods. The first type comprises five methods: one uses a density formula, and the other four are variations of a gravity formula. The second type is based on Dice; we take it as a foundation and extend it, and we also consider a general ratio formula and its extensions. For the first type, the highest accuracy is about ten percent, which is low. The reason is that these methods use only position and the Term Frequency Inverse Document Frequency (Term) variable, and ignore features of the diseases and genes, so their scores are not discriminative. The Dice-based formulas additionally take Gene Ontology into account, which matches the spirit of the experiment. As a result, higher scores lie closer to the correct pairs, and once the score exceeds a threshold, the accuracy reaches one hundred percent.
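For readers unfamiliar with Dice, the base measure can be sketched as below. This is a minimal illustration of the standard Dice coefficient over co-occurrence sets only; the thesis's extended formulas and its use of Gene Ontology are not specified on this page and are not reproduced here. The function name `dice` and the sample ID sets are hypothetical.

```python
from typing import Set


def dice(a: Set[str], b: Set[str]) -> float:
    """Standard Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    if not a and not b:
        # Convention chosen here: two empty sets score 0.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical example: IDs of abstracts that mention a gene and a disease.
gene_docs = {"1", "2", "3"}
disease_docs = {"2", "3", "4"}
score = dice(gene_docs, disease_docs)  # 2 * 2 / (3 + 3)
```

A score near 1 means the gene and the disease are mentioned in largely the same abstracts, and a score of 0 means they never co-occur.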

Parallel Keywords

human genetic disease; gene; Medline; OMIM


Cited by


陳孝源 (2012). 人類基因與疾病關係之規則擷取 [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315295358
