An Integrated Platform Applying Semantic Analysis to Cancer-Related Gene Mining and Prediction

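As living standards have risen and dietary habits have changed, the incidence of cancer has increased year by year. According to the ten-leading-causes-of-death statistics published by the Department of Health, Executive Yuan, cancer has ranked first among the causes of death in Taiwan for 29 consecutive years. Given this growing threat, the causes of cancer are an important research topic. As more researchers take up cancer research, however, a large body of medical literature and results has been published. This literature is rich in information worth consulting, such as gene-gene interactions, gene function, biochemical pathways, and gene-disease relationships, so extracting the information worth studying from this overload of literature is a difficult problem for medical researchers.

The main goal of this study is to provide an integrated platform that uses semantic analysis to collect medical literature and to analyze and predict cancer-related gene information from it. The platform uses the PubMed search engine provided by NCBI to search for medical literature and gene sequences on cancer-related genes. It combines the cancer name entered by the user with two cancer research methods, loss of heterozygosity (LOH) and comparative genomic hybridization (CGH), to collect, classify, and mine the literature. The proposed method can mine important cancer gene information and help clarify the tissue-level characteristics associated with those genes. We hope the system will help cancer researchers quickly obtain the medical literature and gene information they need in an age of information overload, saving time and improving research efficiency.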
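The query-combination step described above (cancer name paired with LOH and with CGH) can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the function names `build_queries` and `build_esearch_url`, the quoting of phrases, and the `retmax` default are all assumptions. The URL targets NCBI's public E-utilities `esearch` endpoint; the sketch only builds the request URL and does not send it.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public); everything else here is illustrative.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The two research methods named in the abstract, written as quoted phrases.
METHOD_TERMS = {
    "LOH": '"loss of heterozygosity"',
    "CGH": '"comparative genomic hybridization"',
}


def build_queries(cancer_name):
    """Pair the user-supplied cancer name with each research method (hypothetical helper)."""
    cancer = f'"{cancer_name.strip()}"'
    return {method: f"{cancer} AND {term}" for method, term in METHOD_TERMS.items()}


def build_esearch_url(query, retmax=20):
    """Build a PubMed esearch URL for one query; retmax=20 is an assumed default."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for method, query in build_queries("breast cancer").items():
        print(method, build_esearch_url(query))
```

Keeping one query per method, rather than a single combined query, lets the retrieved articles be grouped by LOH or CGH before any classification or mining step.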

Keywords

cancer-related gene; loss of heterozygosity; comparative genomic hybridization; medical literature

Parallel Abstract

As the national standard of living has risen and eating habits have changed, the incidence of cancer has increased year by year. According to the statistics on the ten leading causes of death published by the Bureau of Health Promotion, Department of Health, R.O.C. (Taiwan), cancer has been the leading cause of death for twenty-nine consecutive years. In the face of this growing threat, it is important to study the causes of cancer. With advances from the Human Genome Project, researchers are increasingly engaged in bioinformatics-related research, including genome sequence analysis, drug design and discovery, and curative methods. The published literature contains a wealth of information on gene expression, gene function, biological pathways, and gene-disease relationships. However, biomedical researchers who try to search for and retrieve the information worth studying in this literature face a problem of information overload.

The purpose of this study is to develop a biomedical literature mining platform that predicts cancer-related genes. The platform applies semantic analysis to increase the accuracy with which cancers, genes, and chromosome regions are predicted. Several value-added databases were constructed for this purpose. They contain information on genes located in the unstable regions of cancer cells, based on data accumulated from LOH and CGH experiments. The proposed platform can extract important information, accelerating research and saving biomedical researchers considerable time. The system can also be applied to other diseases.

Parallel Keywords

cancer-related gene; loss of heterozygosity (LOH); comparative genomic hybridization (CGH); biomedical literature

