  • Thesis

Use of Pathway Score in Bayesian Model to Quantify and Prioritize Pathway Association and Gene Ranking

Advisor: 蕭朱杏
Co-advisor: 盧子彬

Abstract


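Disease and genes are closely related: nine of the ten leading causes of death are diseases. Cancer in particular is a major global concern. The number of new cases each year is large, the number of cancer deaths is high, and the occurrence of cancer is also related to genes. Healthy individuals can undergo genetic testing to better understand their own disease risk and thereby prevent disease, while patients may be treated by repairing or disrupting key disease-related genes. Many researchers are therefore interested in the key genes associated with disease. However, humans have more than 20,000 genes, and finding the key genes costs researchers a great deal of time and money. To reduce this cost, a prioritized list may help in the search for key genes. Current methods for screening disease-related genes fall roughly into three types: single marker tests, gene set analysis, and pathway analysis, each of which provides candidate genes or candidate gene sets.

Some statistical methods look for disease-related genes one gene at a time, such as the t-test or the chi-square test. However, because the number of biomarkers is large, these methods usually face a multiple testing problem, and they do not consider the relationships between genes. Genes do not operate alone; they work together with other genes to affect a biological process. Therefore, rather than focusing only on the effect of a single gene on a biological process, one should consider the effect of a group of genes on the whole biological process. This motivates the second type of method, which looks for disease-related genes through gene sets, such as Gene Set Enrichment Analysis (GSEA), Over Representation Analysis (ORA), the Global test, and Fisher's method. Genes are divided in advance into many gene sets. A gene set found this way may be a good indicator, but it is often hard to interpret biologically. A biological pathway is a group of genes that jointly affect a biological process. The third type of method looks for related genes through pathways, for example Signaling Pathway Impact Analysis (SPIA). Current methods of this type do not consider the competition and relationships among pathways, and some also do not consider the relationships among genes within a pathway. This study therefore proposes a new method to overcome the limitations of existing methods.

This study considers not only the relationships among genes within a pathway but also the competition among multiple pathways, and it can handle the relationships between pathways. Under these conditions we propose the Bayesian Approach to Prioritizing Pathway (BAPP). BAPP provides an ordered list of candidate pathways associated with the disease, then searches for key genes within the top-ranked pathway to provide a second ordered list of disease-related key genes. The study incorporates the widely used pathway database Kyoto Encyclopedia of Genes and Genomes (KEGG). BAPP performs well in simulations. Whether ranking candidate pathways or key genes, BAPP keeps the type I error under 0.05. BAPP ranks candidate pathways correctly with higher accuracy than the other methods and can find the true key genes.

The proposed method is applied to breast cancer and glioblastoma multiforme data. In the breast cancer data, BAPP identifies the Jak-STAT pathway as the top-ranked pathway for breast cancer and further identifies 37 key genes within it, while the Taste transduction pathway, which has never been reported as related to breast cancer, is ranked last by BAPP. In the glioblastoma multiforme data, the method identifies Long-term potentiation as the top-ranked pathway, from which 4 key genes are identified.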

Parallel Abstract


Cancer is an important topic of global concern, and some cancers are closely related to genetic aberrations. Not only is the number of new cancer patients per year large, but the number of deaths due to cancer is also high. Genetic testing can help people understand their disease risk and may prevent disease occurrence if the causal key genes can be identified. Therefore, many researchers focus on identifying the key genes among more than 20,000 human genes, which takes a great deal of time and money. To reduce costs, a prioritized list may help in finding key genes. Current methods for screening disease-associated genes fall roughly into three types: single marker tests, gene-set analysis, and pathway analysis, each providing candidate genes or candidate gene sets. Some statistical methods find disease-related genes with a single marker test, such as the t-test or the chi-square test. However, because the number of biomarkers is large, these tests face the issue of multiple testing, and they do not consider the relationship between genes. Genes do not work alone; they work with other genes to affect a biological process. Therefore, rather than focusing on the effect of a single gene on a biological process, the effect of a group of genes should be considered. The second type of method therefore looks for disease-related genes in groups of genes; examples include Gene Set Enrichment Analysis (GSEA), Over Representation Analysis (ORA), the Global test, and Fisher's method. A pathway is a collection of genes with biological meaning: it represents a biological process carried out by a group of genes. The third type of method uses pathways to find related genes; an example is Signaling Pathway Impact Analysis (SPIA).
Currently, such methods do not consider several competing pathways simultaneously; they do not incorporate the relationship between pathways, nor do they account for the relationship between genes. This study provides a novel method to overcome these limitations. It considers not only the relationship between genes within a pathway but also the competition between several pathways and the relationship between pathways. The Bayesian Approach to Prioritizing Pathway (BAPP), a novel method proposed under the above conditions, provides an ordered list of candidate pathways associated with the disease. BAPP can further search for key genes in the primary pathway and provide a list of disease-related key genes. BAPP can be applied to the common pathway database Kyoto Encyclopedia of Genes and Genomes (KEGG). Simulations show that BAPP performs well. Whether it is prioritizing candidate pathways or key genes, BAPP keeps the type I error rate under 0.05. BAPP ranks candidate pathways with higher accuracy than other methods and can find the true key genes. The method is applied to a breast cancer study and a glioblastoma multiforme study. In the breast cancer data, BAPP identifies the Jak-STAT signaling pathway as the primary pathway and further identifies 37 key genes in this pathway. The Taste transduction pathway, which has not been reported to be associated with breast cancer, is ranked last by BAPP. In the glioblastoma multiforme study, BAPP identifies Long-term potentiation as the primary pathway, from which four key genes are identified.
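As a rough illustration of the gene-set approach named in the abstract, the following sketch computes an Over Representation Analysis (ORA) p-value as an upper-tail hypergeometric probability and applies a Bonferroni correction across several gene sets. It is not BAPP, whose model is not specified on this page. The function names, the choice of Bonferroni as the multiple-testing correction, and the toy counts are all assumptions made for illustration.

```python
from math import comb


def ora_p_value(n_universe, n_set, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: total number of genes considered
    n_set:      genes belonging to the gene set (or pathway)
    n_selected: genes flagged as disease-related by some upstream test
    n_overlap:  flagged genes that also belong to the gene set
    """
    upper = min(n_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical toy counts, not taken from the thesis data:
    # (genes in the set, overlap with the flagged genes).
    toy_sets = {"set_A": (50, 8), "set_B": (120, 3), "set_C": (30, 1)}
    n_universe, n_selected = 20000, 200

    raw = [ora_p_value(n_universe, size, n_selected, hit)
           for size, hit in toy_sets.values()]
    for name, p, p_adj in zip(toy_sets, raw, bonferroni(raw)):
        print(f"{name}: raw p = {p:.3g}, Bonferroni p = {p_adj:.3g}")
```

Each gene set is tested on its own here, so the sketch shares the limitation the abstract raises: it models neither the competition between pathways nor the relationships between genes.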

