Thesis

An Algorithm for Gene-Gene Interaction via Deviance of Independence

Advisors: 高成炎, 莊曜宇, 陳佩君

Abstract


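Genome-wide association studies (GWAS) are a typical study design in genetic epidemiology for detecting genetic variants associated with disease. Their genetic data mostly come from single nucleotide polymorphisms (SNPs) genotyped with microarray chip technology. The analysis uses statistical methods to compare how case and control samples are distributed across the genotypes of each SNP, which identifies SNPs that may be associated with disease. Past GWAS have mostly examined the association between individual SNPs and disease. Complex diseases, however, are usually caused by interactions between genes or between genes and environmental factors. Most existing methods for detecting interactions among SNPs are exhaustive searches, such as Multifactor Dimensionality Reduction (MDR). These methods evaluate every possible combination and are therefore suitable only for studying interactions among a small number of SNPs. The aim of this study is to build a screening mechanism that selects a candidate SNP set from a large number of SNPs. SNPs in this set are more likely to be involved in interactions that affect disease. The method is built on probabilistic independence. From the frequencies of two individual SNPs in the samples, we compute the expected frequency with which the two SNPs co-occur in a sample if they are independent. We also compute the observed frequency with which they co-occur. From the deviance between the observed and expected values, we define a Deviance of Independence (DOI) value for every SNP pair, which partially reflects the degree of interaction within that pair. The DOI algorithm is designed mainly for GWAS SNPs without marginal effects. It screens for SNPs that are not significant in first-order (single-SNP) tests but that, when combined, better discriminate the disease status of the samples. Tests on simulated data and a real data application show that significant SNP combinations can be found after screening with the DOI algorithm. On simulated data, the DOI algorithm showed good and stable predictive performance. In the real data application, meaningful SNP combinations may be identified within the DOI-screened SNP set. The DOI algorithm is therefore an effective method for screening gene-gene interactions.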
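The pairwise scoring and screening described above can be sketched in Python. The abstract does not give the exact DOI formula, so this sketch makes several assumptions:

- The "deviance" is taken to be the likelihood-ratio (G) statistic. It compares the observed joint genotype counts of a SNP pair with the counts expected under independence.
- The score is computed over whatever samples are passed in. Passing cases only, for example, would be a further assumption.
- The function names `doi` and `screen`, the 0/1/2 genotype coding, and the "keep the SNPs in the top-k pairs" rule are hypothetical illustrations, not the thesis's implementation.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Set


def doi(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """Hypothetical DOI score for one SNP pair.

    Assumption: the deviance is the likelihood-ratio (G) statistic
    2 * sum(O * ln(O / E)) over joint genotype cells, where O is the observed
    joint count and E = count_a * count_b / n is the count expected if the
    two SNPs were independent.
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("both SNPs need the same, non-zero number of samples")
    n = len(snp_a)
    count_a = Counter(snp_a)            # marginal genotype counts, SNP A
    count_b = Counter(snp_b)            # marginal genotype counts, SNP B
    joint = Counter(zip(snp_a, snp_b))  # observed joint genotype counts
    deviance = 0.0
    # Empty cells contribute 0 (O * ln(O/E) -> 0 as O -> 0),
    # so only observed cells are summed.
    for (ga, gb), observed in joint.items():
        expected = count_a[ga] * count_b[gb] / n
        deviance += observed * math.log(observed / expected)
    return 2.0 * deviance


def screen(genotypes: Dict[str, List[int]], top_k: int) -> Set[str]:
    """Hypothetical screening rule: score every SNP pair by DOI and return
    the SNPs that appear in the top_k highest-scoring pairs."""
    scored = sorted(
        ((doi(genotypes[a], genotypes[b]), a, b)
         for a, b in combinations(sorted(genotypes), 2)),
        reverse=True,
    )
    candidates: Set[str] = set()
    for _, a, b in scored[:top_k]:
        candidates.update((a, b))
    return candidates


if __name__ == "__main__":
    # Toy genotypes coded 0/1/2; rs_c is built to be exactly independent of rs_a.
    rs_a = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    rs_c = [0, 1, 2, 0, 1, 2, 0, 1, 2]
    data = {"rs_a": rs_a, "rs_b": list(rs_a), "rs_c": rs_c}
    print(round(doi(rs_a, rs_a), 2), doi(rs_a, rs_c))
    print(screen(data, top_k=1))
```

In the toy run, `rs_b` duplicates `rs_a`, so that pair scores 18 ln 3 ≈ 19.78. Both pairs involving `rs_c` score 0 because their joint counts equal the independence expectation exactly. `screen(data, top_k=1)` therefore returns `{"rs_a", "rs_b"}`.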

Parallel Abstract


Genome-wide association studies (GWAS) are a common study design in genetic epidemiology for identifying genetic factors associated with diseases. Most GWAS have adopted a single-locus strategy, analyzing the association between each individual single nucleotide polymorphism (SNP) and a disease. However, complex diseases may be caused not by a single gene but by gene-gene or gene-environment interactions. Exhaustive search methods, such as multifactor dimensionality reduction (MDR), are popular for detecting gene-gene interactions. Such methods require enormous computation and are therefore feasible only for a small number of SNPs. This study therefore aims to construct a filtering criterion, called the deviance of independence (DOI), that selects a candidate SNP set from a large number of SNPs based on the independence of SNPs. We apply DOI to GWAS data to filter SNPs that have no marginal effect individually but better discriminate between cases and controls when combined. We use simulated and real data to examine the performance of DOI. The simulation results show that SNPs with interactions tend to have higher DOI values. We also examine 2-way and 3-way gene-gene interactions in a real dataset. The results demonstrate that possible interactions can be identified after using the DOI value as a filtering criterion. In sum, the DOI algorithm is a powerful tool for filtering a candidate SNP set for further interaction analysis.
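A worked count makes the cost argument concrete. An exhaustive k-way search over n SNPs must evaluate every k-SNP combination, and the number of combinations is the binomial coefficient. The array size n = 500,000 and the screened-set size m = 100 below are illustrative assumptions, not figures from the thesis.

```latex
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad
\binom{500\,000}{2} = \frac{500\,000 \times 499\,999}{2} \approx 1.25 \times 10^{11}, \qquad
\binom{500\,000}{3} \approx 2.08 \times 10^{16},
\\[4pt]
\binom{100}{2} = 4\,950, \qquad \binom{100}{3} = 161\,700.
```

Scoring all pairs by DOI still visits every one of the C(n, 2) pairs. Under the reading used here, however, each visit is a single contingency-table count rather than a full model evaluation. Restricting a subsequent exhaustive method such as MDR to a small screened set then shrinks its search space by many orders of magnitude.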

