
以整合分析法結合多種生物平台資訊偵測基因變異

An Integrative Analysis for Susceptible Genetic Variants with Information across Platforms

Advisor: 蕭朱杏

Abstract


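Molecular biotechnology has advanced rapidly in recent years, and human genomics, transcriptomics, and proteomics are now being combined to detect genes involved in complex human diseases. However, the mechanisms underlying complex diseases are often intricate and omics data are extremely large, so the effect of a single genetic variant on disease development is usually hard to detect, and the variants identified by different studies are not always consistent. To better understand the mechanisms of complex diseases, researchers have proposed methods that combine data from these different biological levels; this approach is called integrative analysis. In recent years, researchers have built statistical models for integrative analysis under different assumptions about the underlying biological model. However, when examining the association between genetic variants and complex diseases, most integrative analyses consider the correlation among genetic variants but lack a method for incorporating this information into the disease analysis. We therefore propose a Bayesian integrative regression model that not only incorporates data from different genetic levels simultaneously but also places the relationships among these data into the prior model. To assess the performance of this method, we conducted a simulation study and analyzed real data on chronic obstructive pulmonary disease (COPD). In the simulation study, we compared true positive and false positive rates with those of single marker tests. The results show that the proposed statistical model preserves the correlation among genetic variants while also detecting disease-associated variants. Beyond detecting the effect of a single variant or a group of variants on the phenotype, the proposed Bayesian integrative regression model can also yield, through posterior probabilities, the posterior distribution of an individual's probability of disease. In summary, the statistical model proposed here can detect genetic variants associated with complex diseases without violating the biological mechanisms of disease development. In addition to the DNA variation and RNA expression data used in this study, the model can in the future be applied to more data types and to the investigation of other complex diseases.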

Parallel Abstract


Molecular biotechnology has advanced greatly in recent decades. With this technology, researchers have conducted many studies to identify possible genetic causes of complex diseases, several of them focusing on specific diseases using human genomic, transcriptomic, or proteomic databases. However, the identified genetic variants often have small effect sizes and may not replicate in other studies, which makes it difficult to evaluate the biological relation between the identified variants and the disease of interest. As an alternative, scientists have turned to integrative analysis to overcome these difficulties and to identify susceptible genetic variants correlated with disease. Current integrative analyses incorporate only location information, such as cis or trans, but not other possible interactions between platforms. Based on multiple genomic databases, we propose in this study a Bayesian integrative model that builds on biological assumptions to derive biologically meaningful inference. The proposed model is the first approach to include in the analysis the information between genetic variants from different sources and of different types. To evaluate its performance, we conducted simulation studies and analyzed a COPD dataset containing genomic variation and mRNA expression data downloaded from Gene Expression Omnibus (GEO). In the simulation studies, we compared true positive and false positive rates with those of single marker tests under different simulation settings. Both the simulation studies and the application showed that the proposed model was better able to detect disease-related genetic markers. In conclusion, this study provides novel insight into identifying genetic variants for disease-related phenotypes as more omic data become available.

