Thesis

Identifying Disease-Associated Gene Regions Using Multi-Platform Genetic Variants

An integrative analysis of DNA copy number and SNP markers to localize causal gene region

Advisor: 蕭朱杏
Co-advisor: 盧子彬 (Tzu-Pin Lu)

Abstract


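Many previous genome-wide association studies (GWAS) have focused on a single platform, single nucleotide polymorphisms (SNPs). As sequencing technologies continue to advance, more types of biomarkers can be obtained from human samples; for example, a SNP array can both genotype SNPs and estimate copy number variations (CNVs). The more molecular variants are considered, the larger the data become, and integrative analysis of multi-platform genomic data has therefore become an important topic for handling high-dimensional data and the complex relationships among markers within and across platforms. Previous integrative approaches typically perform a preliminary analysis on each platform separately and then take the intersection or union of the resulting gene sets using a Venn diagram, an approach that ignores the relationships among markers within and between platforms. To overcome these shortcomings, this study integrates two molecular levels, SNP and CNV, and tests association at the gene level; a moving window analysis over smaller regions is then applied to the selected genes to localize the segments that carry the genetic variation. Simulation results show that the proposed integrative strategy robustly detects complex diseases arising from the joint action of variants across platforms. Beyond simulation, the strategy was applied to data from the Taiwan Biobank for low-density lipoprotein cholesterol (LDL-C) and triglycerides (TG); integrating SNP and CNV information identified 40 associated genes for each trait, together with their important genetic regions. Besides detecting genes whose associations have already been reported, the approach also points to different gene regions for future research.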
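To make the criticized baseline concrete, the sketch below mimics the "analyze each platform separately, then combine gene sets with a Venn diagram" strategy. The gene names, p-values, and the 0.01 threshold are hypothetical and chosen only for illustration; they are not results from this study.

```python
# Hypothetical per-platform, gene-level p-values (illustrative only,
# not results from the thesis).
snp_pvalues = {"GENE_A": 1e-6, "GENE_B": 0.03, "GENE_C": 0.20, "GENE_D": 0.04}
cnv_pvalues = {"GENE_A": 0.50, "GENE_B": 0.02, "GENE_C": 1e-5, "GENE_D": 0.06}


def significant_genes(pvalues: dict[str, float], alpha: float) -> set[str]:
    """Return genes whose single-platform p-value is below alpha."""
    return {gene for gene, p in pvalues.items() if p < alpha}


def venn_baseline(snp: dict[str, float], cnv: dict[str, float], alpha: float = 0.01):
    """Naive integration: test each platform alone, then combine gene symbols."""
    snp_hits = significant_genes(snp, alpha)
    cnv_hits = significant_genes(cnv, alpha)
    # Only gene symbols are combined; marker-level dependence is never used.
    return snp_hits & cnv_hits, snp_hits | cnv_hits


intersection, union = venn_baseline(snp_pvalues, cnv_pvalues)
print("intersection:", sorted(intersection))  # intersection: []
print("union:", sorted(union))                # union: ['GENE_A', 'GENE_C']
```

With these made-up numbers, GENE_B is moderately associated on both platforms yet falls into neither set, because the baseline never pools evidence across platforms for the same gene. A joint gene-level test over SNP and CNV markers is the kind of remedy the abstract describes.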

English Abstract


With the rapid progress of sequencing technologies, multiple levels of genomic data can now be obtained from a single set of samples; for instance, a SNP array can be used both to genotype SNPs and to measure CNVs. Data size increases dramatically when several types of genetic variants are considered simultaneously, so an integrative analysis is required to handle the high dimensionality and the complex relationships among markers within and across platforms. Previous integrative analyses usually identify genes on each platform separately and then take the union or intersection of the results by gene symbol, without considering the dependence among markers. To address this issue, I propose a novel pipeline that integrates genomic copy number and SNP data. First, a gene-level association test is used to identify significant genes. Subsequently, a moving window analysis is applied within those genes to pinpoint the causal gene regions. The proposed pipeline was evaluated in several simulation scenarios and showed good and robust performance, especially when interaction effects were considered. In addition, the pipeline was applied to two real studies of low-density lipoprotein cholesterol (LDL-C) and triglycerides (TG), using data obtained from the Taiwan Biobank. For each trait, 40 genes were identified, and several regions within them showed strong associations with LDL-C and TG, respectively. In conclusion, these results demonstrate that the proposed integrative method is able to identify important causal genes, especially genes that had not been reported by the naïve method of combining single-platform results.
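The moving window step can be sketched as follows, assuming each marker inside a selected gene has already been reduced to a z-score and ordered by genomic position. The window score (sum of squared z-scores) and the function names `moving_window_scan` and `best_window` are hypothetical choices for illustration, not the statistic used in the thesis. Reading the score as chi-square with `window_size` degrees of freedom also assumes independent markers, which linkage disequilibrium and overlapping CNV segments violate; in practice a permutation-based null would be safer.

```python
def moving_window_scan(z_scores, window_size, step=1):
    """Slide a fixed-size window along position-ordered markers and score it.

    Hypothetical score: sum of squared z-scores in the window. Under an
    assumed null of independent N(0, 1) markers this is chi-square with
    window_size degrees of freedom, so among equal-size windows a larger
    score corresponds to a smaller p-value.
    """
    if not 1 <= window_size <= len(z_scores):
        raise ValueError("window_size must be between 1 and the number of markers")
    if step < 1:
        raise ValueError("step must be a positive integer")
    results = []
    for start in range(0, len(z_scores) - window_size + 1, step):
        window = z_scores[start:start + window_size]
        # (index of first marker, index of last marker, window score)
        results.append((start, start + window_size - 1, sum(z * z for z in window)))
    return results


def best_window(z_scores, window_size, step=1):
    """Return (first_marker, last_marker, score) of the highest-scoring window."""
    return max(moving_window_scan(z_scores, window_size, step), key=lambda w: w[2])


# Made-up z-scores for eight markers inside one selected gene.
z = [0.3, -0.5, 2.8, 3.1, -2.5, 0.2, 0.1, -0.4]
start, end, score = best_window(z, window_size=3)
print(start, end, round(score, 2))  # 2 4 23.7
```

A fixed window size keeps all windows on the same degrees of freedom, so scores are directly comparable; scanning several window sizes would require converting each score to a p-value and adjusting for the multiple overlapping tests.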

Keywords

SNP, CNV, integrative analysis, Taiwan Biobank, LDL-C, TG
