
Construction of an Online DNA Variant Imputation Server

Development of an online system for DNA imputation

Advisor: 莊曜宇

Abstract


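Genotype imputation is an important step performed before a genome-wide association study (GWAS). Using large reference panels, it predicts and fills in missing genotypes, which increases the SNP density of the samples and the informativeness of GWAS analyses. However, the full imputation workflow comprises a series of complex pre-imputation and post-imputation steps, consumes substantial computing resources, and requires bioinformatics expertise. We therefore built a user-friendly web-based imputation server, the Multi-racial Imputation System (MI-system), which uses SHAPEIT for pre-phasing and IMPUTE2 for imputation, two tools widely used by bioinformaticians. For reference panels, the service includes the Taiwan Biobank (TWB) panel for the first time and lets users choose among the 1000 Genomes Phase III, TWB and HapMap3 panels according to their needs. It also adds two IMPUTE2-specific merge-reference imputation functions to improve imputation results. The service further offers flexible quality-control options: users can choose, from several preset options, the minor allele frequency threshold, the genotype (SNP-level) and sample missingness rates used for filtering, and the Hardy-Weinberg equilibrium threshold. For convenience, the service also provides utility functions such as (i) splitting whole-genome SNP data, (ii) converting genome coordinates between builds (GRCh37 and GRCh38), and (iii) building customized reference panels from user-uploaded genotype data. In a few simple steps, users can run these utilities and quickly obtain high-throughput imputed SNP data. The results can be downloaded in formats compatible with popular GWAS analysis tools (e.g., PLINK, SNPTEST or R) for downstream analysis.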
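The quality-control filters described above can be sketched as follows. This is a minimal illustration and not MI-system's implementation. It makes several assumptions:

- Genotypes are coded as 0/1/2 copies of the alternate allele, with None for a missing call.
- The default thresholds are hypothetical values modeled on PLINK's --maf, --geno, --mind and --hwe options.
- A 1-df chi-square Hardy-Weinberg test is swapped in where PLINK uses an exact test by default.

```python
import math
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # 0/1/2 copies of the alternate allele; None = missing call


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Hardy-Weinberg p-value from a 1-df chi-square goodness-of-fit test."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # reference-allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected) if e > 0)
    # Upper tail of a chi-square with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def qc_filter(
    geno: Sequence[Sequence[Genotype]],
    maf_min: float = 0.01,   # hypothetical default, cf. PLINK --maf
    geno_max: float = 0.05,  # max SNP-level missing rate, cf. --geno
    mind_max: float = 0.05,  # max sample-level missing rate, cf. --mind
    hwe_min: float = 1e-6,   # min HWE p-value, cf. --hwe
) -> Tuple[List[int], List[int]]:
    """geno[i][j] is sample i's genotype at SNP j; returns kept sample and SNP indices."""
    n_snps = len(geno[0]) if geno else 0
    if n_snps == 0:
        return list(range(len(geno))), []
    # 1) Drop samples whose missing-call rate exceeds mind_max.
    kept_samples = [
        i for i, row in enumerate(geno)
        if sum(g is None for g in row) / n_snps <= mind_max
    ]
    kept_snps: List[int] = []
    for j in range(n_snps):
        calls = [geno[i][j] for i in kept_samples]
        observed = [g for g in calls if g is not None]
        # 2) SNP-level missing rate, computed over the retained samples only.
        if not observed or 1 - len(observed) / len(calls) > geno_max:
            continue
        # 3) Minor allele frequency.
        alt_freq = sum(observed) / (2 * len(observed))
        if min(alt_freq, 1 - alt_freq) < maf_min:
            continue
        # 4) Hardy-Weinberg equilibrium on the observed genotype counts.
        if hwe_chisq_p(observed.count(0), observed.count(1), observed.count(2)) < hwe_min:
            continue
        kept_snps.append(j)
    return kept_samples, kept_snps
```

Samples are filtered before SNPs here, so SNP-level missingness is measured only on the retained samples. The source does not state the order in which MI-system applies its filters.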

Parallel Abstract


Genotype imputation is an important step preceding genome-wide association studies (GWAS). It predicts missing genotypes from reference panels, increasing the SNP density of the samples and the power of the GWAS. However, the process involves a series of extensive pre- and post-imputation procedures, is computationally expensive, and requires bioinformatics expertise. We have therefore developed a user-friendly, web-based imputation service, the Multi-racial Imputation System (MI-system), which uses the popular pre-phasing and imputation tools SHAPEIT2 and IMPUTE2, respectively. For reference panels, the server includes the Taiwan Biobank (TWB) panel for the first time and lets users choose among the 1000 Genomes Phase III, TWB and HapMap3 panels. Users can also select the IMPUTE2-specific "merge reference" function, which merges multiple reference panels for imputation. The server also provides flexible quality-control options and lets users set thresholds for minor allele frequency, missing genotyping rate (at the SNP and individual level), and Hardy-Weinberg equilibrium. For convenience, several additional utility functions are offered: (i) splitting whole-genome SNP data, (ii) converting between genome builds (GRCh37 and GRCh38), and (iii) building customized reference panels from user-uploaded genotype data. Users can obtain high-throughput imputed data and access the utility functions in a few simple clicks. The results can be downloaded in formats compatible with popular GWAS tools such as PLINK, SNPTEST or R for further downstream analysis.
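The final conversion step can be illustrated with a minimal sketch. It turns IMPUTE2 .gen output into a tab-delimited dosage table that can be loaded in R, for example with read.table.

The sketch assumes the usual .gen layout: five leading columns (SNP ID, rsID, position, allele A, allele B), then one P(AA), P(AB), P(BB) triplet per sample. The function names and output layout are hypothetical. They are not MI-system's actual converter, nor the native PLINK or SNPTEST input formats.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, int, str, str, List[float]]


def gen_line_to_dosage(line: str) -> Row:
    """Parse one .gen line into (rsid, position, allele A, allele B, dosages)."""
    fields = line.split()
    if len(fields) < 5 or (len(fields) - 5) % 3 != 0:
        raise ValueError("expected 5 leading columns followed by probability triplets")
    _snp_id, rsid, pos, allele_a, allele_b = fields[:5]  # first column is ignored
    probs = [float(x) for x in fields[5:]]
    # Each sample contributes P(AA), P(AB), P(BB); the dosage is the
    # expected number of B alleles, P(AB) + 2 * P(BB).
    dosages = [probs[k + 1] + 2.0 * probs[k + 2] for k in range(0, len(probs), 3)]
    return rsid, int(pos), allele_a, allele_b, dosages


def gen_to_dosage_tsv(lines: Iterable[str], decimals: int = 3) -> Iterator[str]:
    """Yield tab-separated rows: one SNP per row, one dosage column per sample."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rsid, pos, a, b, dosages = gen_line_to_dosage(line)
        cells = [rsid, str(pos), a, b] + [f"{d:.{decimals}f}" for d in dosages]
        yield "\t".join(cells)
```

A dosage keeps the imputation uncertainty as a value between 0 and 2. This is why it is often preferred over best-guess hard calls in downstream association tests.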

