
以次世代定序平台同時進行單體型之重組與結構性變異之偵測

Simultaneous Haplotype Assembly and Structural Variations Detection Using Next Generation Sequencing

Advisor: 黃耀廷

Abstract


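Most species in the biosphere have a diploid genome composed of a pair of haplotypes. However, existing assemblers for next-generation sequencing platforms can reconstruct only a single sequence, and that sequence is a mosaic containing information from both haplotypes. Moreover, the sequence differences between the two haplotypes include single nucleotide polymorphisms (SNPs) as well as large-scale structural variations (SVs). Reconstructing both haplotype sequences of a diploid genome with next-generation platforms therefore remains a difficult task. In this thesis we design and implement a new framework, named HapSVAssembler, that assembles the two haplotype sequences of a diploid genome from paired-end short reads. HapSVAssembler first combines several assembly algorithms to reconstruct a single reference sequence, called the reference genome. By aligning the paired-end reads to this reference genome, it then locates the coordinates of heterozygous SNPs and heterozygous SVs. Finally, it analyzes the paired-end reads that span two or more heterozygous SNPs or SVs in order to separate and reconstruct the two complete haplotype sequences. For the haplotype assembly step we define a new optimization problem and design a genetic algorithm (GA) to solve it. Results from a variety of simulation experiments show that HapSVAssembler achieves better assembly accuracy and completeness than previous methods. In addition, HapSVAssembler can help analyze linkage disequilibrium between different kinds of genetic variation.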
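The role of reads that span two or more heterozygous sites can be illustrated with a small sketch. It is not taken from HapSVAssembler. It assumes each read pair has already been reduced to a mapping from heterozygous-site index to the observed allele (0 or 1), and the names `Fragment`, `informative_fragments`, and `phasable_blocks` are hypothetical. Only sites linked by at least one such read can be phased relative to each other, so the sketch groups sites into linked blocks with a union-find structure.

```python
from typing import Dict, List

# Hypothetical representation: heterozygous-site index -> observed allele (0 or 1).
Fragment = Dict[int, int]


def informative_fragments(fragments: List[Fragment], min_sites: int = 2) -> List[Fragment]:
    """Keep only fragments that cover at least `min_sites` heterozygous sites."""
    return [f for f in fragments if len(f) >= min_sites]


def phasable_blocks(fragments: List[Fragment], n_sites: int) -> List[List[int]]:
    """Group sites that are linked, directly or transitively, by informative fragments."""
    parent = list(range(n_sites))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for frag in informative_fragments(fragments):
        sites = sorted(frag)
        for s in sites[1:]:
            ra, rb = find(sites[0]), find(s)
            if ra != rb:
                parent[rb] = ra

    blocks: Dict[int, List[int]] = {}
    for s in range(n_sites):
        blocks.setdefault(find(s), []).append(s)
    # Sites that no informative fragment links to another site cannot be phased.
    return sorted(b for b in blocks.values() if len(b) > 1)


example = [{0: 0, 1: 1}, {1: 1, 2: 0}, {3: 1}, {4: 0, 5: 0}]
print(phasable_blocks(example, 6))
```

In this toy input the fragment covering only site 3 carries no phasing information, so site 3 is left out of every block.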

English Abstract


Most species in the biosphere have a diploid genome composed of two haplotypes. However, existing short-read assemblers for next-generation sequencing (NGS) platforms reconstruct only one consensus sequence, which is a mosaic of the two haplotypes. In addition, the differences between the two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). De novo haplotype assembly of a diploid genome is therefore still a challenging task on NGS platforms. In this thesis, we design and implement a new framework called HapSVAssembler for de novo assembly of a diploid genome using short paired-end reads. HapSVAssembler uses a hybrid assembly approach to build a consensus sequence, identifies heterozygous SNP and SV loci, and simultaneously reconstructs the SNP/SV haplotypes from reads spanning two or more SNPs/SVs. A new optimization problem is formulated and solved with a genetic algorithm (GA). Experimental results indicate that the assembly accuracy and continuity of HapSVAssembler are much higher than those of previous methods. Because it can assemble haplotypes containing multiple types of genomic variation, HapSVAssembler is useful for studying linkage disequilibrium across different variations.
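The abstract does not state the optimization problem that the thesis formulates, so the following sketch substitutes the widely used minimum error correction (MEC) objective as a stand-in. It only illustrates how a genetic algorithm can search over candidate haplotypes. Treating every SNP or SV locus as a binary site is an assumption, as are the function names, the population size, the mutation rate, and the other parameters.

```python
import random
from typing import Dict, List

# Hypothetical representation: heterozygous-site index -> observed allele (0 or 1).
Fragment = Dict[int, int]


def mec_cost(hap: List[int], fragments: List[Fragment]) -> int:
    """MEC stand-in objective: each fragment is charged its mismatches against
    whichever of the haplotype or its complement it matches better."""
    total = 0
    for frag in fragments:
        mismatches = sum(1 for site, allele in frag.items() if hap[site] != allele)
        total += min(mismatches, len(frag) - mismatches)
    return total


def ga_phase(fragments: List[Fragment], n_sites: int, pop_size: int = 30,
             generations: int = 60, mutation_rate: float = 0.05,
             seed: int = 0) -> List[int]:
    """Toy GA (requires n_sites >= 2): elitist selection, one-point crossover,
    and per-site bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_sites)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda h: mec_cost(h, fragments))
        elite = pop[: pop_size // 2]           # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)        # pick two distinct parents
            cut = rng.randrange(1, n_sites)    # one-point crossover position
            child = a[:cut] + b[cut:]
            child = [1 - x if rng.random() < mutation_rate else x for x in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda h: mec_cost(h, fragments))
```

A possible call is `best = ga_phase(fragments, n_sites)`. The second haplotype is then the element-wise complement of `best`, and `mec_cost(best, fragments)` reports how many allele observations disagree with the inferred pair. Because the elite half is carried over unchanged, the best cost in the population never increases from one generation to the next.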

