
Computational Approaches for Short Read Assembly Using Paired-Overlap Graph

Advisor: 黃耀廷

Abstract


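In recent years, paired-end sequencing platforms have been widely used for genome assembly across many species. Most methods do not use the paired-end information from the start. Instead, they first treat the two ends of each pair as independent reads and assemble them into contigs, and usually only in the final stage use the paired-end reads to merge contigs into larger units. Currently, only a few methods use paired-end reads from the very beginning of assembly. In this thesis we designed a software tool, called POAssembler, that uses the read pairs produced by paired-end sequencing to construct a paired-overlap graph. Because the distance between the two ends of a pair varies considerably (insert size variance), we implemented a graph simplification algorithm that removes the complex bubble structures caused by this variance. We tested the method on two simulated data sets and one real data set. The results show that our method brings improvements on simulated data but is not yet sufficient to handle the various complications of real sequencing data.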

Parallel Abstract


In recent years, paired-end sequencing has been widely used for genome assembly of many species. Most assembly approaches first build contigs from short reads while ignoring the paired-end information; the contigs are then linked into larger units using paired-end information at a subsequent stage. Currently, very few approaches make use of paired-end reads from the beginning. In this thesis we designed and implemented a paired-overlap graph assembler, called POAssembler, which incorporates paired-end reads into the initial graph construction. A graph simplification algorithm was developed to remove complex bubbles generated by insert size variance. We tested our method on two sets of simulated data and one set of real data. The experimental results show that the method improved on existing assembly for simulated data but did not perform well on real data sets with complex sequencing errors.
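The idea of building the graph from read pairs rather than single reads can be illustrated with a toy sketch. This is not POAssembler's implementation: the function names, the edge rule (both ends of one pair must overlap the corresponding ends of another pair), and the `shift_tolerance` parameter standing in for insert size variance are all assumptions made for illustration, and both reads of a pair are assumed to be given on the same strand.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest proper suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_paired_overlap_graph(pairs, min_len=3, shift_tolerance=0):
    """Toy paired-overlap graph (hypothetical definition, for illustration only).

    `pairs` is a list of (left_read, right_read) tuples. An edge i -> j is
    added when the left reads overlap AND the right reads overlap, and the
    two implied shifts differ by at most `shift_tolerance` bases (a crude
    stand-in for tolerating insert size variance).
    Returns {i: [(j, left_overlap, right_overlap), ...]}.
    """
    graph = {}
    for i, (left_i, right_i) in enumerate(pairs):
        for j, (left_j, right_j) in enumerate(pairs):
            if i == j:
                continue
            left_ov = suffix_prefix_overlap(left_i, left_j, min_len)
            right_ov = suffix_prefix_overlap(right_i, right_j, min_len)
            if not (left_ov and right_ov):
                continue
            # Shift = how far pair j starts to the right of pair i.
            left_shift = len(left_i) - left_ov
            right_shift = len(right_i) - right_ov
            if abs(left_shift - right_shift) <= shift_tolerance:
                graph.setdefault(i, []).append((j, left_ov, right_ov))
    return graph


# Three pairs sampled every 2 bases from a made-up 20-base sequence,
# with 5-base reads and the right read starting 10 bases after the left.
pairs = [("ACGTA", "GGTCA"), ("GTACC", "TCAAT"), ("ACCTG", "AATGC")]
print(build_paired_overlap_graph(pairs))
# {0: [(1, 3, 3)], 1: [(2, 3, 3)]}
```

With `shift_tolerance=0` an edge requires both ends to agree exactly on the shift. Raising it admits pairs whose insert sizes differ slightly, which is also the kind of situation that can create alternative paths (bubbles) in the graph.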

