
以雙端定序技術改善序列重組

Improvement of De novo Assembly Using Paired-End Sequencing

Advisor: 黃耀廷

Abstract


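Next-generation sequencing (NGS) technologies have been widely used to sequence and assemble the genomes of species that have not yet been studied. In practice, because genome sequences are highly complex and the reads produced by NGS are short, most assembled genomes remain quite fragmented. In this thesis, we design and implement a program, named CEPS, that uses paired-end sequencing to further extend assembled contigs. CEPS quickly detects the short reads that fall on the edges of contigs, determines whether a repeat sequence is present, and extends the contigs accordingly. By exploiting the properties of paired-end reads, CEPS overcomes the fragmentation that current genome assemblers produce between regions of high and low sequencing coverage. We implemented and tested CEPS on several simulated data sets and compared it with existing assembly software. The experimental results show that CEPS assembles more complete genomes: it achieves a higher N50, and its accuracy comes very close to 100%. Notably, CEPS can integrate paired-end sequencing data of several different lengths to further improve genome assembly.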

Parallel Abstract


Next-generation sequencing (NGS) technologies have been widely used to assemble the genomes of unstudied species in the biosphere. In practice, the assembled genomes are highly fragmented because of the complexity of the genome and the relatively short length of the reads. In this thesis, we design and implement software called CEPS (Contig Extension using Paired-end Sequencing) for improving de novo assembly. Using paired-end sequencing, CEPS extracts paired-end reads that overhang the boundaries of contigs and extends these contigs across regions of extremely low and extremely high coverage, where most assemblers produce fragmented genomes. CEPS is multi-threaded and has been tested and compared with existing assemblers on a variety of simulated data sets. The experimental results indicate that CEPS produces a significantly more contiguous genome, with a larger N50 and genome size, and an assembly accuracy close to 100%. It is worth mentioning that CEPS can integrate multiple paired-end or mate-pair libraries to further improve genome assembly.
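The N50 statistic used above as the contiguity measure can be illustrated with a short sketch. This is the standard definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length), not code from CEPS; the function name `n50` and the contig lengths are hypothetical and chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Standard definition: sort contigs from longest to shortest and
    accumulate their lengths; N50 is the length of the contig at which
    the running total first reaches half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths before and after two contigs are joined
# by an extension step; the total assembly length is the same (300).
fragmented = [100, 80, 60, 40, 20]
extended = [180, 60, 40, 20]

print(n50(fragmented))
print(n50(extended))
```

Joining two contigs leaves the total length unchanged but raises the N50, which is why a successful contig extension shows up as a larger N50.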

