
Conversion of Mate-Pair Reads into Long Sequences for Improving Assembly Scaffolding

Advisor: Yao-Ting Huang (黃耀廷)

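Abstract


Mate-pair reads have long been used to improve the completeness of genome assemblies. Although mate-pair sequencing has the advantage of low cost, its sequencing quality is unstable and its libraries are often contaminated. With the advent of third-generation sequencing, long reads can effectively improve assembly completeness, but they come with a higher sequencing error rate and a high price. This thesis therefore aims to convert relatively low-cost mate-pair reads into long sequences by computational means, so that the advantages of long reads can be used to improve genome assembly. We use several real datasets to verify the correctness and accuracy of our method. In addition, we compare the genome assembly results obtained with the original mate-pair reads, with the long sequences converted from them, and with a mixture of the two.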

Parallel Abstract


Mate-pair scaffolding has been used since the early days of genome sequencing to improve the final assembly. Although mate-pair sequencing is now affordable, its power and accuracy have been limited by lower read quality and contamination. Third-generation sequencing, which generates long reads, is helpful for accurate scaffolding. However, the error rates and cost of this technology are still too high. This thesis aims to convert low-cost mate-pair reads into long reads using computational approaches, combining the benefits of mate-pair reads and long reads for scaffolding. We test our methods on several real datasets and validate the accuracy of the converted long reads. In addition, we compare the scaffolding results obtained with mate-pair reads, with converted long reads, and with a mixture of both.

