
利用多個參考基因體重組DNA片段

Assembling Contigs Using Multiple Reference Genomes

Advisor: 盧錦隆

Abstract


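Next-generation sequencing (NGS) technology has allowed us to efficiently produce draft genomes for many species of interest. However, most draft genomes are still just collections of independent DNA fragments (contigs) whose relative positions and orientations on the sequenced genome are unknown. Many software tools, including CAR, which our laboratory designed in 2014, have been developed to determine the order and orientation of the contigs in a draft genome using a single reference genome. In fact, if the draft genome and the reference genome are not closely related evolutionarily, any tool that relies on a single reference genome may produce erroneous scaffolds. In other words, a single reference genome may not be sufficient to produce correct scaffolds of a draft genome. Recently, a tool called Ragout has made it possible to use multiple reference genomes to produce more accurate scaffolds of draft genomes. However, Ragout requires the user to supply a phylogenetic tree of the draft and reference genomes, and in practice such a tree is not easy for users to obtain in advance. In this study, inspired by Ragout, we develop two multiple-reference-based tools, Multi-CAR and Ragout-CAR, that try to improve our CAR so that it can use multiple reference genomes to produce high-quality scaffolds of draft genomes. Basically, Multi-CAR is built on CAR alone and does not require the user to supply a phylogenetic tree, whereas Ragout-CAR is built on both Ragout and CAR and generates the phylogenetic tree automatically. Finally, our experimental results on many test datasets show that, in most cases, the multiple-reference-based tools perform better than the single-reference-based tools. In addition, Ragout-CAR outperforms Multi-CAR on some test datasets, while Multi-CAR outperforms Ragout-CAR on others. These results show that both Multi-CAR and Ragout-CAR can use multiple reference genomes to produce high-quality scaffolds of draft genomes.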

Parallel Abstract


Next-generation sequencing technology has allowed the efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs whose relative positions and orientations along the sequenced genome are unknown. Several tools, including CAR, which our laboratory designed in 2014, have been developed to order and orient the contigs of a draft genome using a single reference genome. However, all of these single-reference-based tools may produce erroneous scaffolds if the draft and reference genomes are not closely related. In other words, a single reference genome may not be sufficient to produce correct scaffolds of a draft genome. Recently, a tool called Ragout was introduced that can use multiple reference genomes to generate more accurate scaffolds. However, Ragout requires users to supply a phylogenetic tree of the draft and reference genomes, which users often cannot easily obtain in advance. In this study, motivated by Ragout, we try to improve our single-reference-based tool CAR by developing two multiple-reference-based tools, Multi-CAR and Ragout-CAR, that can use multiple reference genomes to produce high-quality scaffolds of draft genomes. Multi-CAR is built on CAR alone and does not require users to supply a phylogenetic tree, whereas Ragout-CAR is built on both Ragout and CAR and creates the phylogenetic tree automatically. Our experimental results on several test datasets show that, in most cases, the multiple-reference-based tools perform better than their single-reference-based counterparts. In addition, Ragout-CAR outperforms Multi-CAR in some test cases, while Multi-CAR outperforms Ragout-CAR in others. These results demonstrate that both Multi-CAR and Ragout-CAR are useful for producing high-quality scaffolds of draft genomes using multiple reference genomes.
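
The risk of relying on a single reference genome can be illustrated with a toy sketch. This is not the CAR, Multi-CAR, or Ragout algorithm: the `Hit` record, the function names, and all coordinates are hypothetical, and the sketch assumes each contig has already been aligned to each reference at one position and strand. It orders and orients contigs from each reference separately, then counts how many references support each contig adjacency.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical alignment of one contig to one reference genome."""
    contig: str
    start: int   # start coordinate of the alignment on the reference
    strand: str  # '+' (forward) or '-' (reverse complement)


def scaffold_from_reference(hits: list[Hit]) -> list[tuple[str, str]]:
    """Order contigs by reference coordinate; orient them by strand."""
    ordered = sorted(hits, key=lambda h: h.start)
    return [(h.contig, h.strand) for h in ordered]


def adjacencies(scaffold: list[tuple[str, str]]) -> set[frozenset[str]]:
    """Unordered pairs of contigs that are neighbours in a scaffold."""
    return {frozenset((a[0], b[0])) for a, b in zip(scaffold, scaffold[1:])}


# Made-up alignments of the same three contigs to two references.
ref_a = [Hit("c1", 100, "+"), Hit("c3", 900, "-"), Hit("c2", 500, "+")]
ref_b = [Hit("c1", 100, "+"), Hit("c2", 800, "+"), Hit("c3", 400, "-")]

scaffold_a = scaffold_from_reference(ref_a)
scaffold_b = scaffold_from_reference(ref_b)

# Count how many references support each adjacency.
support: Counter = Counter()
for scaffold in (scaffold_a, scaffold_b):
    support.update(adjacencies(scaffold))

for pair, votes in support.items():
    print(sorted(pair), votes)
```

In this made-up input the two references imply different contig orders, so a scaffold built from either one alone could be wrong; only the c2/c3 adjacency is supported by both. How to weigh such conflicting evidence is the kind of question a multiple-reference tool has to answer.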

