  • Thesis

根據保留區間距離解決Scaffolding問題之研究

The Study of Solving Scaffolding Problem Based on Conserved Interval Distance

Advisor: 盧錦隆

Abstract


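Scaffolding is a very important step in DNA sequencing; its goal is to order and orient the contigs of a draft genome. Our laboratory previously developed a scaffolding algorithm based on the maximum matching breakpoint distance (MBD), which uses a reference genome to scaffold a target draft genome. However, a breakpoint considers only two adjacent markers, so when the reference and target genomes are more distantly related, the scaffolds produced by the MBD-based scaffolding algorithm are joined less completely. In this thesis, we therefore use the concept of conserved intervals to define a maximum matching conserved interval distance (MCID) based scaffolding problem, whose goal is to determine scaffolds of the target and reference genomes such that the conserved interval distance between the two scaffolds is minimized. We use integer linear programming to design an exact algorithm for the MCID-based scaffolding problem. Finally, according to simulation results, our MCID-based scaffolding algorithm has better sensitivity than the MBD-based scaffolding algorithm when the reference genome is complete. Although its precision is lower than that of the MBD-based scaffolding algorithm, our MCID-based scaffolding algorithm still outperforms it in F-score for more than half of the parameter combinations.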
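To make the contrast between breakpoints and conserved intervals concrete, the following is a minimal sketch on two toy signed permutations. It assumes one common definition for signed permutations over the same markers: a pair (a, b) is a conserved interval if a precedes b (or -b precedes -a) in both orders and the same set of markers lies between them, and the distance is N(P) + N(Q) - 2 N(P, Q). The thesis may define these notions differently for contigs and scaffolds, and this is not its ILP formulation; the function names and the toy data are hypothetical.

```python
from itertools import combinations


def conserved_intervals(p, q):
    """Return the pairs (a, b) that are conserved intervals of both p and q.

    Assumption: p and q are signed permutations over the same markers,
    written as lists of non-zero integers (sign = orientation).
    """
    pos = {abs(x): k for k, x in enumerate(q)}
    found = set()
    for i, j in combinations(range(len(p)), 2):  # i < j, so a precedes b in p
        a, b = p[i], p[j]
        s, t = pos[abs(a)], pos[abs(b)]
        same = s < t and q[s] == a and q[t] == b          # a ... b in q
        flipped = t < s and q[t] == -b and q[s] == -a     # -b ... -a in q
        if not (same or flipped):
            continue
        lo, hi = min(s, t), max(s, t)
        # The markers strictly between the endpoints must agree as sets.
        if {abs(x) for x in p[i + 1:j]} == {abs(x) for x in q[lo + 1:hi]}:
            found.add((a, b))
    return found


def conserved_interval_distance(p, q):
    """Assumed distance: N(p) + N(q) - 2 * N(p, q)."""
    n_p = len(conserved_intervals(p, p))
    n_q = len(conserved_intervals(q, q))
    return n_p + n_q - 2 * len(conserved_intervals(p, q))


def shared_adjacencies(p, q):
    """Conserved intervals with an empty interior, i.e. non-breakpoints."""
    idx = {x: k for k, x in enumerate(p)}
    return {(a, b) for a, b in conserved_intervals(p, q) if idx[b] - idx[a] == 1}


reference = [1, 2, 3, 4, 5]
target = [1, -3, -2, 4, 5]  # markers 2 and 3 inverted as a block

print(sorted(conserved_intervals(reference, target)))
print(sorted(shared_adjacencies(reference, target)))
print(conserved_interval_distance(reference, target))
```

In this toy case only the adjacencies (2, 3) and (4, 5) survive the inversion, while the conserved intervals additionally include (1, 4) and (1, 5). Those two longer-range pairs are the kind of signal that a measure restricted to adjacent markers cannot see.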

Keywords

conserved interval

Parallel Abstract


Scaffolding is an important step in DNA sequencing: it orders and orients the contigs of a draft genome. Our laboratory previously developed a scaffolding algorithm based on the maximum matching breakpoint distance (MBD), which scaffolds a target draft genome using a reference genome. However, a breakpoint considers only two adjacent markers, so the more dissimilar the reference and target genomes are, the less complete the scaffolds produced by the MBD-based scaffolding algorithm become. In this thesis, we therefore use the concept of conserved intervals to define the maximum matching conserved interval distance (MCID) based scaffolding problem, which asks for scaffolds of the target and reference genomes such that the conserved interval distance between the resulting scaffolds is minimized. We then use integer linear programming (ILP) to design an exact algorithm for the MCID-based scaffolding problem. Finally, experimental results on simulated datasets show that our MCID-based scaffolding algorithm achieves better sensitivity than the MBD-based one when the reference genome is complete. Although its precision is lower than that of the MBD-based algorithm, the MCID-based algorithm still achieves a higher F-score in more than half of all parameter combinations.
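The trade-off among sensitivity, precision, and F-score reported above can be illustrated with a small sketch. It assumes the usual definitions over sets of contig joins (sensitivity = correct joins / true joins, precision = correct joins / predicted joins, F-score = their harmonic mean); the abstract does not spell out the exact definitions used in the thesis, and the helper names and contig labels below are hypothetical.

```python
def join(a, b):
    """An unordered join between two contigs (orientation ignored in this sketch)."""
    return frozenset((a, b))


def scaffold_metrics(predicted, truth):
    """Return (sensitivity, precision, f_score) for sets of contig joins."""
    correct = len(predicted & truth)
    sensitivity = correct / len(truth) if truth else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    total = sensitivity + precision
    f_score = 2 * sensitivity * precision / total if total else 0.0
    return sensitivity, precision, f_score


# Hypothetical true contig order: c1 c2 c3 c4 c5
truth = {join("c1", "c2"), join("c2", "c3"), join("c3", "c4"), join("c4", "c5")}
# Hypothetical scaffolder output: two correct joins and one wrong join
predicted = {join("c1", "c2"), join("c2", "c3"), join("c3", "c5")}

sensitivity, precision, f_score = scaffold_metrics(predicted, truth)
print(f"sensitivity={sensitivity:.3f} precision={precision:.3f} F-score={f_score:.3f}")
```

Because the F-score is a harmonic mean, an algorithm that recovers more true joins can still score higher overall even when some of its additional joins are wrong, which is consistent with the pattern the abstract describes.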

Parallel Keywords

scaffolding; conserved interval

