Reconstructing Genome Trees of Prokaryotes Using Overlapping Genes

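Overlapping genes (OGs) are defined as two genes that are adjacent on a chromosome and whose coding sequences overlap partially or entirely. OGs are very common in microbial genomes, and they are more conserved during evolution than non-overlapping genes. Based on these properties, we previously developed a web server, OGtree, that allows users to reconstruct genome trees of prokaryotes from the pairwise OG distances between their genomes. By analogy with studies of gene content and gene order, we defined the OG distance between two genomes by combining OG content (i.e., the normalized number of orthologous OG pairs shared by the two genomes) and OG order (i.e., the normalized OG breakpoint distance between the two genomes). Defining the OG distance through breakpoints, however, has a drawback: it cannot be applied to multi-chromosomal genomes to compute their OG distances. Moreover, some distantly related species share so few orthologous OGs that their pairwise OG distance cannot be measured properly. In this paper, we therefore define a new OG distance that is based on more biologically realistic genome rearrangements (e.g., reversals, transpositions and translocations) rather than breakpoints, and that applies to both uni-chromosomal and multi-chromosomal genomes. We also extend the scope of a gene to include both its coding sequence and its regulatory region, so that two adjacent genes whose coding sequences or regulatory regions overlap are treated as a pair of overlapping genes. The rationale is that when distinct genes overlap in their regulatory regions, the regulation of these genes is likely to be affected to some degree. Based on these changes, we reimplemented OGtree as a new web server, OGtree2.0, and evaluated its accuracy by reconstructing the phylogenetic tree of 21 Proteobacteria chromosomes. Our experimental results show that OGtree2.0 outperforms both its previous version, OGtree, and a similar tool, BPhyOG, because the phylogenetic relationships among the Proteobacteria in the tree reconstructed by OGtree2.0 are consistent with the reference tree accepted by biologists.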
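The expanded notion of an overlapping gene pair, in which two adjacent genes count as overlapping if either their coding sequences or their regulatory regions overlap, can be illustrated with a small sketch. The abstract does not say how regulatory regions are delimited, so this sketch assumes, purely for illustration, a fixed-length window of `upstream_bp` bases immediately upstream of each gene (strand-aware). The names `Gene`, `extended_region` and `overlapping_pairs`, as well as the 50 bp default, are hypothetical. Coordinates are 1-based and inclusive, and wrap-around on circular chromosomes is ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coding base (start <= end)
    end: int     # rightmost coding base
    strand: str  # "+" or "-"


def extended_region(g: Gene, upstream_bp: int) -> Interval:
    """Coding sequence plus a HYPOTHETICAL fixed-length upstream regulatory window."""
    if g.strand == "+":
        return max(1, g.start - upstream_bp), g.end  # upstream lies to the left
    return g.start, g.end + upstream_bp              # upstream lies to the right


def intersects(x: Interval, y: Interval) -> bool:
    """True if two closed intervals share at least one base."""
    return x[0] <= y[1] and y[0] <= x[1]


def overlapping_pairs(genes: List[Gene], upstream_bp: int = 50) -> List[Tuple[str, str, str]]:
    """Label each pair of neighbouring genes as a coding or a regulatory overlap."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if intersects((left.start, left.end), (right.start, right.end)):
            pairs.append((left.name, right.name, "coding"))
        elif intersects(extended_region(left, upstream_bp),
                        extended_region(right, upstream_bp)):
            pairs.append((left.name, right.name, "regulatory"))
    return pairs
```

For example, with genes A (1..100, +), B (95..300, +), C (330..500, +), D (560..700, -) and E (790..900, +), the sketch reports A/B as a coding overlap, B/C as a regulatory overlap (C's upstream window reaches into B), and the divergent pair D/E as a regulatory overlap through their shared intergenic region, while C/D is not an overlapping pair.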

Keywords

bioinformatics; algorithm; genome tree; overlapping gene; prokaryote; genome rearrangement

Parallel Abstract

Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. OGs are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on this property, we previously implemented a web server, OGtree, that allows users to reconstruct genome trees of prokaryotes from their pairwise OG distances. By analogy with analyses of gene content and gene order, we defined the OG distance between two genomes by combining OG content (i.e., the normalized number of shared orthologous OG pairs) and OG order (i.e., the normalized OG breakpoint distance) over their whole genomes. A shortcoming of defining the OG distance in terms of breakpoints is that it cannot be applied to multi-chromosomal genomes. In addition, some distantly related prokaryotic genomes share so few orthologous overlapping coding sequences that it is hard to find enough orthologous OGs to properly evaluate their pairwise OG distances. In this study, we therefore define a new OG order distance that is based on more biologically accurate rearrangements (e.g., reversals, transpositions and translocations) rather than breakpoints, and that is applicable to both uni-chromosomal and multi-chromosomal genomes. We also expand the term "gene" to include both its coding sequence and its regulatory regions, so that two adjacent genes whose coding sequences or regulatory regions overlap are considered a pair of overlapping genes. The rationale is that an overlap between the regulatory regions of distinct genes suggests that the regulation of their expression is, to some degree, interrelated. Based on these modifications, we reimplemented OGtree as a new web server, OGtree2.0, and evaluated the accuracy of its genome tree reconstruction on a test dataset of 21 Proteobacteria genomes.
Our experimental results show that OGtree2.0 significantly outperforms both its previous version, OGtree, and another similar server, BPhyOG, in the quality of genome tree reconstruction: the phylogenetic tree obtained by OGtree2.0 is highly congruent with the reference tree, which coincides with the taxonomy of these Proteobacteria accepted by biologists.
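The abstract does not give the formula behind the new rearrangement-based OG order distance. As a stand-in, the sketch below computes the double-cut-and-join (DCJ) distance, a widely used rearrangement model that handles both uni-chromosomal and multi-chromosomal (linear or circular) genomes and in which a reversal or a reciprocal translocation costs one operation and a transposition costs two. This is a swapped-in technique, not necessarily the distance used in OGtree2.0. It assumes each genome is already reduced to the order of its shared orthologous OG pairs, each mapped to the same non-zero integer identifier in both genomes and signed by orientation, with no duplicates. The function names `adjacencies` and `dcj_distance` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Extremity = Tuple[int, str]          # (marker id, "t" = tail, "h" = head)
Chromosome = Tuple[List[int], bool]  # (signed marker order, is_circular)
Genome = List[Chromosome]


def adjacencies(genome: Genome) -> Dict[Extremity, Optional[Extremity]]:
    """Map every marker extremity to its neighbouring extremity (None = telomere)."""
    ids = [abs(g) for genes, _ in genome for g in genes]
    if 0 in ids or len(ids) != len(set(ids)):
        raise ValueError("marker ids must be non-zero and occur once per genome")
    partner: Dict[Extremity, Optional[Extremity]] = {}
    for genes, circular in genome:
        if not genes:
            continue
        # Read forward, a marker goes tail -> head; read reversed, head -> tail.
        ends = [((g, "t"), (g, "h")) if g > 0 else ((-g, "h"), (-g, "t")) for g in genes]
        for (_, right), (left, _) in zip(ends, ends[1:]):
            partner[right] = left
            partner[left] = right
        first, last = ends[0][0], ends[-1][1]
        if circular:
            partner[first], partner[last] = last, first
        else:
            partner[first], partner[last] = None, None
    return partner


def dcj_distance(a: Genome, b: Genome) -> int:
    """DCJ distance d = N - (C + I/2) from the adjacency graph of genomes a and b."""
    pa, pb = adjacencies(a), adjacencies(b)
    if set(pa) != set(pb):
        raise ValueError("both genomes must contain the same markers")
    n_markers = len(pa) // 2
    seen: Set[Extremity] = set()
    cycles = odd_paths = 0
    for start in pa:
        if start in seen:
            continue
        # Explore one connected component via adjacencies of a and of b.
        seen.add(start)
        stack, size, is_path = [start], 0, False
        while stack:
            x = stack.pop()
            size += 1
            for nxt in (pa[x], pb[x]):
                if nxt is None:
                    is_path = True  # x is a telomere in a or in b
                elif nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if not is_path:
            cycles += 1
        elif size % 2 == 1:  # each extremity is one graph edge, so odd size = odd path
            odd_paths += 1
    return n_markers - cycles - odd_paths // 2
```

For instance, the circular orders `[1, 2, 3]` and `[1, -2, 3]` differ by one reversal and have distance 1, and the circular orders `[1, 2, 3, 4]` and `[1, 3, 2, 4]` differ by one transposition and have distance 2. Because prokaryotic chromosomes are usually circular, telomeres (and hence odd paths) only arise when linear chromosomes are included.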

Parallel Keywords

bioinformatics; algorithm; genome tree; overlapping gene; prokaryote; genome rearrangement
