Solving Genome Rearrangement Problems Using Permutation Groups

With the growing availability of complete genome sequences, genome rearrangement studies based on genome-wide analysis of gene orders play an important role in phylogenetic tree reconstruction. In contrast to traditional alignment approaches for detecting point mutations (e.g., substitutions, insertions and deletions of nucleotides/amino acids), genome rearrangements are based on comparison of gene orders to detect large-scale mutations, such as reversals, transpositions, block-interchanges (also called generalized transpositions), fusions, fissions and translocations. Given two gene orders of genomes with the same set of genes, the genome rearrangement problem aims to compute a minimum sequence of rearrangement operations required to transform one genome into the other. The genome rearrangement problem can also be viewed as a problem of sorting a permutation, if the given genomes are represented by permutations with one having positive, sorted order. In this thesis, by using permutation groups in algebra, we first present an O(n + δlogδ) time algorithm for solving the problem of sorting by block-interchanges, where n is the number of genes and is the minimum number of rearrangement operations required to sort a genome. We then present an O(δn) time algorithm for the problem of sorting by reversals and block-interchanges with a weight proportion 1:2. In addition, we further consider additional translocations (including fusions and fissions), which are weighted 1, when dealing with multi-chromosomal genomes and consequently propose the O(δn) time algorithms for the problem with linear and circular chromosomal genomes, respectively. Based on the algorithms mentioned above, we have finally implemented a web server that allows biologists to perform genome rearrangement analysis involving reversals, block-interchanges and translocations (including fusions and fissions), and also infer phylogenetic trees of genomes being considered based on their pairwise genome rearrangement distances. In this web server, we also provide biologists to perform the so-called jackknife analysis to evaluate statistical reliability of the constructed phylogenetic trees.

關鍵字

基因體重組；排列群；排序；代數；反轉；區塊互換；融合；分裂；易位

並列摘要

隨著愈來愈多完整的基因體序列被定序出來，分析基因體上基因次序的基因體重組研究在演化樹的建構上扮演著重要的角色。不同於偵測點突變(例如核苷酸或胺基酸的取代、插入及刪除)的傳統對齊方法，基因體重組利用基因次序的比較去偵側大規模的突變，像是反轉(reversals)、移位(transpositions)、區塊互換(block-interchanges)、融合(fusions)、分裂(fissions)和易位(translocations)。已知二個基因體上共有基因的基因次序，基因體重組問題的目的是要去計算出一個最少的基因體重組序列把其中一個基因體的基因次序轉換成另一個基因體的基因次序。如果其中一個已知基因體的基因次序被表示成一個已排序好的正整數序列的話，那麼基因體重組的問題便可以被視為一種排序(sorting)的問題。在本論文中，我們利用代數的排列群首先提出一個時間複雜度為Ο(n +δlogδ)的演算法來解決區塊互換的排序問題(sorting by block-interchanges)，其中n是基因的個數，δ是把基因體給排序好所需的最少基因體重組個數。我們接著提出一個時間複雜度為Ο(δn )的演算法來解決反轉及區塊互換的排序問題(sorting by reversals and block-interchanges)，其中反轉及區塊互換的權重比為1:2。除此之外，當處理多條染色體的基因體時，我們進一步地考慮額外的易位(包括融合及分裂)，將其權重設為1，並提出時間複雜度皆為Ο(δn )的兩個演算法分別排序線狀與環狀多條染色體的基因次序。最後，我們把上述的演算法實作成一個軟體工具可讓生物學家們透過網際網路來使用，此軟體工具可允許生物學家們進行含有反轉、區塊互換與易位(包括融合及分裂)的基因體重組分析，以及根據分析出來的兩兩基因體之間的重組距離來推測出基因體的種族樹。在這個軟體工具中，我們也提供生物學家們進行所謂的刀切分析法(jackknife)可以用來評估所建構種族樹的統計可信度。

並列關鍵字

genome rearrangement ； permutation groups ； sorting ； algebra ； reversal ； block-interchange ； fusion ； fission ； translocation

參考文獻

[1] Adam, Z. and Sankoff, D. 2008. The ABCs of MGR with DCJ. Evolutionary Bioinformatics,

[2] Alekseyev, M. A. 2008. Multi-break rearrangements and breakpoint re-uses: from circular

[3] Alekseyev, M. A. and Pevzner, P. A. 2008. Multi-break rearrangements and chromosomal

evolution. Theoretical Computer Science, 395, 193–202.

inversion distance between signed permutations with an experimental study. Journal of

被引用紀錄

徐祥豐（2003）。一個用於MPEG-4視訊壓縮編碼之視訊物件分割法的評估準則〔碩士論文，中原大學〕。華藝線上圖書館。https://doi.org/10.6840/cycu200300471

國際替代計量

Solving Genome Rearrangement Problems Using Permutation Groups

主題瀏覽