With the growing availability of complete genome sequences, genome rearrangement studies based on genome-wide analysis of gene orders play an important role in phylogenetic tree reconstruction. In contrast to traditional alignment approaches for detecting point mutations (e.g., substitutions, insertions and deletions of nucleotides/amino acids), genome rearrangements are based on comparison of gene orders to detect large-scale mutations, such as reversals, transpositions, block-interchanges (also called generalized transpositions), fusions, fissions and translocations. Given two gene orders of genomes with the same set of genes, the genome rearrangement problem aims to compute a minimum sequence of rearrangement operations required to transform one genome into the other. The genome rearrangement problem can also be viewed as a problem of sorting a permutation, if the given genomes are represented by permutations with one having positive, sorted order. In this thesis, by using permutation groups in algebra, we first present an O(n + δlogδ) time algorithm for solving the problem of sorting by block-interchanges, where n is the number of genes and is the minimum number of rearrangement operations required to sort a genome. We then present an O(δn) time algorithm for the problem of sorting by reversals and block-interchanges with a weight proportion 1:2. In addition, we further consider additional translocations (including fusions and fissions), which are weighted 1, when dealing with multi-chromosomal genomes and consequently propose the O(δn) time algorithms for the problem with linear and circular chromosomal genomes, respectively. Based on the algorithms mentioned above, we have finally implemented a web server that allows biologists to perform genome rearrangement analysis involving reversals, block-interchanges and translocations (including fusions and fissions), and also infer phylogenetic trees of genomes being considered based on their pairwise genome rearrangement distances. In this web server, we also provide biologists to perform the so-called jackknife analysis to evaluate statistical reliability of the constructed phylogenetic trees.
隨著愈來愈多完整的基因體序列被定序出來,分析基因體上基因次序的基因體重組研究在演化樹的建構上扮演著重要的角色。不同於偵測點突變(例如核苷酸或胺基酸的取代、插入及刪除)的傳統對齊方法,基因體重組利用基因次序的比較去偵側大規模的突變,像是反轉(reversals)、移位(transpositions)、區塊互換(block-interchanges)、融合(fusions)、分裂(fissions)和易位(translocations)。已知二個基因體上共有基因的基因次序,基因體重組問題的目的是要去計算出一個最少的基因體重組序列把其中一個基因體的基因次序轉換成另一個基因體的基因次序。如果其中一個已知基因體的基因次序被表示成一個已排序好的正整數序列的話,那麼基因體重組的問題便可以被視為一種排序(sorting)的問題。在本論文中,我們利用代數的排列群首先提出一個時間複雜度為Ο(n +δlogδ)的演算法來解決區塊互換的排序問題(sorting by block-interchanges),其中n是基因的個數,δ是把基因體給排序好所需的最少基因體重組個數。我們接著提出一個時間複雜度為Ο(δn )的演算法來解決反轉及區塊互換的排序問題(sorting by reversals and block-interchanges),其中反轉及區塊互換的權重比為1:2。除此之外,當處理多條染色體的基因體時,我們進一步地考慮額外的易位(包括融合及分裂),將其權重設為1,並提出時間複雜度皆為Ο(δn )的兩個演算法分別排序線狀與環狀多條染色體的基因次序。最後,我們把上述的演算法實作成一個軟體工具可讓生物學家們透過網際網路來使用,此軟體工具可允許生物學家們進行含有反轉、區塊互換與易位(包括融合及分裂)的基因體重組分析,以及根據分析出來的兩兩基因體之間的重組距離來推測出基因體的種族樹。在這個軟體工具中,我們也提供生物學家們進行所謂的刀切分析法(jackknife)可以用來評估所建構種族樹的統計可信度。