一般序列上基因組問題之研究

「基因群(conserved gene cluster)」的辨認是瞭解基因演化與預測基因功能的一個重要步驟。「基因組(gene team)」是其中一個用來捕捉基因群的基本生物學特徵的著名模型。找出兩條一般序列上基因組的問題是本論文的重點。He 和 Goldwasser 對於這個問題提出了一個需要 O(mn) 時間和 O(m + n) 空間的演算法，其中 m、n 分別是兩條序列的基因總數。這篇論文提出一個新的高效率演算法。假設 m 小於等於 n。令 C = ∑_α∈Σ o_1(α)o_2(α)，其中 Σ 表示不同基因的集合，o_1(α) 和 o_2(α) 分別是 α 在兩條序列中的出現次數。我們的新演算法需要 O(min{C lg n, mn}) 時間和 O(m + n) 空間的演算法。與 He 和 Goldwasser 的演算法比較，我們的演算法更加實用，因為實際應用中 C 比 mn 還要小得多。此外，我們的演算法是 output sensitive。其執行時間是輸出大小乘上 O(lg n)。而且，我們的演算法可以很容易地推廣到 k 條序列上，時間是 O(k C lg (n_1 n_2 ... n_k))，其中 n_i 是第 i 條序列上基因的數目。

關鍵字

演算法；資料結構；基因組；基因序列比對；基因群

並列摘要

Identifying conserved gene clusters is an important step toward understanding the evolution of genomes and predicting the functions of genes. A famous model to capture the essential biological features of a conserved gene cluster is called the gene-team model. The problem of finding the gene teams of two general sequences is the focus of this dissertation. For this problem, He and Goldwasser had an efficient algorithm that requires O(mn) time using O(m + n) working space, where m and n are, respectively, the numbers of genes in the two given sequences. In this dissertation, a new efficient algorithm is presented. Assume m ≤ n. Let C = ∑_α∈Σ o_1(α)o_2(α), where α is the set of distinct genes, and o_1(α) and o_2(α) are, respectively, the numbers of copies of α in the two given sequences. Our new algorithm requires O(min{C lg n, mn}) time using O(m + n) working space. As compared with He and Goldwasser's algorithm, our new algorithm is more practical, as C is likely to be much smaller than mn in practice. In addition, our new algorithm is output sensitive. Its running time is O(lg n) times the size of the output. Moreover, our new algorithm can be efficiently extended to find the gene teams of k general sequences in O(k C lg (n_1 n_2 ... n_k)) time, where n_i is the number of genes in the ith input sequence.

並列關鍵字

algorithms ； data structures ； gene teams ； comparative genomics ； conserved gene clusters

參考文獻

[2] M. P. Béal, A. Bergeron, S. Corteel, and M. Raffinot, "An algorithmic view of gene teams," Theoretical Computer Science, vol. 320, no. 2-3, pp. 395-418, 2004.

[4] A. Bergeron, and J. Stoye, "On the similarity of sets of permutations and its applications to genome comparison," Journal of Computational Biology, vol. 13, pp. 1340-1354, 2006.

[5] G. Blin and J. Stoye, "Finding Nested common intervals efficiently," Journal of Computational Biology, vol. 17, no. 9, pp. 1183-1194, 2010.

[6] T. Dandekar, B. Snel, M. Huynen, and P. Bork. "Conservation of gene order: a fingerprint for proteins that physically interact," Trends in Biochemical Sciences, vol. 23, pp. 324-328, 1998.

[7] G. Didier, "Common intervals of two sequences," Lecture Notes in Computer Science, vol. 2812, pp. 17-24, 2003.

國際替代計量

一般序列上基因組問題之研究

主題瀏覽