透過您的圖書館登入
IP:18.118.1.158
  • 學位論文

一般序列上基因組問題之研究

A Study on the Gene-Team Problem on General Sequences

指導教授 : 王炳豐
若您是本文的作者,可授權文章由華藝線上圖書館中協助推廣。

摘要


「基因群(conserved gene cluster)」的辨認是瞭解基因演化與預測基因功能的一個重要步驟。「基因組(gene team)」是其中一個用來捕捉基因群的基本生物學特徵的著名模型。找出兩條一般序列上基因組的問題是本論文的重點。He 和 Goldwasser 對於這個問題提出了一個需要 O(mn) 時間和 O(m + n) 空間的演算法,其中 m、n 分別是兩條序列的基因總數。這篇論文提出一個新的高效率演算法。假設 m 小於等於 n。令 C = ∑_α∈Σ o_1(α)o_2(α),其中 Σ 表示不同基因的集合,o_1(α) 和 o_2(α) 分別是 α 在兩條序列中的出現次數。我們的新演算法需要 O(min{C lg n, mn}) 時間和 O(m + n) 空間的演算法。與 He 和 Goldwasser 的演算法比較,我們的演算法更加實用,因為實際應用中 C 比 mn 還要小得多。此外,我們的演算法是 output sensitive。其執行時間是輸出大小乘上 O(lg n)。而且,我們的演算法可以很容易地推廣到 k 條序列上,時間是 O(k C lg (n_1 n_2 ... n_k)),其中 n_i 是第 i 條序列上基因的數目。

並列摘要


Identifying conserved gene clusters is an important step toward understanding the evolution of genomes and predicting the functions of genes. A famous model to capture the essential biological features of a conserved gene cluster is called the gene-team model. The problem of finding the gene teams of two general sequences is the focus of this dissertation. For this problem, He and Goldwasser had an efficient algorithm that requires O(mn) time using O(m + n) working space, where m and n are, respectively, the numbers of genes in the two given sequences. In this dissertation, a new efficient algorithm is presented. Assume m ≤ n. Let C = ∑_α∈Σ o_1(α)o_2(α), where α is the set of distinct genes, and o_1(α) and o_2(α) are, respectively, the numbers of copies of α in the two given sequences. Our new algorithm requires O(min{C lg n, mn}) time using O(m + n) working space. As compared with He and Goldwasser's algorithm, our new algorithm is more practical, as C is likely to be much smaller than mn in practice. In addition, our new algorithm is output sensitive. Its running time is O(lg n) times the size of the output. Moreover, our new algorithm can be efficiently extended to find the gene teams of k general sequences in O(k C lg (n_1 n_2 ... n_k)) time, where n_i is the number of genes in the ith input sequence.

參考文獻


[2] M. P. Béal, A. Bergeron, S. Corteel, and M. Raffinot, "An algorithmic view of gene teams," Theoretical Computer Science, vol. 320, no. 2-3, pp. 395-418, 2004.
[4] A. Bergeron, and J. Stoye, "On the similarity of sets of permutations and its applications to genome comparison," Journal of Computational Biology, vol. 13, pp. 1340-1354, 2006.
[5] G. Blin and J. Stoye, "Finding Nested common intervals efficiently," Journal of Computational Biology, vol. 17, no. 9, pp. 1183-1194, 2010.
[6] T. Dandekar, B. Snel, M. Huynen, and P. Bork. "Conservation of gene order: a fingerprint for proteins that physically interact," Trends in Biochemical Sciences, vol. 23, pp. 324-328, 1998.
[7] G. Didier, "Common intervals of two sequences," Lecture Notes in Computer Science, vol. 2812, pp. 17-24, 2003.

延伸閱讀