多重序列的最長共同子序列之啟發式演算法

在許多領域裡，例如：計算機科學以及生物資訊領域，多重序列中尋找最常共同子序列是個有趣而且重要的問題。由於此問題的重要性，許多針對兩個序列的最常共同子序列問題且擁有大約O(n^2)時間複雜度的演算法被提出。而當序列數量多於兩個時，此問題將轉變為一NP-hard問題。尋找多數量及長度較長的多重序列中的最常共同子序列之正確解答將會耗費大量時間且無法應用於實際，許多啟發式演算法也相繼被提出。此論文提出了兩個針對多重序列的最長共同子序列問題的啟發式演算法。兩個演算法皆具有O(|Σ|^2n(k + |Σ|))的時間複雜度，Σ是序列的字母集，而k及n分別為多重序列的數量以及長度。在實驗結果中，我們提出的演算法，在平均狀況下，找出的共同子序列長度比 The Best Next for Maximal Available algorithm的結果多出7.33%。另外在改變字母集大小的實驗當中，可以說明我們的演算法同樣也適合應用在更複雜的多重序列。

關鍵字

最常共同子序列；多重序列；啟發式演算法

並列摘要

Finding the Longest Common Subsequence of multiple sequences is an interesting and important problem in many areas, such as computer science and bioinformatics. Since its importance, some algorithms with about time complexity O(n^2) have been proposed for finding the LCS of two sequences, where n is the length of sequences. For k sequences(k > 2), this problem will be become to a NP-hard problem. Finding the exact LCS of large number and length of sequences takes very long time and thus will become impractical, many heuristic algorithms have been proposed. In this thesis, two heuristic algorithms is presented for finding the Longest Common Subsequence of Multiple Sequences. The time complexities of our algorithms are O(|Σ|^2n(k + |Σ|)), where Σ is the set of symbols, k and n are the number and the length of input sequences, respectively. In the experi- mental results, our algorithms finds the common subsequences whose lengths are about 7.33% more than the Best Next for Maximal Available algorithm in average for random sequences with uniform distribution [1]. In the different symbol sizes experimentat, the result shows that our methods are also suitable for complicated sequences.

並列關鍵字

Longest common subsequence ； Multiple Sequences ； Heuristic Algorithm

參考文獻

[3] B John Oommen and Richard KS Loke. Pattern recognition of strings with substi- tutions, insertions, deletions and generalized transpositions. Pattern Recognition, 30(5):789–800, 1997.

[4] Webb Miller and Eugene W Myers. A file comparison program. Software: Practice and Experience, 15(11):1025–1040, 1985.

[5] Patrick AV Hall and Geoff R Dowling. Approximate string matching. ACM com- puting surveys (CSUR), 12(4):381–402, 1980.

[6] Lasse Bergroth, Harri Hakonen, and Timo Raita. A survey of longest common sub- sequence algorithms. In String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on, pages 39–48. IEEE, 2000.

[7] Daniel S Hirschberg. Algorithms for the longest common subsequence problem. Journal of the ACM (JACM), 24(4):664–675, 1977.

國際替代計量

多重序列的最長共同子序列之啟發式演算法

全文下載

主題瀏覽