特殊回文序列搜尋演算法

回文序列廣泛存在於基因序列中，重複回文也不例外。因此，在本篇論文中討論了兩個和找尋特殊回文相關的演算法。其一是尋找給定序列中的重複回文，意即找尋出現次數超過一次的回文子序列。其中，我們能對尋找序列的出現次數和長度作為篩選條件。其二是找尋k次不匹配回文。k次不匹配回文是指一個子字串經由k個字母的修改之後可以變成一個回文。這個可以應用在生物序列中有點突變的回文尋找上。兩個演算法都應用在後綴陣列和最長共同字首陣列之上，這也是本篇論文獨特之處，使演算法變的有效率，並且讓演算法簡單易懂。

關鍵字

回文；重複序列；後綴陣列；最長共同字首陣列

並列摘要

The palindrome sequence is widely found in genomic sequences, including palindromic repeats. In this thesis, we provide two algorithms, one for finding palindromic repeats and the other for finding k-mismatch palindromes. In finding palindromic repeats algorithm, we can find all palindromic substrings which appear at least twice with length and frequency constraint efficiently. In k-mismatch problem, we can find substrings which can become palindromes by k characters’ modification. Both algorithms are based on suffix array and longest common prefix array. This is a novel combination and makes the algorithms more efficient and easily to understand.

並列關鍵字

palindrome ； repeated sequence ； suffix array ； longest common prefix array

參考文獻

[1] M. Dumitran and F. Manea. Longest gapped repeats and palindromes. In International Symposium on Mathematical Foundations of Computer Science, pages 205–217. Springer, 2015.

Google Scholar

[2] J. Fischer. Optimal succinctness for range minimum queries. In Latin American Symposium on Theoretical Informatics, pages 158–169. Springer, 2010.

Google Scholar

[3] T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-commonprefix computation in suffix arrays and its applications. In Annual Symposium on Combinatorial Pattern Matching, pages 181–192. Springer, 2001.

Google Scholar

[4] R. Kolpakov and G. Kucherov. Searching for gapped palindromes. Theoretical Computer Science, 410(51):5365–5373, 2009.

Google Scholar

[5] R. Kolpakov, M. Podolskiy, M. Posypkin, and N. Khrapov. Searching of gapped repeats and subrepetitions in a word. Journal of Discrete Algorithms, 46:1–15, 2017.

Google Scholar

國際替代計量

特殊回文序列搜尋演算法

查找全文

主題瀏覽