透過您的圖書館登入
IP:18.117.183.150
  • 學位論文

不必解碼的區段長度編碼字串比對問題演算法

Algorithms for Comparing Run-Length Encoded Strings without Decoding

指導教授 : 趙坤茂

摘要


在字串學的研究中,一個最新的趨勢結合了字串搜尋與資料壓縮這兩種概念,創造了所謂「壓縮字串搜尋問題」。在此研究課題中,最終的目標是希望能設計出不必解碼的字串比對演算法。此論文中所探討的壓縮方式稱為「區段長度編碼」,儘管這個編碼方式十分簡易,唯一正面的結果為計算區段長度編碼字串的「插入、刪除距離」,所需花費的時間為O(mn log mn),其中m和n分別代表輸入字串的區段數。此論文中,我們探討了包含如何比較兩區段長度編碼字串與尋找特徵片段等問題,所提出的演算法皆不必經過解碼步驟,其中,我們更提出了第一個不必經過解碼就能計算區段長度編碼字串「編輯距離」的演算法。論文中更對所討論的問題建立其時間複雜度的下界,分別由「三總和問題」或「排序成對和問題」當成其下界,這說明了所討論的問題具有Ω(mn)和Ω(mn logm)等猜測的時間複雜度下界。我們相信本論文的結果有助於設計出不必經過解碼的區段長度編碼字串的序列比對演算法。

關鍵字

區段長度 壓縮 字串搜尋 比對 迴文

並列摘要


A recent trend in stringology combines the two concepts of pattern matching and data compression, creating the so-called compressed pattern matching problem. The ultimate goal in this line of investigation is to design algorithms that can cope with encoded strings without resorting to any decoding step. The underlying compression scheme considered in this dissertation is called run-length encoding. Despite its simple coding nature, the only positive result before this work is the computation of indel-distance (dual of longest common subsequence) of two run-length encoded strings, achieving O(mnlog mn) time, where m and n are the number of runs of the input strings. Both comparing and identifying featured patterns in run-length encoded strings are explored in this dissertation. All the presented algorithms contain no decoding step, one of which is the firs-known algorithm that computes the Levenshtein distance of two run-length encoded strings without decoding. Several lower bounds are established in the dissertation, by reduction from either the 3sum problem or the problem of sorting pairwise-sums, implying O­mega(mn) and Omega­(mnlogm) conjectured time bounds, respectively. We believe that the work accomplished in this dissertation shed some light on solutions to aligning two run-length encoded strings without decoding.

並列關鍵字

run length compression pattern matching alignment palindrome

參考文獻


[1] Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987.
[2] Alok Aggarwal and James K. Park. Notes on searching in multidimensional monotone arrays (preliminary version). In FOCS, pages 497-512, 1988.
[3] Amihood Amir and Gary Benson. Efficient two-dimensional compressed matching. In Data Compression Conference, pages 279-288, 1992.
[4] Amihood Amir, Gary Benson, and Martin Farach. Let sleeping files lie: Pattern matching in z-compressed ‾les. J. Comput. Syst. Sci., 52(2):299-307, 1996.
[5] Amihood Amir, Gad M. Landau, and Dina Sokol. Inplace run-length 2d compressed search. Theor. Comput. Sci., 290(3):1361-1383, 2003.

延伸閱讀