透過您的圖書館登入
IP:18.217.158.184
  • 學位論文

Dynamic compressed index for approximate string matching

空間壓縮下搜尋近似字串的動態索引

指導教授 : 韓永楷

摘要


Abstract Let L = {T1, T2, ..., Tl} be a set of texts whose total length is n and P = P0P2...Pm−1 be a pattern of length m, both over a fixed alphabet A. The ap- proximate library management problem is to report all occurrences of P by allowing some degree of errors. Besides a text may be inserted into or deleted from L from time to time. This thesis introduces the approach to solve the approximate library management problem which did not be concerned before. Prior to this thesis, the most compact result that the index of Chan et al. [3] with the algorithm of Trinh et al. [11] is based on compressed suffix array that is so complex to implement and it requires O(m|A| log3 n+occ log3 n) query time. In this thesis I use the indexes based on suffix sampling which is simple and easy to implement and the query time is O(m|A| log3 n+occ log2 n) using O(n log |A|) bits where occ is the number of occurrences. Also the indexes in this thesis sup- ports text insertion and deletion with update time O(|T| log |A| log n + log2 n).

關鍵字

動態 空間壓縮索引 近似

並列摘要


中文摘要 讓L = {T1, T2, ..., Tl} 為一字串的集合, 其總長度為n, 以及P = P0P2...Pm−1為一長 度為m的樣本,且L 和P都建立在一固定的字母表A上。Approximate library man- agement problem 是在容許某種程度的錯誤下, 找出P發生在L中的位置。本論文提 出此以前無人關注的動態問題, 即一字串可以被加入L中或從L中刪除, 並提供其解決 方法。之前跟本論文最相近的結果為Chan et al. [3] 所提出的索引套用Trinh et al. [11]所提出的演算法, 此解法是基於compressed suffix array 以至於較複雜而難以實 作, 其搜尋時間為O(m|A| log3 n + occ log3 n)。本論文所使用的索引是建立在一新的 技巧suffix sampling 上, 簡單且容易實作, 其搜尋時間為O(m|A| log3 n+occ log2 n) 使用O(n log |A|)bits 的空間, 其中occ 為此樣本出現的次數。此索引支援字串的加入 與刪除, 其更新時間為O(|T| log |A| log n + log2 n)。

並列關鍵字

dynamic compressed index approximate

參考文獻


Tsing Hua University, Taiwan, 2009.
[1] A. Amir, M. Lewenstein, and E. Porat. Faster algorithms for string match-
ing with k mismatches. In Proceedings of the 11th Proceedings of Symposium
on Discrete Algorithms, pages 794–803, 2000.
[2] R. S. Boyer and J. S. Moore. A Fast String Searching Algorithm. Commu-

延伸閱讀