利用獨特性質解決字串比對問題和利用篩選方法解決近似字串比對問題

在這篇論文中，我們討論兩個問題: 字串比對和近似字串比對問題。字串比對問題是要找出一個字串P在另一個較長的字串T中所有出現的位置。我們首先指出一個字串P中會有所謂的唯一性質(uniqueness property)，利用這個性質我們設計了四個演算法來解決字串比對問題。在我們的實驗中，我們四個演算法中有三個在比對的效率上比現有兩個有名的演算法(KMP and Boyer-Moore algorithms)還要快。近似字串比對問題的定義為: 在一個字串T中，找出所有T的子字串，其和另一個字串P的編輯距離(edit distance)小於或等於一個容錯值k。我們設計了一個演算法，可以將一些T的位置去掉不考慮，去掉的位置表示所有T的子字串在這些位置開始的，其編輯距離都一定比容錯值k大。我們的演算法很容易實做，且實驗結果顯示，我們的演算法很有效率，特別是在自然語言上的搜尋。

關鍵字

字串比對；唯一性質；近似字串比對；編輯距離；排除方法

並列摘要

In this thesis, we consider two problems, exact string matching and approximate string matching problem. The exact string matching problem is to determine all of the locations of a pattern string P appearing in a text string T. We first point out a uniqueness property of a given pattern. We then propose four algorithms to solve the exact string matching problem based upon the uniqueness property. Experimental results showed that three of our algorithms are faster than the KMP and Boyer-Moore algorithms. The approximate string matching problem is defined as follows: Given a text string T, a pattern string P and an error bound k, find all substrings of T whose edit distances with P are smaller than or equal to k. We give a method to eliminate candidate locations in text T as there can be no substring S starting from those locations such that the edit distance between S and pattern P is smaller than or equal to k. Our method is simple to implement. Experimental results show that our method is effective, especially in the case when we perform natural language searching.

並列關鍵字

string matching ； uniqueness property ； approximate string matching ； edit distance ； elimination method

參考文獻

[AC91] Optimal canonization of all substrings of a string, Apostolico, A. and Crochemore, M., Information and Computation, Vol. 95, 1991, pp. 76-95.

Google Scholar

[AG86] The Boyer-Moore-Galil string searching strategies revisited, Apostolico, A. and Giancarlo, R., SIAM Journal on Computing, Vol. 15, 1986, pp. 98-105.

Google Scholar

[AKLLLR2000] Text Indexing and Dictionary Matching with One Error, Amir, A., Keselman, D., Landau, G. M., Lewenstein, M., Lewenstein, N. and Rodeh, M., Journal of Algorithms, Vol. 37, 2000, pp. 309-325.

Google Scholar

[BM77] A fast string searching algorithm, Boyer, R.S. and Moore, J.S., Communications of the ACM, Vol. 20, 1977, pp. 762-772.

Google Scholar

[C91] Correctness and efficiency of the pattern matching algorithms, Colussi, L., Information and Computation, Vol. 95, No. 2, 1991, pp. 225-251.

Google Scholar

被引用紀錄

曾佳輝（2011）。減量訓練對唾液免疫球蛋白A、睪固酮及皮質醇濃度之影響〔碩士論文，國立臺灣師範大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0021-1610201315252057

陳昱廷（2014）。風險圖像行動化與視覺化對企業風險管理的影響-以食品安全為例〔碩士論文，國立中正大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0033-2110201613595195

國際替代計量

利用獨特性質解決字串比對問題和利用篩選方法解決近似字串比對問題

全文下載

主題瀏覽