在這篇論文中,我們討論兩個問題: 字串比對和近似字串比對問題。 字串比對問題是要找出一個字串P在另一個較長的字串T中所有出現的位置。 我們首先指出一個字串P中會有所謂的唯一性質(uniqueness property),利用這個性質我們設計了四個演算法來解決字串比對問題。 在我們的實驗中,我們四個演算法中有三個在比對的效率上比現有兩個有名的演算法(KMP and Boyer-Moore algorithms)還要快。 近似字串比對問題的定義為: 在一個字串T中,找出所有T的子字串,其和另一個字串P的編輯距離(edit distance)小於或等於一個容錯值k。 我們設計了一個演算法,可以將一些T的位置去掉不考慮,去掉的位置表示所有T的子字串在這些位置開始的,其編輯距離都一定比容錯值k大。 我們的演算法很容易實做,且實驗結果顯示,我們的演算法很有效率,特別是在自然語言上的搜尋。
In this thesis, we consider two problems, exact string matching and approximate string matching problem. The exact string matching problem is to determine all of the locations of a pattern string P appearing in a text string T. We first point out a uniqueness property of a given pattern. We then propose four algorithms to solve the exact string matching problem based upon the uniqueness property. Experimental results showed that three of our algorithms are faster than the KMP and Boyer-Moore algorithms. The approximate string matching problem is defined as follows: Given a text string T, a pattern string P and an error bound k, find all substrings of T whose edit distances with P are smaller than or equal to k. We give a method to eliminate candidate locations in text T as there can be no substring S starting from those locations such that the edit distance between S and pattern P is smaller than or equal to k. Our method is simple to implement. Experimental results show that our method is effective, especially in the case when we perform natural language searching.