這篇碩論主要研究的方向為字典匹配問題,我總共研究了三個不同的變形。 第一個變形是所有字典裡的字串都可以有一個gap,這個問題Amir他們在2014年提出了一個解法,而我換個方向改變了他們儲存資料的方式,改進了他們的做法,使得在同樣限制下我們的空間與時間複雜度都更小。而且我們的做法還可以將這個問題擴展到更加一般性的問題。 第二個變形跟第一個問題很像,這問題字典裡的所有字串都可以消失其中一段,但我們不知道是哪一段消失了,然後希望可以讀取一個文本text,找到所有可能的字串匹配的位置。這個問題是一個全新的問題,目前還沒有人討論過這個問題。我們透過在failure tree上做樹鏈剖分,提出了一個完整可行的解法。 第三個問題跟前兩個問題都不一樣,字典裡的字串跟一般的字串是一樣的,沒有gap也不能消失其中一段,但我們把字母做一對一的映射後,也可以視作是同樣的字串來做匹配,比如說aab跟xxy可以視作是一樣的兩個字串。我們提出了一個新的方式,修改了前人的做法,降低了以前算法的空間複雜度。
Dictionary matching is a well-studied problem in computer sci- ence, which have found numerous applications, such as computer virus detection and bioinformatics analysis. In this thesis, we study three variants of the dictionary matching problem. The first variant, called dictionary matching with gapped patterns, is recently proposed by Amir et al. in which a pattern may be matched with a substring of a query text T with a gap of bounded length present in the pattern. We first give an alter- native linear-space solution to Amir et al.’s problem, where gap lengths of all patterns have the same lower bound α and upper bound β. The query time on any query text T is bounded by O((β − α + 1)|T | log d + occ), where d denotes the number of patterns, and occ denotes the size of the output. After that, we show that the framework can be generalized to handle the case where gaps may have different bounds, thereby answering one of the open problems raised by Amir et al. The second variant, called dictionary matching with one missing substring, is a new problem in which a gap of bounded length may be present in the text substring when it is being matched. We show that this prob- lem can be solved by using a similar framework. Furthermore, by applying a novel indexing technique on the failure tree, we obtain a space-time tradeoff result, which will be suitable when the dictionary contains only short patterns, or when index space is a critical concern. The third variant is called parameterized dictionary matching. Idury et al. proposed a linear-space so- lution for this problem using Baker’s encoding method. Here, we come up with a new encoding method, which allows us to achieve a better space complexity for the index, with only slight slowdown in the query time.
為了持續優化網站功能與使用者體驗,本網站將Cookies分析技術用於網站營運、分析和個人化服務之目的。
若您繼續瀏覽本網站,即表示您同意本網站使用Cookies。