在過去以 Apriori 為基礎的循序樣式探勘方法中,皆單純採用由下而上的方式從長度較短的頻繁樣式成長至較長的候選樣式,一直成長至沒有更長的候選樣式產生為止,因而若需探勘最長頻繁樣式時則必須將所有的頻繁樣式探勘完才可得到(所謂的最長頻繁樣式就是不屬於其它任一頻繁樣式的子樣式稱之),造成許多不必要的運算。 本研究提出並建構一個優先探勘最長頻繁樣式之演算法PMSPM,結合由下而上以及由上而下兩種方向的探勘方式,進而早一步於由上而下的方向找出大多數的最長頻繁樣式,使得在由下而上方向的探勘不需成長到最後的階段,即可找出所有頻繁且最長的樣式,進而節省大量不需要的候選樣式測試,以減少探勘時間。本研究除了證明演算法之正確性與完整性,並以模擬實驗驗證其效能及可行性,最後,將PMSPM演算法應用於糖尿病病患之病歷資料探勘,找出病患之血糖的最長樣式演進型式,以供推測血糖樣式變化之原因。
Apriori-based sequential pattern mining algorithms use bottom-up method. They merge frequent patterns with shorter length into candidate patterns with longer length, then repeat the process until no more candidate patterns could be generated. However, if we want to obtain frequent maximal sequential patterns, which are not a sub-sequence of any other frequent sequential pattern, these algorithms will have to generate all sequential patterns first, which require lots of computation unrelated to the final results. For this reason, we propose an algorithm, PMSPM, to find all frequent maximal sequential patterns by cutting most of the intermediate steps. Like pincers, we alternate bottom-up and top-down methods to find most of the maximal sequential patterns in the early top-down stage. Thus, the mining in the bottom-up direction could skip repetitive procedures before the last pass. Therefore, the algorithm could reduce needless support count tests to save time. We demonstrate the correctness and integrity of PMSPM. We test its performance through experimenting with synthetic databases. Finally, we apply our algorithm to mining anamnesis records of diabetes patients, and find out frequent glucose evolution patterns.