Sequential pattern mining is an important problem in data mining research. Its goal is to find frequent sequential patterns, ordered by the time at which the original data occurred, whose number of occurrences exceeds a user-defined threshold. Depending on whether items that are adjacent in a pattern must also be adjacent across transactions, sequential patterns fall into three categories: continuous, discontinuous, and hybrid, where hybrid patterns combine continuous and discontinuous parts. In previous studies of hybrid sequential pattern mining, each transaction contains only a single item, and multi-item itemsets have not been studied. We propose a new algorithm, MIHSPM (Multiple Itemset Hybrid Sequential Pattern Mining Algorithm), to find hybrid sequential patterns over multi-item itemsets. MIHSPM has four steps: 1. Scan the database to generate frequent length-1 patterns. 2. Build a pattern order table for each frequent length-1 pattern. 3. Join the pattern order tables to generate frequent length-2 patterns. 4. Partition the pattern order tables by pattern prefix and join them recursively to find all hybrid sequential patterns. In the experiments, we use synthetic databases to test the algorithm's performance and to analyze its sensitivity to the related simulation parameters. Finally, we apply MIHSPM to the medical records of diabetes patients and mine frequent evolution patterns of the patients' blood glucose, which can be used to infer the causes of changes in blood glucose patterns.
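The first three steps can be illustrated with a minimal Python sketch. It is not the MIHSPM implementation: the abstract does not specify the layout of a pattern order table, so the table here is assumed to map each item to the transaction positions where it occurs in each sequence. The names `DB`, `build_order_tables`, and `join_tables` are hypothetical, and "discontinuous" is assumed to mean a gap of at least one transaction. The sketch tracks single items inside multi-item transactions only, and it omits step 4 (prefix partitioning and recursive joins).

```python
from collections import defaultdict

# Toy sequence database (made-up data): each sequence is a time-ordered list
# of transactions, and each transaction is a set that may hold several items.
DB = {
    "s1": [{"a", "b"}, {"c"}, {"d"}],
    "s2": [{"a"}, {"c", "d"}, {"b"}],
    "s3": [{"a", "b"}, {"d"}, {"c"}],
}


def build_order_tables(db, min_support):
    """Steps 1-2 (assumed form): one scan that records, for every item,
    the transaction positions where it occurs in each sequence, then keeps
    only items that appear in at least min_support sequences."""
    tables = defaultdict(dict)
    for sid, seq in db.items():
        for pos, transaction in enumerate(seq):
            for item in transaction:
                tables[item].setdefault(sid, []).append(pos)
    return {item: occ for item, occ in tables.items() if len(occ) >= min_support}


def join_tables(left, right, min_support):
    """Step 3 (assumed form): join two tables into length-2 patterns.
    A sequence supports the continuous form if the right item occurs in the
    transaction immediately after the left item, and the discontinuous form
    if it occurs at least two transactions later."""
    continuous, discontinuous = set(), set()
    for sid in left.keys() & right.keys():
        for p in left[sid]:
            for q in right[sid]:
                if q == p + 1:
                    continuous.add(sid)
                elif q > p + 1:
                    discontinuous.add(sid)
    result = {}
    if len(continuous) >= min_support:
        result["continuous"] = len(continuous)
    if len(discontinuous) >= min_support:
        result["discontinuous"] = len(discontinuous)
    return result


tables = build_order_tables(DB, min_support=2)
print(join_tables(tables["a"], tables["c"], min_support=2))  # {'continuous': 2}
```

In this toy database, item c directly follows item a in s1 and s2 but appears two transactions later in s3, so only the continuous form of the pattern reaches the support threshold of 2.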