  • Thesis

MIHSPM:A Multiple Itemset Hybrid Sequential Pattern Mining Algorithm

Advisor: 周清江

Abstract


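In data mining research, sequential pattern mining is an important problem. It orders the raw data by their time of occurrence and aims to find the frequent sequential patterns, that is, those whose number of occurrences exceeds a user-specified threshold. Sequential patterns fall into three types, depending on whether items that are adjacent in a pattern must also be adjacent across transactions: continuous, discontinuous, and hybrid. In past research on hybrid sequential pattern mining, each transaction contained only a single item, and multiple itemsets had not been studied. We propose a new algorithm, MIHSPM (Multiple Itemset Hybrid Sequential Pattern Mining Algorithm), that finds hybrid sequential patterns over multiple itemsets. MIHSPM has four steps: 1. Scan the database to generate the frequent length-1 patterns. 2. Build a pattern order table for each frequent length-1 pattern. 3. Join the pattern order tables to generate the frequent length-2 patterns. 4. Partition the pattern order tables by prefix to find all hybrid sequential patterns. In the experiments, we use synthetic databases to test the algorithm's performance and analyze its sensitivity to the related simulation parameters. Finally, we apply MIHSPM to the medical records of diabetes patients to find how the patients' blood glucose patterns evolve, which can help infer the causes of changes in those patterns.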
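The sketch below makes the three pattern types concrete by checking whether a single sequence contains a pattern. It is not taken from the thesis. It assumes the following representation, which the abstract does not specify:

- A sequence is a list of transactions, each a Python `frozenset` of items.
- A pattern is a list of itemset elements.
- `links[i]` is `"-"` when element `i+1` must occur in the very next transaction.
- `links[i]` is `"*"` when element `i+1` may occur in any later transaction.

Under these assumptions, a continuous pattern uses only `"-"`, a discontinuous pattern uses only `"*"`, and a hybrid pattern mixes both. The names `contains` and `pattern_kind` are hypothetical.

```python
# Hypothetical representation, not from the thesis:
#   seq      : list of transactions, each a frozenset of items
#   elements : list of itemsets (frozensets) making up the pattern
#   links    : links[i] is "-" (elements[i+1] in the very next transaction)
#              or "*" (elements[i+1] in any later transaction)


def contains(seq, elements, links):
    """Return True if seq contains the pattern (elements, links)."""

    def match_from(i, pos):
        # elements[i] must be a subset of the transaction at index pos.
        if pos >= len(seq) or not elements[i] <= seq[pos]:
            return False
        if i == len(elements) - 1:
            return True
        if links[i] == "-":
            # Continuous link: the next element must match the next transaction.
            return match_from(i + 1, pos + 1)
        # Discontinuous link: the next element may match any later transaction.
        return any(match_from(i + 1, q) for q in range(pos + 1, len(seq)))

    return any(match_from(0, p) for p in range(len(seq)))


def pattern_kind(links):
    """Classify a pattern by its links (a length-1 pattern counts as continuous)."""
    if all(link == "-" for link in links):
        return "continuous"
    if all(link == "*" for link in links):
        return "discontinuous"
    return "hybrid"
```

For example, consider the sequence `[{a, b}, {c}, {d}]`. It contains the continuous pattern `(ab) - (c)` and the discontinuous pattern `(ab) * (d)`, but not the continuous pattern `(ab) - (d)`. This is because `{d}` is not in the transaction immediately after `{a, b}`.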

Keywords

sequential pattern; data mining

Parallel Abstract (English)


Sequential pattern mining is an important research topic in data mining. Its main purpose is to find sequential patterns, ordered by time of occurrence, whose frequency exceeds a user-defined threshold. Depending on whether items that are consecutive in a pattern must also be consecutive in the transactions, sequential patterns fall into three categories: continuous patterns, discontinuous patterns, and hybrid patterns that combine the two. In previous research on hybrid sequential pattern mining, each transaction contained only a single item. We propose a new algorithm, MIHSPM, to find multiple itemset hybrid sequential patterns. The four steps of MIHSPM are as follows: 1. Scan the database to generate the frequent length-1 patterns. 2. Build a pattern order table for each frequent length-1 pattern. 3. Join the pattern order tables to generate the frequent length-2 patterns. 4. Partition the pattern order tables according to the patterns' prefixes and recursively join them. We use synthetic databases to test the algorithm's performance and analyze its sensitivity to the related parameters. Finally, we apply the algorithm to the medical history records of diabetes patients and find frequent blood glucose evolution patterns.
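The four steps can be approximated by a small prefix-growth miner. This is a minimal sketch under stated assumptions, not the thesis's implementation. The abstract does not describe the layout of a "pattern order table". Here it is modeled, hypothetically, as a dict mapping each sequence id to the sorted positions where an occurrence of the pattern ends. Links follow the same assumed convention:

- `"-"` means the next element must be in the very next transaction.
- `"*"` means the next element may be in any later transaction.

Support is the number of sequences containing the pattern. The function names `frequent_elements`, `join` and `mine` are illustrative.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical representation (not taken from the thesis):
#   db      : list of sequences; a sequence is a list of transactions (frozensets)
#   pattern : (elements, links); elements is a tuple of frozensets, links[i] is
#             "-" (next element in the very next transaction) or "*" (any later one)
#   table   : {sequence id: sorted positions where an occurrence of the pattern
#             ends}, our stand-in for the thesis's "pattern order table"


def frequent_elements(db, min_sup):
    """Steps 1-2: one scan that indexes every non-empty sub-itemset by position."""
    index = defaultdict(lambda: defaultdict(list))
    for sid, seq in enumerate(db):
        for pos, trans in enumerate(seq):
            items = sorted(trans)
            for k in range(1, len(items) + 1):
                for sub in combinations(items, k):
                    index[frozenset(sub)][sid].append(pos)
    # Support = number of sequences that contain the element at least once.
    return {e: dict(t) for e, t in index.items() if len(t) >= min_sup}


def join(prefix_table, elem_table, link):
    """Extend a pattern by one element and return the extended pattern's table."""
    out = {}
    for sid, ends in prefix_table.items():
        positions = elem_table.get(sid)
        if not positions:
            continue
        if link == "-":
            end_set = set(ends)
            new_ends = [p for p in positions if p - 1 in end_set]
        else:
            earliest = ends[0]  # ends is sorted, so any later position qualifies
            new_ends = [p for p in positions if p > earliest]
        if new_ends:
            out[sid] = new_ends
    return out


def mine(db, min_sup):
    """Steps 3-4: join tables into length-2 patterns, then grow each prefix."""
    singles = frequent_elements(db, min_sup)
    results = {}

    def grow(elements, links, table):
        results[(elements, links)] = len(table)
        for e, e_table in singles.items():
            for link in ("-", "*"):
                new_table = join(table, e_table, link)
                # An extension can only lose sequences, so infrequent
                # prefixes are safely pruned here.
                if len(new_table) >= min_sup:
                    grow(elements + (e,), links + (link,), new_table)

    for e, table in singles.items():
        grow((e,), (), table)
    return results
```

Growing each prefix depth-first keeps only the tables of the current prefix alive. This loosely mirrors the idea of partitioning the tables by prefix in step 4. The first round of `join` calls inside `grow` corresponds to step 3.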

Cited by


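吳鎮成 (2008). PMSPM:夾擊最長循序樣式探勘演算法 [PMSPM: A pincer-style longest sequential pattern mining algorithm] (Master's thesis, Tamkang University). Airiti Library. https://doi.org/10.6846/TKU.2008.00592
顏志祐 (2008). 一個能發掘更具意義循序樣式的探勘流程 [A mining process that discovers more meaningful sequential patterns] (Master's thesis, Tamkang University). Airiti Library. https://doi.org/10.6846/TKU.2008.00374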
