PMSPM:夾擊最長循序樣式探勘演算法

在過去以 Apriori 為基礎的循序樣式探勘方法中，皆單純採用由下而上的方式從長度較短的頻繁樣式成長至較長的候選樣式，一直成長至沒有更長的候選樣式產生為止，因而若需探勘最長頻繁樣式時則必須將所有的頻繁樣式探勘完才可得到(所謂的最長頻繁樣式就是不屬於其它任一頻繁樣式的子樣式稱之)，造成許多不必要的運算。本研究提出並建構一個優先探勘最長頻繁樣式之演算法PMSPM，結合由下而上以及由上而下兩種方向的探勘方式，進而早一步於由上而下的方向找出大多數的最長頻繁樣式，使得在由下而上方向的探勘不需成長到最後的階段，即可找出所有頻繁且最長的樣式，進而節省大量不需要的候選樣式測試，以減少探勘時間。本研究除了證明演算法之正確性與完整性，並以模擬實驗驗證其效能及可行性，最後，將PMSPM演算法應用於糖尿病病患之病歷資料探勘，找出病患之血糖的最長樣式演進型式，以供推測血糖樣式變化之原因。

關鍵字

循序樣式；資料探勘；最長循序樣式

並列摘要

Apriori-based sequential pattern mining algorithms use bottom-up method. They merge frequent patterns with shorter length into candidate patterns with longer length, then repeat the process until no more candidate patterns could be generated. However, if we want to obtain frequent maximal sequential patterns, which are not a sub-sequence of any other frequent sequential pattern, these algorithms will have to generate all sequential patterns first, which require lots of computation unrelated to the final results. For this reason, we propose an algorithm, PMSPM, to find all frequent maximal sequential patterns by cutting most of the intermediate steps. Like pincers, we alternate bottom-up and top-down methods to find most of the maximal sequential patterns in the early top-down stage. Thus, the mining in the bottom-up direction could skip repetitive procedures before the last pass. Therefore, the algorithm could reduce needless support count tests to save time. We demonstrate the correctness and integrity of PMSPM. We test its performance through experimenting with synthetic databases. Finally, we apply our algorithm to mining anamnesis records of diabetes patients, and find out frequent glucose evolution patterns.

並列關鍵字

sequential pattern ； data mining ； maximal sequential pattern

參考文獻

[1] 許俊傑，MIHSPM:一個多項目集的混合循序樣式探勘演算法，淡江大學資訊管理研究所碩士論文，2007年。

[5] R. Agrawal and R. Srikant, “Fast algorithm for mining association rules”, In proceeding of the 20th international conference on VLDB, Santiago, 1994, 487-499.

[7] R. Agrawal and R. Srikant, “Mining Sequential Patterns: Generalizations and Performance Improvements”, Proc. of the Fifth International Conference on Extending Database Technology, 1995.

[8] R. J. Bayardo, “Efficiently Mining Long Patterns from Databases”, Proceedings of the 1998 ACM SIGMOD international conference, 1998, pp.85-93

[9] D. Burdick, M. Calimlim and J. Gehrke, “MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases”, Proceedings of the 17th International Conference on Data Engineering, 2001, pp.443-452.

被引用紀錄

林師晟（2010）。具有時間限制條件的最長頻繁循序樣式探勘演算法〔碩士論文，淡江大學〕。華藝線上圖書館。https://doi.org/10.6846/TKU.2010.00835

國際替代計量

PMSPM:夾擊最長循序樣式探勘演算法

全文下載

主題瀏覽