CHSPM:一個完整的混合循序樣式探勘演算法

現有循序樣式探勘的研究依照樣式中相連的項目是否必須在交易紀錄中緊密相連可粗略的分為以下三類，第一類為找出連續循序樣式；第二類為找出非連續循序樣式；第三類為找出混合循序樣式。過去混合循序樣式探勘的演算法都以Apriori為基礎，但這些方法探勘出的結果並不完整，所以我們針對混合循序樣式探勘，以樣式成長(pattern-growth)方法為基礎，提出一個新的演算法CHSPM(A Complete Hybrid Sequential Patterns Mining Algorithm)，以窮舉法來找出完整之混合循序樣式。 CHSPM演算法有以下四個步驟，分別為：1.產生增補一階頻繁樣式；2. 縮減資料庫；3. 分割資料庫，建立投影資料庫；4. 探勘投影資料庫，建立子投影資料庫，直到找出所有的混合循序樣式。為了驗證CHSPM的探勘結果，我們使用10萬至30萬筆的模擬資料來進行實驗，並與過去探勘混合循序樣式效率最佳的GFP2 演算法比較。實驗結果顯示，雖然CHSPM在效能上不如GFP2，但可以探勘出完整的混合循序樣式。

關鍵字

循序樣式；資料探勘；樣式成長

並列摘要

Based on whether consecutive items in sequential patterns should also be consecutive in the transactions, existing researches about sequential pattern mining could be classified into the following three categories: The first is to find continuous patterns; the second is to find discontinuous patterns; the third is to find hybrid patterns that combine both continuous patterns and discontinuous patterns. Previous hybrid sequential pattern mining algorithms were all based on the Apriori algorithm, but we discovered that their mining results are incomplete. Thus, based on the pattern-growth method, we propose a new algorithm (CHSPM) to find complete hybrid sequential patterns. The four steps of CHSPM are as follows: 1. Build the supplemented frequent-1-sequence item set; 2. Reduce the database by erasing unimportant items from the transactions. 3. Partition the database, and build projected databases. 4. Recursively mine the projected databases and build sub-projected databases until all hybrid sequential patterns are found. Finally, we use synthetic databases of 100,000 to 300,000 records to test our algorithm, and to compare our results with those of GFP2, the most efficient algorithm in hybrid sequential pattern mining up to now. The result shows that even though CHSPM is slower than GFP2, it can find out complete hybrid sequential patterns.

並列關鍵字

sequential pattern ； data mining ； pattern-growth

參考文獻

[1] R. Agrawal, T. Imielinski and A. Swami, “Mining association rules between sets of items in large databases,” Proceedings of the 1993 ACM SIGMOD international conference. Management of Data, 1993, pp. 207-216.

[2] R. Agrawal and R. Srikant, “Mining sequential patterns: Generalizations and performance improvements,” Proceedings of the 5th International Conference on Extending Database Technology, 1996, pp. 3-17.

[6] Y. L. Chen, S. S. Chen, and P. Y. Hsu, “Mining hybrid sequential patterns and sequential rules,” Information Systems, 2002, Vol. 27, Issue 5, pp.345-362.

[16] M. S. Waterman, Introduction to Computational Biology: Maps, sequences, and genomes, Boca Raton: CRC Press, 1995.

[19] M. J. Zaki, “SPADE: An efficient algorithm for mining frequent sequences,” Machine Learning, 2001, Vol. 42, Issue 1-2, pp.31-60.

被引用紀錄

許俊傑（2007）。MIHSPM:一個多項目集的混合循序樣式探勘演算法〔碩士論文，淡江大學〕。華藝線上圖書館。https://doi.org/10.6846/TKU.2007.00920

國際替代計量

CHSPM:一個完整的混合循序樣式探勘演算法

全文下載

主題瀏覽