漸進式資料探勘之關聯規則維護及其量化關聯規則之應用

關聯規則演算法是資料探勘(Data Mining)中一項相當重要的技術，挖掘頻繁項目(frequent patterns)的主要目的是從交易資料庫 (transaction database) 中，尋找出交易次數頻繁的交易項目之間的關聯性，進而找出其間相關的關聯規則(association rules)。在靜態資料庫(static database)的資料挖掘方面，已有許多學者提出一些演算法，在這類的研究主要分成兩大類：其一是利用Apriori-like 方式(Apriori-like Candidate Sets Generation-and-Test Approach)，其最主要的花費在於產生候選頻繁項目集(generation candidate frequent itemsets)及重新掃描資料庫的I/O花費；另一種是使用None Apriori-like 的方式去找出大項目集(large itemsets)。這兩類的方法中，Apriori-like方式在找出所有k frequent itemsets時需要k次的重覆掃描資料庫；在效率的考量方面None Apriori-like明顯優於Apriori-like，因為它不需要一直重覆掃描資料庫(rescan database)。在整個資料關聯規則挖掘中，如果能減少重覆掃描資料庫的次數將可大大減少I/O所花費的時間，而在這一類中又以FP-tree的結構最為有效率且已被應用於DBMiner中。本論文主要應用FP-tree的結構，再加上準大項目集(pre-large itemset)的概念及QFP-growth演算法的方式進而實現漸進式資料探勘(incremental data mining)。QFP-growth它不但繼承FP-growth的所有優點且也可以避免產生大量conditional FP-tree的瓶頸，其主要是增加一個暫存的根(temporary root)去減少執行時間及經由FP-growth演算法所需產生的記憶體儲存空間。結合上述的方式我們提出一個新的IQFP-growth(Incremental QFP-growth)演算法,它不僅可以改善FP-growth演算法的缺點,而且可以達成漸進式資料探勘的方法(incremental data mining approach)，並且將其觀念推廣到量化關聯規則(quantitative association rules) 及模糊關聯規則(Fuzzy data mining)中。

關鍵字

資料探勘；關聯規則； FP-tree ；準大項目集(pre-large itemset) ； QFP-growth ；漸進式資料探勘(Incremental data mining) ；量化關聯規則(quantitative association rules) ；模糊關聯規則(Fuzzy data mining)

並列摘要

Association rules algorithm is an important technology in Data Mining. The major purpose discovering the frequent pattern is to look for the association between all the transaction items traded frequently. Accordingly, we can find out the association rules. Many algorithms on data mining in the static database have been proposed. They are mainly divided into two classes. The first one is to utilize Apriori-like Candidate Sets Generation-and-Test approach. Its main expenditure lies in generation candidate frequent itemsets and I/O cost when rescaning the transaction database. The other one is to use None Apriori-like approach to find out large itemsets. Apriori-like approach needs rescan the database k times while finding out all k-frequent itemsets. None Apriori-like approach is obviously superior to Apriori-like apprpach in the manner of efficiency, because it doesn’t need to rescan the database repetitively. In the associational rules of mining, we can reduce the time spent of I/O greatly if we reduce the number of rescan in the database. On the other hand, the structure of FP-tree applied in DBMiner is the most efficient. This thesis combines the structure of FP-tree, the concept of pre-large itemsets and QFP-growth algorithm to realize the incremental data mining. The QFP-growth algorithm not only inherits all merits of FP-growth but also avoid producing the bottleneck of massive conditional FP-tree. In QFP-growth algorithm, it mainly increases the temporary root to reduce the execution time and the memory storage space produced by FP-growth algorithm. In this thesis, we propose a novel Incremental QFP-growth algorithm. The algorithm not only improves the disadvantage of FP-growth algorithm but also achieve the performance of incremental data mining approach. We also extend the idea to the quantitative association rules and the Fuzzy data mining.

並列關鍵字

Data mining ； Associational rules ； FP-tree ； Ppre-large itemset ； QFP-growth ； Incremental data mining ； Quantitative association rules ； Fuzzy data mining

參考文獻

[18] Pauray S. M. Tsai and Chien-Ming Chen, Mining quantitative association rules in a large database of sales transactions, Journal of Inforamtion Science and Engineering, Vol.17 No.4, pp.667-681 (July 2001)

[1] R. Agrawal, T. Imielinksi, A. Swami, Mining association rules between sets of items in large database,in: The ACM SIGMOD Conference, 1993, pp. 207–216.

[2] Savasere, E. Omiecinski, and S. Navathe, An Efficient Algorithm for Mining Association Rules in Large Databases, Proc. 21th VLDB, pp. 432-444, 1995.

[3] Jiawei Han, Jian Pei, Yiwen Yin, Runying Mao, Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach, Data Mining and Knowledge Discovery, 8, 53-87, 2004

[5] J.S. Park, M.S. Chen, and P.S. Yu, "An Effective Hash Based Algorithm for Mining Association Rules", Proc. ACM SIGMOD, pp. 175-186, 1997

國際替代計量

漸進式資料探勘之關聯規則維護及其量化關聯規則之應用

未授權

主題瀏覽