Mining Quantitative Association Rules with Density Constraints

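This thesis proposes a new method for mining quantitative association rules, called the PQAR (Partition-based Quantitative Association Rule mining) algorithm. It uses space partitioning to first mine the frequent interval itemsets that satisfy a relative density constraint, and then generates quantitative association rules from them. When mining frequent interval itemsets, PQAR considers not only the minimum support threshold but also a relative density constraint, which keeps it from reporting intervals that meet the support threshold but whose data are not concentrated. In addition, PQAR uses space partitioning to mine the largest intervals that satisfy the requirements. This reduces the number of database scans, which greatly shortens execution time, and it also leaves fewer intervals in the mining result, so that the resulting quantitative association rules are concise and significant. Experimental results show that PQAR achieves high accuracy when mining frequent interval itemsets under various support and relative density settings. Moreover, at the same level of accuracy, the proposed method runs more efficiently than the QAR algorithm.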

Keywords

Data mining; association rules

Parallel Abstract

This thesis proposes a new approach to mining quantitative association rules, called the PQAR (Partition-based Quantitative Association Rule mining) algorithm. Using a space partitioning method, the approach finds all frequent interval itemsets that satisfy the minimum relative density requirement, and the quantitative association rules are then generated from these interval itemsets. When mining frequent interval itemsets, the PQAR algorithm uses not only the minimum support as a filtering condition but also the minimum relative density, which prevents it from finding intervals in which the data distribution is sparse. In addition, because space partitioning is used to find the largest intervals that meet the threshold requirements, the number of qualified intervals is reduced, so the resulting rules are significant and concise. Furthermore, because the PQAR algorithm reduces the number of database scans, its mining time is considerably shorter than that of previous approaches. The experimental results show that, on data sets tested with various support and relative density settings, the PQAR algorithm obtains results with high accuracy and recall in most cases. Moreover, at the same level of accuracy, the PQAR algorithm takes much less time than the QAR algorithm.
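The two filtering conditions described above can be illustrated with a minimal sketch. The abstract does not define relative density, so the definition used here is an assumption: the density of an interval (support per unit width) divided by the average density over the attribute's whole value range. The function names and sample data are hypothetical, and this is not the PQAR partitioning procedure itself, only a check of one candidate interval.

```python
from typing import Sequence


def support(values: Sequence[float], lo: float, hi: float) -> float:
    """Fraction of records whose value lies in the closed interval [lo, hi]."""
    inside = sum(1 for v in values if lo <= v <= hi)
    return inside / len(values)


def relative_density(values: Sequence[float], lo: float, hi: float) -> float:
    """ASSUMED definition (not taken from the thesis):
    (support / interval width) divided by (1 / width of the full value range)."""
    if hi <= lo:
        raise ValueError("interval must have positive width")
    domain_width = max(values) - min(values)
    if domain_width <= 0:
        raise ValueError("attribute values must span a positive range")
    return support(values, lo, hi) * domain_width / (hi - lo)


def qualifies(values: Sequence[float], lo: float, hi: float,
              min_sup: float, min_rel_density: float) -> bool:
    """An interval is kept only if it passes both thresholds."""
    return (support(values, lo, hi) >= min_sup
            and relative_density(values, lo, hi) >= min_rel_density)


# Hypothetical attribute values: six records cluster in [20, 25], two are outliers.
ages = [20, 21, 22, 23, 24, 25, 60, 80]

dense_ok = qualifies(ages, 20, 25, min_sup=0.5, min_rel_density=2.0)
wide_ok = qualifies(ages, 20, 80, min_sup=0.5, min_rel_density=2.0)
```

In this sketch the wide interval [20, 80] has the highest possible support but a relative density of only 1, so a support threshold alone would accept it while the added density threshold rejects it. The narrow interval [20, 25] passes both checks.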

Parallel Keywords

data mining; association rule

