A Decomposable Algorithm For Mining Frequent Itemsets In A Distributed Parallel Environment

Knowledge discovery in databases (KDD), also called data mining, has attracted considerable attention in both academic and business research. Frequent itemset mining plays an essential role because it is a primary stage of association analysis. Many current methods adopt a distributed-parallel approach to improve time efficiency; however, their efficiency is still inadequate. The main reason is that, in previous studies, the task of discovering frequent itemsets cannot be performed in a completely seamless and concurrent manner. In this thesis, based on the distributed-parallel strategy and item-transformation technology, we devise a decomposable algorithm, named D-Mining, for mining frequent itemsets. We prove by mathematical induction that D-Mining is correct with respect to both the occurrences of itemsets and the number of itemsets. Furthermore, the experimental results demonstrate that D-Mining achieves a higher return on the investment of computational resources than pp-tree, a previous method also designed for distributed parallel environments. Under the same parameters, in which every distributed local site has multiple CPUs, D-Mining is much more efficient than pp-tree. In particular, D-Mining is stable and maintains similar efficiency even under demanding conditions, such as a long average transaction length, a large number of items, and a small support threshold.
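The abstract does not describe D-Mining's item-transformation step, so the sketch below does not attempt to reproduce it. It is only a hedged illustration of the general distributed-parallel idea, assuming horizontally partitioned transactions (one partition per local site) and an absolute minimum support count. Each simulated site counts candidate itemsets independently, and the counts are then merged. Because the partitions are disjoint, the global support of an itemset equals the sum of its local supports, which is what allows the per-site work to run concurrently. The names `local_counts` and `mine_distributed`, the `max_len` cap, and the sample data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def local_counts(transactions, max_len):
    """Count every itemset of size 1..max_len in one site's partition.

    Exhaustive subset enumeration: exponential in transaction length,
    so this is suitable only for toy inputs.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so ('a','b') == ('b','a') case never arises
        for k in range(1, min(max_len, len(items)) + 1):
            counts.update(combinations(items, k))  # each tuple counted once per transaction
    return counts


def mine_distributed(partitions, min_support, max_len=3):
    """Count locally at each simulated site, merge, and filter by support.

    Assumption: min_support is an absolute count, not a fraction.
    Threads stand in for independent local sites.
    """
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for site_count in pool.map(local_counts, partitions,
                                   [max_len] * len(partitions)):
            total.update(site_count)  # global support = sum of local supports
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical example: two local sites holding disjoint transactions.
sites = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"], ["a", "b", "c"]],
]
result = mine_distributed(sites, min_support=3)
print(sorted(result.items()))
```

With a support threshold of 3, every single item and every pair survives, while `("a", "b", "c")` (support 2) is filtered out. Practical miners prune candidates rather than enumerating all subsets, and whatever structures D-Mining and pp-tree use are not modeled here.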

Keywords

Frequent Itemsets; Distributed Parallel Technology; Data Mining; Association Analysis; Item-transformation
