A Performance Comparison of Parallelized Association Rule Mining Algorithms

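With the spread of the Internet and the maturing of database technology, data have become very easy to obtain and the choice of goods has become more diverse. Consumers want to spend less time searching when they buy goods; suppliers want to place related goods together so that they sell more effectively. With the rapid development of computer hardware, cloud technology has become a hot topic in industry. Faced with ever-growing volumes of data, however, how to shorten computation time and analyze massive data effectively has become an important research direction. The purpose of this study is to design and analyze parallel versions of two different mining algorithms on the MapReduce framework, namely the association rule algorithms Apriori and PIETM, so as to process large databases effectively and shorten execution time.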

Keywords

Association rules; Data mining; Cloud computing; MapReduce; HDFS

Parallel Abstract

Thanks to the growth of the Internet and the maturing of database technology, obtaining information has become easier and the selection of goods more diverse. Consumers want to spend less time searching when buying goods, while suppliers want to place related goods together in order to sell them effectively. With the rapid development of computer hardware, cloud technology has become a hot topic in industry. However, as the volume of data keeps growing, how to reduce computing time and analyze big data effectively has become an important topic. In this study, we design parallel versions of two well-known association rule mining algorithms, Apriori and PIETM, on the MapReduce framework and analyze them. Our experiments show that, under the parallel MapReduce design, Apriori and PIETM each have their own performance advantages and disadvantages.

Parallel Keywords

Association rules; Data mining; Cloud computing; MapReduce; HDFS
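
The idea of running Apriori on MapReduce, as named in the abstract, can be illustrated with a minimal single-process sketch. It is an assumption-laden illustration and not the authors' implementation: the function names (`map_phase`, `reduce_phase`, `apriori`), the list-of-splits input, and the absolute `min_support` count are all hypothetical, no Hadoop or HDFS API is used, and PIETM is not covered. Each mapper emits `(itemset, 1)` pairs for its data split, and the reducer sums the counts and keeps itemsets that meet the support threshold.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transactions, k, previous=None):
    """Mapper (hypothetical): emit (k-itemset, 1) for one data split.

    For k > 1, a candidate is emitted only if all of its (k-1)-subsets
    were frequent in the previous pass (the Apriori pruning property).
    """
    for items in transactions:
        for candidate in combinations(sorted(set(items)), k):
            if previous is None or all(
                sub in previous for sub in combinations(candidate, k - 1)
            ):
                yield candidate, 1


def reduce_phase(pairs, min_support):
    """Reducer (hypothetical): sum counts per itemset, keep frequent ones."""
    counts = defaultdict(int)
    for itemset, value in pairs:
        counts[itemset] += value
    return {s: c for s, c in counts.items() if c >= min_support}


def apriori(splits, min_support):
    """Level-wise driver: one simulated map/reduce round per itemset size k."""
    result, previous, k = {}, None, 1
    while True:
        # In a real cluster each split would be handled by a separate mapper.
        pairs = [p for split in splits for p in map_phase(split, k, previous)]
        current = reduce_phase(pairs, min_support)
        if not current:
            return result
        result.update(current)
        previous = set(current)
        k += 1


if __name__ == "__main__":
    # Two splits stand in for two HDFS blocks (made-up toy data).
    splits = [
        [["bread", "milk"], ["bread", "butter", "milk"]],
        [["butter", "milk"], ["bread", "milk"]],
    ]
    for itemset, count in sorted(apriori(splits, min_support=2).items()):
        print(itemset, count)
```

Because every pass needs a full scan of all splits, the number of map/reduce rounds grows with the size of the longest frequent itemset, which is one reason execution time is a natural metric when comparing parallel mining algorithms.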

