Research on Canopy-Kmeans Algorithm based on Hadoop

This paper analyzes the advantages and disadvantages of the traditional K-means and Canopy algorithms and proposes an improved K-means algorithm based on Canopy. It also applies the "min-max principle" to address the algorithm's space complexity and randomness problems, and implements the algorithm with the MapReduce programming model on the Hadoop platform. Experiments show that this method achieves higher accuracy and stability than the traditional K-means and Canopy algorithms.
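The abstract does not spell out the procedure, so the following is only a minimal single-process Python sketch under stated assumptions. The "min-max principle" is read here as max-min distance (farthest-point) selection of Canopy centers: start from one point, repeatedly add the point whose distance to its nearest chosen center is largest, and stop when that distance falls below a threshold `t2`. This removes random seeding and fixes the number of clusters. The resulting centers seed K-means, and each K-means iteration is split into a map step (assign a point to its nearest center) and a reduce step (average the points per center), mirroring how a MapReduce job would divide the work. The function names and `t2` are hypothetical; the Canopy T1 threshold and overlapping canopy membership are omitted for brevity.

```python
import math
from collections import defaultdict


def maxmin_canopy_centers(points, t2):
    """Pick Canopy centers by max-min (farthest-point) selection.

    Assumption: start from the first point, then repeatedly add the point whose
    distance to its nearest chosen center is largest; stop once that distance
    drops below the hypothetical threshold t2. No random choice is involved.
    """
    if t2 <= 0:
        raise ValueError("t2 must be positive")
    if not points:
        return []
    centers = [points[0]]
    nearest = [math.dist(p, points[0]) for p in points]
    while True:
        idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[idx] < t2:
            break
        centers.append(points[idx])
        # Update each point's distance to its nearest chosen center.
        nearest = [min(d, math.dist(p, points[idx])) for p, d in zip(points, nearest)]
    return centers


def kmeans_map(point, centers):
    """Map step: emit (index of nearest center, (coordinate sum, count))."""
    k = min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))
    return k, (point, 1)


def kmeans_reduce(values):
    """Reduce step: combine (coordinate sum, count) pairs into a centroid."""
    dim = len(values[0][0])
    sums = [0.0] * dim
    count = 0
    for partial_sum, n in values:
        for d in range(dim):
            sums[d] += partial_sum[d]
        count += n
    return tuple(s / count for s in sums)


def canopy_kmeans(points, t2, max_iter=20, tol=1e-9):
    """Seed K-means with max-min Canopy centers, then iterate map/reduce rounds."""
    centers = maxmin_canopy_centers(points, t2)
    if not centers:
        return []
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:  # "map" phase: one call per input record
            key, value = kmeans_map(p, centers)
            groups[key].append(value)  # "shuffle": group values by center index
        new_centers = list(centers)  # centers with no points keep their position
        for key, values in groups.items():  # "reduce" phase: one call per key
            new_centers[key] = kmeans_reduce(values)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    # Two centers, close to (0.33, 0.33) and (10.33, 10.33).
    print(canopy_kmeans(data, t2=5.0))
```

Because the seeds are chosen deterministically from the data, repeated runs on the same input give the same clusters. On Hadoop, each K-means iteration would typically become one MapReduce job; since the reducer sums (coordinate sum, count) pairs, the same function could also serve as a combiner to cut shuffle traffic.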

Keywords

Hadoop; MapReduce; Canopy; K-means Algorithm; Clustering
