In an era of explosive growth in data volume, data of this magnitude is still often processed with traditional methods. The emergence of the Hadoop big data platform has prompted researchers to combine big data platforms with clustering algorithms to improve data processing efficiency. This paper studies the k-means algorithm on a big data platform, focusing on its speedup ratio and number of iterations. We first build a Hadoop big data platform and then verify the speedup ratio of k-means in cluster and pseudo-distributed environments through multiple experiments. A comprehensive analysis of the recorded running times and iteration counts verifies that combining the k-means algorithm with the canopy algorithm can improve clustering accuracy and efficiency more effectively.
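As a rough illustration of how the canopy algorithm can be combined with k-means, the following single-machine sketch uses canopy centers as the initial centroids and reports how many iterations k-means then needs. It is not the Hadoop implementation evaluated in this paper. The function names, the thresholds `t1` and `t2`, and the sample points are all assumptions made for the example.

```python
import math  # math.dist requires Python 3.8+


def canopy_centers(points, t1, t2):
    """Pick canopy centers with a loose threshold t1 and a tight threshold t2.

    Hypothetical helper: returns (center, members) pairs, where members are
    all points within t1 of the center.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        # Every point within the loose threshold belongs to this canopy.
        members = [p for p in points if math.dist(p, center) <= t1]
        canopies.append((center, members))
        # Points within the tight threshold can no longer become centers.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return canopies


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means (Lloyd's iteration); returns (centroids, iterations used)."""
    centroids = [tuple(c) for c in initial_centroids]
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old
        # centroid if a cluster ends up empty.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            return centroids, iteration
        centroids = new_centroids
    return centroids, max_iter


# Illustrative data: two well-separated groups of 2-D points (assumed values).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
canopies = canopy_centers(points, t1=5.0, t2=3.0)
seeds = [center for center, _ in canopies]
centroids, iterations = kmeans(points, seeds)
print(len(seeds), "canopy centers;", iterations, "k-means iterations")
```

In this sketch the canopy pass also determines the number of clusters, so k does not have to be chosen in advance. Starting from seeds that already lie near the dense regions is the intuition for why fewer k-means iterations may be needed.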