An Efficient k-Means Clustering Algorithm Using Simple Partitioning

The k-means algorithm is one of the most widely used methods for partitioning a dataset into groups of patterns. However, most k-means methods require expensive distance calculations between patterns and centroids to reach convergence. In this paper, we present an efficient k-means clustering algorithm that produces clusters comparable to those of slower methods. Our algorithm partitions the original dataset into blocks; each block, called a unit block (UB), contains at least one pattern. The centroid of a unit block (CUB) can be located with a simple calculation. The computed CUBs together form a reduced dataset that represents the original dataset, and this reduced dataset is used to compute the final centroids of the original dataset. Only the UBs on the boundaries of candidate clusters need to be examined to find the closest final centroid for every pattern in the UB. In this way, we dramatically reduce the time needed to compute the final converged centroids. In our experiments, the algorithm produces clustering results comparable to those of other k-means algorithms, with much better performance.
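
The reduction step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how the blocks are formed, so the uniform grid, the `cells_per_axis` parameter, and the function names `unit_block_centroids` and `weighted_kmeans` are all assumptions made for illustration. The sketch builds one CUB per non-empty grid cell, weights it by the number of patterns it represents, and runs ordinary Lloyd iterations on the reduced dataset. The boundary-UB refinement is left out.

```python
import random
from collections import defaultdict


def unit_block_centroids(points, cells_per_axis):
    """Return (centroid, count) for every non-empty cell of a uniform grid.

    Assumption: unit blocks are cells of a regular grid over the bounding box.
    """
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    # Fall back to width 1.0 when all points share a coordinate on some axis.
    width = [((hi[d] - lo[d]) / cells_per_axis) or 1.0 for d in range(dims)]
    blocks = defaultdict(list)
    for p in points:
        # Clamp so points on the upper edge fall into the last cell.
        key = tuple(min(int((p[d] - lo[d]) / width[d]), cells_per_axis - 1)
                    for d in range(dims))
        blocks[key].append(p)
    cubs = []
    for members in blocks.values():
        n = len(members)
        centroid = tuple(sum(p[d] for p in members) / n for d in range(dims))
        cubs.append((centroid, n))
    return cubs


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def weighted_kmeans(cubs, k, iterations=50, seed=0):
    """Lloyd iterations over the CUBs, each weighted by its pattern count."""
    rng = random.Random(seed)
    centroids = [c for c, _ in rng.sample(cubs, k)]
    dims = len(centroids[0])
    for _ in range(iterations):
        sums = [[0.0] * dims for _ in range(k)]
        weights = [0] * k
        for c, n in cubs:
            j = nearest(c, centroids)
            weights[j] += n
            for d in range(dims):
                sums[j][d] += c[d] * n
        # Keep the old centroid if a cluster received no CUBs.
        updated = [
            tuple(v / weights[j] for v in sums[j]) if weights[j] else centroids[j]
            for j in range(k)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Usage: two synthetic, well-separated groups of 500 patterns each.
rng = random.Random(1)
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(500)]
data += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(500)]
cubs = unit_block_centroids(data, cells_per_axis=8)
centers = weighted_kmeans(cubs, k=2)
```

Because every CUB carries its pattern count, the weighted mean over a set of CUBs equals the mean over the patterns they represent, so each iteration touches at most one entry per non-empty block rather than one per pattern. A fuller version would still need to revisit the patterns inside blocks that straddle a cluster boundary, which this sketch does not attempt.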

Keywords

clustering; k-means algorithm; centroid; k-d tree; data mining
