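This paper proposes a two-phase clustering algorithm, hierarchical K-means clustering (HKC). In the split phase, HKC uses K-means to partition the data set into many clusters. The number of clusters is deliberately increased at this stage to reduce the influence of noise and outliers on K-means. In the merge phase, HKC applies the single-linkage agglomerative algorithm, which compensates for K-means' inability to discover arbitrarily shaped clusters and also yields a tree-structured clustering result. Because K-means reduces all of the data to be processed to a small number of clusters, HKC can produce the clustering tree quickly. Experimental results show that HKC achieves good accuracy and produces tree-structured clustering results more efficiently.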
In this paper, we propose a new clustering algorithm, the hierarchical K-means clustering algorithm (HKC). HKC consists of two phases. In the first phase, HKC employs the K-means clustering algorithm to split the original data into a relatively large number of groups; the purpose of this phase is to reduce the effect of outliers and noise. In the second phase, HKC merges these groups with the single-linkage agglomerative algorithm, which can discover arbitrarily shaped clusters and produces a clustering tree. Since K-means reduces the data to a small number of groups, the clustering tree can be obtained quickly. We evaluate the accuracy of HKC and compare it with those of K-means and hierarchical clustering. The experimental results indicate that HKC is more accurate than both K-means and hierarchical clustering. Hence, HKC can help researchers analyze data quickly and accurately.
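The abstract does not give implementation details, so the following is only a minimal pure-Python sketch of the two-phase idea under stated assumptions. It uses Euclidean distance. K-means is seeded with deterministic farthest-first initialization, a choice made here for reproducibility and not taken from the paper. The single-linkage distance between two groups is taken as the closest pair of points across them. A flat clustering is obtained by stopping the merge tree at `n_clusters`. The function names `hkc`, `kmeans`, and `single_linkage` and the parameters `k` and `n_clusters` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Merge = Tuple[int, int, float, int]  # (cluster a, cluster b, link distance, new cluster id)


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points: List[Point], k: int) -> List[Point]:
    """Assumed seeding: each new center is the point farthest from the chosen ones."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def assign(points: List[Point], centers: List[Point]) -> List[List[Point]]:
    """Assign every point to its nearest center."""
    groups: List[List[Point]] = [[] for _ in centers]
    for p in points:
        groups[min(range(len(centers)), key=lambda i: dist(p, centers[i]))].append(p)
    return groups


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[List[Point]]:
    """Phase 1 (split): Lloyd's K-means with a deliberately large k."""
    centers = farthest_first(points, k)
    groups = assign(points, centers)
    for _ in range(iters):
        # Recompute each center as its group's mean; keep the old center if a group is empty.
        new_centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
        groups = assign(points, centers)
    return [g for g in groups if g]


def single_linkage(groups: List[List[Point]]) -> List[Merge]:
    """Phase 2 (merge): agglomerate the K-means groups into a clustering tree."""
    members: Dict[int, List[Point]] = {i: list(g) for i, g in enumerate(groups)}
    key = lambda a, b: (a, b) if a < b else (b, a)
    # Initial group-to-group distance: closest pair of points (single link).
    D = {key(a, b): min(dist(p, q) for p in members[a] for q in members[b])
         for a in members for b in members if a < b}
    tree: List[Merge] = []
    next_id = len(groups)
    while len(members) > 1:
        (a, b), d = min(D.items(), key=lambda kv: kv[1])
        members[next_id] = members.pop(a) + members.pop(b)
        # Single-linkage update: d(new, c) = min(d(a, c), d(b, c)).
        for c in members:
            if c != next_id:
                D[key(c, next_id)] = min(D[key(a, c)], D[key(b, c)])
        D = {kk: v for kk, v in D.items() if a not in kk and b not in kk}
        tree.append((a, b, d, next_id))
        next_id += 1
    return tree


def hkc(points: List[Point], k: int, n_clusters: int):
    """Split with K-means, merge with single linkage, then cut the tree at n_clusters."""
    groups = kmeans(points, k)
    tree = single_linkage(groups)
    members = {i: list(g) for i, g in enumerate(groups)}
    for a, b, _, new_id in tree[: max(0, len(groups) - n_clusters)]:
        members[new_id] = members.pop(a) + members.pop(b)
    return list(members.values()), tree
```

For example, two parallel line segments are split into six K-means groups and then merged back into the two segments. The final merge in `tree` joins the segments at their closest distance. Because single linkage runs only on the few K-means groups rather than on every point, the merge phase stays cheap.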