Applying Simplified Swarm Optimization for Solving Clustering Problem

資料分群廣泛應用於各種領域，是一種藉由某特定的衡量指標將資料分類成群的方法。K-means (KM) 與 K-harmonic-means (KHM) 因簡單有效率是常見且基礎的分群工具。然而，KM與KHM皆有其本質上的缺陷，導致效能或效率不佳。本文，應用簡群演算法（simplified swarm optimization）分別針對KM與KHM的缺陷提出兩種分群演算法，藉此改善KM與KHM在分群上的效能與效率。其次，由於資訊科技發展、網路崛起，催生大數據時代的來臨，許多現有的分群演算法無法在合理時間內處理大型資料集，其中包含KM與KHM。為改善提出之分群演算法在處理大型資料集之效率，本研究結合主成分分析與新式隨機取樣技術，發展一個新的分群架構。此一架構之原理是藉由同時減少資料集之維度與資料點之取用率，大幅提昇分群演算法的效率。為驗證本研究提出之演算法與架構，所有實驗均使用真實資料集，實驗結果均與過去文獻比較。

關鍵字

資料分群；簡群演算法；主成分分析；隨機取樣； K平均分群； K調和平均分群

並列摘要

Data clustering is commonly employed in many disciplines. The aim of clustering is to partition a set of data into clusters, in which objects within the same cluster are similar and dissimilar to other objects that belong to different clusters. K-means (KM) and K-harmonic-means (KHM) are two common and fundamental clustering methods because of their simplicity and efficiency. However, both of them suffer from some problems. This study presents two novel algorithms based on simplified swarm optimization to deal with the drawbacks of KM and KHM, respectively. In addition, with the advance of internet and information technologies, the data size is increasing explosively and many existing clustering approaches including KM and KHM are inefficiency for dealing with the large-size problem. For that, we propose a clustering framework by exploring the connection between principle component analysis and a novel random sampling technique into a procedure to increase the scalability of the proposed clustering algorithm. To empirically evaluate the performance of the proposed methods, all experiments are examined using real-world datasets, and corresponding results are compared with recent works in the literature.

並列關鍵字

data clustering ； K-means ； K-harmonic-means ； simplified swarm optimization ； principle component analysis ； random sampling technique

參考文獻

[1] A. A. Chaves and L. A. N. Lorena, "Clustering search algorithm for the capacitated centered clustering problem," Computers & Operations Research, vol. 37, pp. 552-558, 2010.

[2] R. Xu and D. Wunsch, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, pp. 645-678, 2005.

[3] S. Z. Selim and M. A. Ismail, "K-means-type algorithms: a generalized convergence theorem and characterization of local optimality," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 81-87, 1984.

[4] K. Krishna and M. N. Murty, "Genetic K-means algorithm," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 29, pp. 433-439, 1999.

[5] U. Maulik and S. Bandyopadhyay, "Genetic algorithm-based clustering technique," Pattern recognition, vol. 33, pp. 1455-1465, 2000.

國際替代計量

Applying Simplified Swarm Optimization for Solving Clustering Problem

全文下載

主題瀏覽