本研究針對常用的五個演算法於資料分群問題上的求解品質和績效進行比較研究。 分群方法是根據資料屬性,將資料集適當的分群為數個子集合,其目標為同一群內的資料儘可能的相同,不同群組間的資料則儘可能的不同。 如果資料複雜度提高或資料點變多,啟發式演算法的求解效率通常會變差,求解品質也會下降。本研究依資料分群問題的要求,並以資料點與各群中心的距離為評估績效值,進行啟發式演算法的求解速度與品質的比較。本研究選出常用的五個演算法:禁忌搜尋法、退火模擬法、基因演算法、螞蟻演算法及粒子演算法,開發成系統,針對真實世界的題庫資料集以及人工資料集進行實驗,進行分群績效優劣比較。實驗結果發現,除了禁忌搜尋法不適用於大型問題,退火模擬法、基因演算法的求解速度較快,螞蟻演算法及粒子演算法則能得到較穩定的求解品質。
In this study, five popular meta-heuristic approaches - Tabu Search(Tabu), Simulated Annealing(SA), Genetic Algorithms(GA), Ant Colony Optimization(ACO) and Particle Swarm Optimization(PSO) - on data clustering are analyzed based on the quality of objective function value and the performance of CPU runtime. Clustering is a procedure of grouping objects by the attributes of data, with larger inter-intra cluster distances and smaller intra-cluster distance expected. However, while the complexity of data arises or the size of data increases, the quality and the performance might be worse. By evaluating the distance between the data and centroid, the quality and performance of each approach are measured and compared. These five approaches have been implemented into a system and conducted experiments on real and artificial datasets. Results illustrate that Tabu is limited to large datasets, SA or GA is superior with respect to the performances, and ACO or PSO is more concerned with a better quality of results.