啟發式演算法於資料分群問題之比較

本研究針對常用的五個演算法於資料分群問題上的求解品質和績效進行比較研究。分群方法是根據資料屬性，將資料集適當的分群為數個子集合，其目標為同一群內的資料儘可能的相同，不同群組間的資料則儘可能的不同。如果資料複雜度提高或資料點變多，啟發式演算法的求解效率通常會變差，求解品質也會下降。本研究依資料分群問題的要求，並以資料點與各群中心的距離為評估績效值，進行啟發式演算法的求解速度與品質的比較。本研究選出常用的五個演算法：禁忌搜尋法、退火模擬法、基因演算法、螞蟻演算法及粒子演算法，開發成系統，針對真實世界的題庫資料集以及人工資料集進行實驗，進行分群績效優劣比較。實驗結果發現，除了禁忌搜尋法不適用於大型問題，退火模擬法、基因演算法的求解速度較快，螞蟻演算法及粒子演算法則能得到較穩定的求解品質。

關鍵字

粒子演算法；資料分群；禁忌搜尋法；退火模擬演算法；基因演算法；螞蟻演算法

並列摘要

In this study, five popular meta-heuristic approaches - Tabu Search(Tabu), Simulated Annealing(SA), Genetic Algorithms(GA), Ant Colony Optimization(ACO) and Particle Swarm Optimization(PSO) - on data clustering are analyzed based on the quality of objective function value and the performance of CPU runtime. Clustering is a procedure of grouping objects by the attributes of data, with larger inter-intra cluster distances and smaller intra-cluster distance expected. However, while the complexity of data arises or the size of data increases, the quality and the performance might be worse. By evaluating the distance between the data and centroid, the quality and performance of each approach are measured and compared. These five approaches have been implemented into a system and conducted experiments on real and artificial datasets. Results illustrate that Tabu is limited to large datasets, SA or GA is superior with respect to the performances, and ACO or PSO is more concerned with a better quality of results.

並列關鍵字

ACO ； PSO ； Data clustering ； Tabu Search ； Simulated Annealing ； Genetic Algorithm

參考文獻

[1] Welch, W.J., “Algorithmic complexity: Three NP-hard problems in computational statistics.” J.Statist. Comput. Simulation, Vol. 15, pp.17–25, 1982.

[2] Selim, S.Z., Ismail, M.A., “K-means type algorithms: a generalized convergence theorem and characterization of local optimality.” IEEE Trans. Pattern Anal. Mach. Intell, 6, pp.81-87, 1984.

Discovering Clusters in Large Spatial Databases with Noise.” Proc. of KDD96:

pp. 226-231, 1996.

clustering algorithm for cellular manufacturing.” International Journal of

被引用紀錄

陳奕如（2011）。地表水與地下水聯合營運優選模式之發展〔碩士論文，國立臺灣大學〕。華藝線上圖書館。https://doi.org/10.6342/NTU.2011.01543

李賜遠（2008）。結合K-means及粒子演算法進行動態資料分群〔碩士論文，大同大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0081-0607200917245316

謝凱明（2009）。結合FCM 及PSO 處理動態模糊分群問題〔碩士論文，大同大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0081-3001201315104406

吳仲涵（2010）。資料探勘應用於自行車之文化差異性行銷〔碩士論文，亞洲大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0118-1511201215465658

李新秋（2010）。具平衡之週期性車輛派遣問題的探討〔碩士論文，國立虎尾科技大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0028-1907201017343500

國際替代計量

啟發式演算法於資料分群問題之比較

未授權

主題瀏覽