透過您的圖書館登入
IP:18.226.169.94
  • 學位論文

啟發式演算法於資料分群問題之比較

A Comparative Study of Meta-Heuristic Approaches on Data Clustering

指導教授 : 高有成
若您是本文的作者,可授權文章由華藝線上圖書館中協助推廣。

摘要


本研究針對常用的五個演算法於資料分群問題上的求解品質和績效進行比較研究。 分群方法是根據資料屬性,將資料集適當的分群為數個子集合,其目標為同一群內的資料儘可能的相同,不同群組間的資料則儘可能的不同。 如果資料複雜度提高或資料點變多,啟發式演算法的求解效率通常會變差,求解品質也會下降。本研究依資料分群問題的要求,並以資料點與各群中心的距離為評估績效值,進行啟發式演算法的求解速度與品質的比較。本研究選出常用的五個演算法:禁忌搜尋法、退火模擬法、基因演算法、螞蟻演算法及粒子演算法,開發成系統,針對真實世界的題庫資料集以及人工資料集進行實驗,進行分群績效優劣比較。實驗結果發現,除了禁忌搜尋法不適用於大型問題,退火模擬法、基因演算法的求解速度較快,螞蟻演算法及粒子演算法則能得到較穩定的求解品質。

並列摘要


In this study, five popular meta-heuristic approaches - Tabu Search(Tabu), Simulated Annealing(SA), Genetic Algorithms(GA), Ant Colony Optimization(ACO) and Particle Swarm Optimization(PSO) - on data clustering are analyzed based on the quality of objective function value and the performance of CPU runtime. Clustering is a procedure of grouping objects by the attributes of data, with larger inter-intra cluster distances and smaller intra-cluster distance expected. However, while the complexity of data arises or the size of data increases, the quality and the performance might be worse. By evaluating the distance between the data and centroid, the quality and performance of each approach are measured and compared. These five approaches have been implemented into a system and conducted experiments on real and artificial datasets. Results illustrate that Tabu is limited to large datasets, SA or GA is superior with respect to the performances, and ACO or PSO is more concerned with a better quality of results.

參考文獻


[1] Welch, W.J., “Algorithmic complexity: Three NP-hard problems in computational statistics.” J.Statist. Comput. Simulation, Vol. 15, pp.17–25, 1982.
[2] Selim, S.Z., Ismail, M.A., “K-means type algorithms: a generalized convergence theorem and characterization of local optimality.” IEEE Trans. Pattern Anal. Mach. Intell, 6, pp.81-87, 1984.
Discovering Clusters in Large Spatial Databases with Noise.” Proc. of KDD96:
pp. 226-231, 1996.
clustering algorithm for cellular manufacturing.” International Journal of

被引用紀錄


陳奕如(2011)。地表水與地下水聯合營運優選模式之發展〔碩士論文,國立臺灣大學〕。華藝線上圖書館。https://doi.org/10.6342/NTU.2011.01543
李賜遠(2008)。結合K-means及粒子演算法進行動態資料分群〔碩士論文,大同大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0081-0607200917245316
謝凱明(2009)。結合FCM 及PSO 處理動態模糊分群問題〔碩士論文,大同大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0081-3001201315104406
吳仲涵(2010)。資料探勘應用於自行車之文化差異性行銷〔碩士論文,亞洲大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0118-1511201215465658
李新秋(2010)。具平衡之週期性車輛派遣問題的探討〔碩士論文,國立虎尾科技大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0028-1907201017343500

延伸閱讀