結合K-means及粒子演算法進行動態資料分群

傳統K-means方法需要預先設定組數才能進行資料分群，為解決此問題，本研究以粒子最佳化演算法(Particle Swarm Optimization)為基礎，發展出一套動態資料分群方法，用以處理無法預先知道組數之分群問題。本研究提出之方法，結合了K-means及組合式粒子最佳化演算法，我們稱之為K-means with Combinatorial Particle Swarm Optimization (KCPSO)，在演算法開始前，先給予一個最大分群組數之參數，使用組合式粒子最佳化演算法在此最大分群組數下，透過分群效度指標(Cluster Validity Index)的衡量，調整各粒子的分群組數，接著利用粒子最佳化演算法之記憶與分享資訊的能力來選取群中心，並利用K-means調整群中心的位置，如此能改善初始群中心對K-means的影響，並找出適當的分群組數與分群結果。本演算法已被開發成系統，並透過數種資料分群題目進行驗證，實驗結果顯示，相較於其他類似演算法，KCPSO能更快速且有效的分群。

關鍵字

動態分群；粒子最佳化演算法； K-means ；資料分群

並列摘要

This paper presents a dynamic data clustering algorithm named K-means with Combinatorial Particle Swarm Optimization (KCPSO). Unlike the K-means method, KCPSO does not need a specific number of clusters before clustering is performed and is able to find the proper number of clusters automatically. A predefined parameter of maximum cluster number is given, and a cluster validity index is employed to evaluate the clustering results in order to adjust the cluster number of each particle. Then, the cluster center among particles is adjusted by using K-means. KCPSO is able not only to avoid the drawback of K-means but also to determine the proper number of cluster. KCPSO has been developed into a system and evaluated by testing some datasets. Results show that KCPSO is an effective algorithm in providing the optimal number of clusters.

並列關鍵字

K-means ； Particle Swarm Optimization ； Dynamic clustering ； Data clustering

參考文獻

[1] Welch, W.J., “Algorithmic complexity: Three NP-hard problems in computational statistics.” J.Statist. Comput. Simulation, vol. 15, pp.17–25, 1982.

[3] Halkidi, M. Batistakis, Y. and Vazirgiannis, M. “Clustering validitychecking methods: Part II.”, ACM SIGMOD Record, 31(3):19–27,2002.

[4] MacQueen, J. B. "Some Methods for classification and Analysis of Multivariate Observations.", Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297, 1967.

[5] Peña, J. M., Lozano, J. A., and Larrañaga, P.,” An empirical comparison of four initialization methods for the K-Means algorithm.” Pattern Recognition Letters 20 1027-1040,1999.

[6] Goldberg, D.E., “Genetic Algorithm in Search.” Optimization and Machine Learning.”, Addision-Wesley, New York, 1989.

被引用紀錄

潘宜蓁（2009）。結合K-means及差分演化法之入侵偵測研究〔碩士論文，大同大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0081-3001201315104223

謝凱明（2009）。結合FCM 及PSO 處理動態模糊分群問題〔碩士論文，大同大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0081-3001201315104406

國際替代計量

結合K-means及粒子演算法進行動態資料分群

未授權

主題瀏覽