  • Degree thesis

粒子動態分群演算法

Particle Swarm Optimization for Dynamic Clustering

Advisor: 高有成

Abstract


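Data clustering is one of the important research topics in data mining. It systematically gathers data with similar features into natural groups, or groups them automatically. The clustering process must consider both the number of clusters and the clustering configuration, and the number of clusters naturally affects the final configuration. Most clustering algorithms require a fixed number of clusters to be given in advance; that is, the optimal number of clusters must be supplied to the algorithm before clustering, which is a major challenge and requires a certain understanding of the data. Dynamic clustering, by contrast, means using an algorithm to find both the optimal number of clusters and the clustering configuration when the number of clusters is not fixed beforehand. This research applies the particle swarm optimization (PSO) algorithm, performing a two-stage dynamic clustering evolution within each iteration and combining it with a cluster validity index (Cluster Validity Index) to cluster data dynamically. The resulting new algorithm is called PSODC (Particle Swarm Optimization for Dynamic Clustering). The first stage evolves the optimal number of clusters, using a probability parameter to move the particles of each sub-swarm gradually toward the sub-swarm with the best cluster number. The second stage applies the PSO algorithm to evolve the clustering of particles within each sub-swarm, so that, with the cluster number fixed, the particles move effectively toward the position of the sub-swarm's best particle. Finally, the thesis evaluates PSODC with three kinds of validation: comparing clustering performance with other dynamic clustering algorithms, comparing solution quality with fixed-cluster-number PSO, and comparing clustering performance under different cluster validity indices. The experimental results show that, with the number of clusters unknown, PSODC does effectively find the optimal number of clusters for each data set, obtains better and more stable clustering results, and is applicable to different cluster validity indices.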
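To make the idea of a cluster validity index concrete, here is a minimal sketch of one such index. The abstract does not say which index PSODC uses, so the Davies-Bouldin index is assumed here purely as an illustration; the function name `davies_bouldin` and the sample data are hypothetical. Lower values indicate compact, well-separated clusters.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index (assumed example of a validity index); lower is better."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two non-empty clusters")

    centroids, scatter = {}, {}
    for label, members in clusters.items():
        dim = len(members[0])
        center = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        centroids[label] = center
        # Average distance of the members to their own centroid.
        scatter[label] = sum(math.dist(m, center) for m in members) / len(members)

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        # Note: coinciding centroids would divide by zero; not handled in this sketch.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = davies_bouldin(points, [0, 0, 1, 1])  # splits the two distant pairs
bad = davies_bouldin(points, [0, 1, 0, 1])   # mixes the two pairs
```

A dynamic clustering method can compare such scores across candidate cluster numbers, which is the role the validity index plays in the description above.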

Parallel Abstract


Data clustering, one of the major research topics in data mining, is the process of grouping similar multi-dimensional data vectors into a number of clusters. The clustering process must consider both the number of clusters and the resulting cluster configuration, and the number of clusters naturally influences the final clustering result. Finding the optimal number of clusters is therefore an important issue. In this research, the author develops a novel dynamic clustering method, called Particle Swarm Optimization for Dynamic Clustering (PSODC), to cluster a data set without setting the number of clusters in advance. PSODC consists of two stages: evolution of the optimal cluster number, and data clustering within each sub-swarm, where each sub-swarm represents a specified cluster number. In the first stage, the particles in a sub-swarm randomly move toward one of the other sub-swarms based on the so-far-best cluster number. In the second stage, the particle swarm algorithm is used to cluster the data items. After that, a clustering validity index is applied to evaluate the clustering result of each sub-swarm. This procedure is repeated until the clustering computation converges. To test the proposed algorithm and compare it with other dynamic clustering algorithms, thirteen test problems, including artificial data sets and UCI data sets, are used. The experimental results show that PSODC has outstanding performance in dynamic data clustering.
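The two-stage loop described above can be sketched roughly as follows. This is not the author's implementation: the abstract gives no formulas, so the migration rule (one step toward the best cluster number, followed by re-seeding), the Davies-Bouldin-style score, and every name and parameter (`psodc_sketch`, `validity`, `new_particle`, `p_move`, `w`, `c1`, `c2`, `per_swarm`) are assumptions chosen for illustration.

```python
import math
import random


def validity(points, centroids):
    """Davies-Bouldin-style score (assumed index); lower is better, inf if degenerate."""
    k = len(centroids)
    groups = [[] for _ in range(k)]
    for p in points:  # assign each point to its nearest centroid
        nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
        groups[nearest].append(p)
    if any(not g for g in groups):
        return math.inf
    dim = len(points[0])
    means = [[sum(p[d] for p in g) / len(g) for d in range(dim)] for g in groups]
    scat = [sum(math.dist(p, m) for p in g) / len(g) for g, m in zip(groups, means)]
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j != i:
                gap = math.dist(means[i], means[j])
                worst = max(worst, math.inf if gap == 0 else (scat[i] + scat[j]) / gap)
        total += worst
    return total / k


def new_particle(points, k, rng):
    """A particle holds k candidate centroids, seeded from k distinct data points."""
    pos = [list(p) for p in rng.sample(points, k)]
    return {"k": k, "pos": pos, "vel": [[0.0] * len(c) for c in pos],
            "best_pos": [c[:] for c in pos], "best_fit": math.inf}


def psodc_sketch(points, k_min=2, k_max=5, per_swarm=5, iters=30,
                 p_move=0.2, w=0.7, c1=1.5, c2=1.5, seed=0):
    if k_min < 2 or k_max > len(points):
        raise ValueError("need 2 <= k_min and k_max <= number of points")
    rng = random.Random(seed)
    # One sub-swarm per candidate cluster number k.
    swarm = [new_particle(points, k, rng)
             for k in range(k_min, k_max + 1) for _ in range(per_swarm)]
    best = {"fit": math.inf, "k": None, "pos": None}

    def evaluate():
        # Score every particle and update personal and global bests.
        for p in swarm:
            fit = validity(points, p["pos"])
            if fit < p["best_fit"]:
                p["best_fit"], p["best_pos"] = fit, [c[:] for c in p["pos"]]
            if fit < best["fit"]:
                best.update(fit=fit, k=p["k"], pos=[c[:] for c in p["pos"]])

    evaluate()
    for _ in range(iters):
        # Stage 1 (assumed rule): with probability p_move, a particle steps one
        # cluster count toward the best k found so far and is re-seeded.
        for i, p in enumerate(swarm):
            if best["k"] is not None and p["k"] != best["k"] and rng.random() < p_move:
                step = 1 if best["k"] > p["k"] else -1
                swarm[i] = new_particle(points, p["k"] + step, rng)
        # Stage 2: standard PSO velocity and position update inside each sub-swarm.
        for k in range(k_min, k_max + 1):
            group = [p for p in swarm if p["k"] == k]
            if not group:
                continue
            leader = min(group, key=lambda p: p["best_fit"])
            for p in group:
                for c in range(k):
                    for d in range(len(p["pos"][c])):
                        r1, r2 = rng.random(), rng.random()
                        p["vel"][c][d] = (w * p["vel"][c][d]
                            + c1 * r1 * (p["best_pos"][c][d] - p["pos"][c][d])
                            + c2 * r2 * (leader["best_pos"][c][d] - p["pos"][c][d]))
                        p["pos"][c][d] += p["vel"][c][d]
        evaluate()
    return best
```

Calling `psodc_sketch(points)` on a list of coordinate tuples returns a dictionary with the best score found (`fit`), its cluster number (`k`), and the corresponding centroids (`pos`). A fixed iteration count stands in for the convergence test, which the abstract does not specify.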

