以螞蟻、塔布基因為基礎的混合式雞尾酒分群法之探討

分群是將相似度高的物件分類成群，而分群有許多方法，從傳統的階層式分群法、分割式分群法、密度分群法到應用啟發式演算法在分群上。對於傳統的分群法，例如常見的K-means，使用者需先決定群數，才能進行分群。本研究發展一個能自動決定適合群數的演算法，簡稱AGKT，混合了螞蟻分群、基因、K-means及塔布搜尋演算法之概念，並探討以不同分群效度指標做作目標函數之正確率差異。演算法分為兩階段：第一階段由螞蟻分群法(ASCA)產生初始群組；第二階段搭配基因遮罩及分群效度指標的K-means分群，找出最佳群數與分群結構。在效能評估方面，我們使用UCI Machine Learning Repository[3]和Gerrild and Lantz[11]所提供的4個資料集，與其他七個分群方法進行比較與評估[2,15]。此外亦利用該資料集，來探討目前提出之分群效度指標，並提出一種新的分群效度指標PBM+ index。實驗結果顯示，相較於其他七個分群方法，AGKT皆能非常快速且正確分群。相較於ESTA[15]分群法， AGKT平均約快40倍且分群效度表現上差不多。探討分群效度指標方面，我們研究效度指標，分別為：Dunn’s index、Davies Boundin index、PBM index及我們提出的PBM+ index。而實驗證實，4種分群效度指標中，以PBM+ index進行分群得到了較好的分群數與分群結果。

關鍵字

螞蟻分群法；基因演算法；塔布搜尋法； k均值法；分群效度指標

並列摘要

Clustering is to collect those objects are high similarity into a group, there are many methods to clustering. The common methods of clustering include the traditional hierarchical clustering, partitional clustering, density-based clustering and and application of heuristic clustering algorithms on the cluster. K-means is a kind of traditional clustering methods, users need to decide the cluster number before clustering. This research developed an algorithm that can automatically decides the cluster number, called AGKT. It mixed of ant clustering, Gene, K-means and the concept of tabu search algorithm, and explore Cluster validity indexes as to objective functions to compare the performance of the correct rate. There are two stages of algorithm: the first stage generated the initial group by ASCA; The second stage find out the best structure and number of cluster with the mask of Gene and k-means, with Cluster validity index. In the performance assessment, we compare and estimate with seven other clustering methods [2,15], and use four data sets provided from the UCI Machine Learning Repository[3] and Gerrild and Lantz[11]. In addition, we also make use of these data sets to explore the current cluster validity index, and propose a new index called PBM+ index. The results showed that, AGKT can faster and accurate clustering than the seven other clustering methods. Compared to the ESTA [15], AGKT about 40 times faster in the performance of Cluster validity index. Discussion on Cluster validity index, we use four Cluster validity index, including: Dunn's index, Davies Boundin index, PBM index and PBM+ index provided from us. The research confirmed that the four Cluster validity index to PBM+ index has better result.

並列關鍵字

Ant Colony system ； Genetic Algorithm ； Tabu Search ； K-means ； Cluster validity index

參考文獻

2.曾偉哲，以塔布基因為基礎的兩階段碼以分群方法，淡江大學資訊管理學系碩士班碩士論文，2009。

4.D. L. Davies and D. W. Bouldin,” A cluster separation measure,” IEEE Trans. Pattern Anal. Mach. Intell 1(2), 1979,pp. 224-227.

5.M. Dorigo and L. Gambardella, “Ant colony system: A cooperative learning approach to the traveling salesman problem,” IEEE Transactions on Evolutionary Computation 1(1), 1997, pp. 53-66.

7.J. Dunn, “Well-separated clusters and optimal fuzzy partitions,” Cybern. Syst. 4(1), 1974, pp. 95-104.

10.J. H. Holland, Adaptation in Natural and Artificial Systems,1992.

國際替代計量

以螞蟻、塔布基因為基礎的混合式雞尾酒分群法之探討

全文下載

主題瀏覽