以塔布基因為基礎的兩階段螞蟻分群方法

分群就是將一組資料依照相似程度將其分成數個群集（cluster），相同群集內的資料會有較高的相似性，而不同群集內的資料則會有較大的異質性。傳統上常用的方法有k平均值法（k-means）、凝聚法（agglomerative）、分裂法（divisive）等統計應用的方法，近年來則還應用了基因演算法（genetic algorithms, GA）、螞蟻族群系統（ant colony system, ACS）等啟發式的分群方法，其中以k平均值法最被廣泛應用。不過分群最困難的地方就是在於分群數上的選擇，過去常會訂立一個分群效度指標（cluster validity index），再一個一個去計算出擁有最佳分群效度的分群數，但這卻是個非常耗費時間的方式。為了解決使用者必須預先決定分群數的困擾，本研究提出一個兩階段的分群方法，簡稱TAC，第一階段為可自動決定群數的初步螞蟻分群法，此階段主要目的在於建構一個高密集度聚合的分群結構；第二階段則為針對第一階段分群的結果，搭配基因遮罩及分群效度指標的螞蟻分群法，此階段主要目的在於調整第一階段分群的結果，找出最佳的分群數及分群結構。實驗方面，利用UCI Machine Learning Repository及Gerrild and Lantz所提供的資料集，並與六個分群方法進行比較。實驗過程中，wine資料集的分群數與實際群數不符，經過試驗發現PBM指標某項因子的影響力過大，因此我們將此指標稍加修改，結果顯示本研究修改後的指標不但能找出正確的分群數，所提出的方法在分群的建構上也有比較好的結果。

關鍵字

資料分群；螞蟻演算法；基因演算法；塔布搜尋法；分群效度指標

並列摘要

Clustering is a process that organizes a set of datum into a number of clusters according to their similarity, and datum within a cluster are more similar to each other than they are belong to different clusters. There are many common algorithm, like K-means, Agglomerative, and Divisive, which are already perform various practical cases. Recently, numerous heuristic algorithms such as Genetic Algorithm (GA) and Ant Colony System (ACS) have been proposed to the clustering problems. K-means is the most popular technique among above. However, the most difficult problem in clustering is deciding a number of clusters. In the past, the general solution to the problem is that one performs a clustering technique with a cluster validity index which is selected in advance for all possible numbers of clusters in turn to find out a best number of clusters. In order to solve the troublesome problem which user have to determine a number of clusters beforehand, we propose a two-phased clustering algorithm, named TAC. The first phrase is an ant clustering algorithm which is able to automatically decide a number of clusters, and the purpose of this phrase is to construct a high similarity clustering structure. The second phrase is an ant clustering algorithm which is mixed genetic mask and cluster validity index, and the purpose of this phrase is to continuously adjust the clustering structure of the first phrase according to the result of the first one, so that we can find out the best number of clusters and the clustering structure finally. In our experiment, we use four data sets to perform the clustering problem and compare with six clustering techniques. In the process of the experiment, the number of clusters in wine data set do not conform the actual number of clusters in it. In turns of this, we modify one factor of PBM index slightly, and the result reveals that not only can we obtain the correct number of clusters but also can construct a better clustering structure with the modified index in this research.

並列關鍵字

Clustering ； Ant Algorithms ； Genetic Algorithm ； Tabu Search ； Cluster Validity Index

參考文獻

[4] MacQueen, J.B. “Some Methods for Classification and Analysis of Multivariate Observations”, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1: 281-297, 1967.

[5] Leonard. Kaufman and Peter J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, New York: John Wiley & Sons, 1990.

[8] Dorigo M., Gambardella L.M. “Ant colony system: a cooperative learning approach to the traveling salesman problem,” IEEE Transactions on Evolutionary Computation, Vol. 1, pp. 53-66, 1997.

[9] Holland, J.H., “Adaptation in Natural and Artificial Systems”, Ann Arbor, MI：University of Michigan Press, 1975.

[11] Dunn, J.C., “Well separated clusters and optimal fuzzy partitions”, J. Cybern., Vol. 4, pp. 95-104, 1974.

被引用紀錄

朱芳儀（2009）。以螞蟻、塔布基因為基礎的混合式雞尾酒分群法之探討〔碩士論文，淡江大學〕。華藝線上圖書館。https://doi.org/10.6846/TKU.2009.00804

國際替代計量

以塔布基因為基礎的兩階段螞蟻分群方法

全文下載

主題瀏覽