Developing a Clustering Algorithm Based on Ant Colony Theory

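Cluster analysis is the technique most commonly used in data mining to forecast and estimate from large volumes of data. Its main goal is to use a clustering algorithm to separate data whose categories are distinct and not known in advance, grouping highly similar records into clusters, so that decision makers can draw on the results as reference information for decision analysis. The development and use of clustering algorithms is therefore an important topic that merits research. In practice, K-Means is the most widely used clustering tool because it is convenient, fast, and readily available; in real clustering problems, however, it also has shortcomings. This thesis combines concepts from traditional clustering algorithms with the heuristic technique of ant theory to develop a clustering algorithm that can obtain clustering results closer to the global optimum, escaping the tendency of K-Means to become trapped in locally optimal clustering results. To verify that the proposed method is an effective clustering algorithm, we run experiments on several sample data sets and examine the quality of the resulting clusters. These experiments show that the proposed algorithm does mitigate the shortcomings of K-Means and achieves better clustering objective values and clustering accuracy. We also apply the method to product design specification data and quality defect data from an actual PCB manufacturer to predict quality defects in new PCB products. The aim is for the proposed clustering algorithm and cluster analysis to identify, effectively and correctly, the quality defects a new product is likely to exhibit in actual production, so that they can be prevented early, lowering production costs and raising the production yield of new products.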

Keywords

Data mining; cluster analysis; clustering algorithm; ant algorithm; K-Means algorithm

Parallel Abstract

Cluster analysis is a data mining technique used to forecast and draw inferences from large volumes of data. Its major objective is to separate data whose categories are unknown. Decision makers can obtain reference information from the results of cluster analysis, so developing an efficient clustering algorithm is important for many applications. The K-Means algorithm is commonly used for clustering because it clusters data quickly. However, it has drawbacks when applied to real-world clustering problems. This research combines concepts from traditional clustering algorithms with the technique of ant colony optimization to develop a clustering algorithm that can obtain solutions closer to the global optimum. The approach addresses the tendency of K-Means to become trapped in locally optimal solutions. To demonstrate the benefits of our method, we run experiments on several sample data sets. These experiments show that the proposed clustering algorithm mitigates the drawbacks of K-Means and obtains better clustering objective values and accuracy rates. Furthermore, we use product specification data and production defect data from an actual PCB manufacturer to forecast the defects of a new product. Anticipating these defects makes it possible to prevent them early, reducing production cost and raising the production yield of the new product.

Parallel Keywords

Data Mining; Cluster Analysis; Clustering Algorithm; Ant Colony Optimization; K-Means Algorithm
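
The local-optimum problem that the abstract attributes to K-Means can be illustrated with a small sketch. The code below is not the ant-based algorithm proposed in the thesis; it is a plain Lloyd-style K-Means written only for illustration, and the four-point data set, the two sets of starting centers, and the helper names `sq_dist` and `kmeans` are all assumptions introduced here. It shows that the same data can converge to two different objective values (sum of squared errors) depending on the starting centers.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        for p in points
    ]


def kmeans(points: List[Point], init_centers: List[Point], max_iter: int = 100):
    """Plain Lloyd iteration (illustrative only, not the thesis's method)."""
    centers = [(float(x), float(y)) for x, y in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                # Keep an empty cluster's center where it is.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


# Hypothetical toy data: corners of a wide, short rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A splits the rectangle left/right (the better clustering).
_, labels_good, sse_good = kmeans(points, [(0.0, 0.0), (4.0, 1.0)])

# Start B splits it top/bottom; Lloyd's iteration is already at a
# fixed point there and never leaves it.
_, labels_bad, sse_bad = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(sse_good, labels_good)  # 1.0 [0, 0, 1, 1]
print(sse_bad, labels_bad)    # 16.0 [0, 1, 0, 1]
```

Both runs satisfy the K-Means stopping condition, yet the second one is stuck at an objective value sixteen times worse. This is the kind of start-dependent outcome that motivates pairing clustering with a global search heuristic such as ant colony optimization.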
