A Study on Applying Ant Colony Optimization to Support Vector Machines

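The support vector machine (SVM) is a novel technique for data classification that has been widely applied in many fields, such as bioinformatics, text classification, and data recognition. However, datasets often contain too many features, which makes classification harder, so feature selection is an important step in classification problems. A subset of features is first selected from the dataset's full feature set, and the classification task is then carried out using only the selected subset; the chosen subset also affects execution time and classification accuracy. The purpose of this study is therefore to investigate how to select the best feature subset and reduce the misclassification rate.

This study proposes using the Ant Colony Optimization (ACO) algorithm for feature selection. ACO is a heuristic algorithm based on the way real ants find the shortest path while foraging, leaving a chemical substance, pheromone, on the paths they travel. We set the rule that the more pheromone accumulates on a feature, the higher the chance that the feature is selected. The selected subset is then classified by an SVM and its misclassification rate is evaluated. We call this hybrid method ACO-SVM. We validate the proposed hybrid model on data from two credit-risk datasets. The results show that the proposed method effectively improves the misclassification rate.

This study also discusses how different training-sample ratios and the order in which the data are split affect classification. The results show that the training-sample ratio is positively associated with the misclassification rate, and that splitting the data afterwards performs better than splitting it first. In addition, adding suitable local search rules can effectively reduce the misclassification rate.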

Keywords

Classification, support vector machine, feature selection, ant colony optimization, data mining

Parallel Abstract

Recently, the support vector machine (SVM), a novel technique for pattern classification, has been widely applied in various fields, such as bioinformatics and text categorization. However, a large number of features in a dataset may increase the difficulty of classification. Furthermore, the chosen feature subset affects execution time and accuracy. Feature selection is therefore an important step in pattern classification problems: a subset of features is selected first, and the classification procedure follows. The purpose of this thesis is to investigate how to select the best feature subset so as to reduce the classification error. In this study, we propose a feature selection algorithm based on Ant Colony Optimization (ACO). ACO is a metaheuristic algorithm that simulates the behavior of ants searching for the shortest paths to food sources. The ants leave a chemical called pheromone on their trails. The more pheromone accumulates on a feature, the higher the probability that the feature is selected. After feature selection, the selected feature subset is classified by an SVM and its error is evaluated. This hybrid method is named ACO-SVM. We apply two real-world datasets from the credit-risk domain to verify the proposed hybrid model. The results show that the proposed method effectively reduces the error. To study the factors that influence classification, we also discuss several training-sample ratios and different orders of separating the data. The results show that accuracy is positively related to the training-sample ratio, and that the accuracy on post-separated data is better than on pre-separated data. Moreover, adding suitable local search rules can further reduce the classification error.
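The pheromone-guided selection loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual procedure: the function names `aco_feature_selection` and `toy_error`, the per-feature inclusion rule `tau / (1 + tau)`, the evaporation rate `rho`, the pheromone bounds, and the iteration-best deposit are all hypothetical choices. The SVM is replaced by a pluggable `evaluate_error` callback so the sketch runs with the standard library alone.

```python
import random


def aco_feature_selection(n_features, evaluate_error, n_ants=10, n_iters=30,
                          rho=0.1, tau_min=0.05, tau_max=20.0, seed=0):
    """Search for a low-error feature subset using per-feature pheromone.

    evaluate_error(subset) must return an error rate in [0, 1] for a
    non-empty list of feature indices (for example, the misclassification
    rate of an SVM trained on those features). All parameters are
    illustrative assumptions, not values taken from the thesis.
    """
    rng = random.Random(seed)
    tau = [1.0] * n_features              # pheromone on each feature
    best_subset, best_err = None, float("inf")

    for _ in range(n_iters):
        iter_subset, iter_err = None, float("inf")
        for _ in range(n_ants):
            # More pheromone -> higher chance that the feature is selected.
            subset = [i for i in range(n_features)
                      if rng.random() < tau[i] / (1.0 + tau[i])]
            if not subset:                # never evaluate an empty subset
                subset = [rng.randrange(n_features)]
            err = evaluate_error(subset)
            if err < iter_err:
                iter_subset, iter_err = subset, err
        if iter_err < best_err:
            best_subset, best_err = iter_subset, iter_err

        # Evaporate everywhere, then reinforce the iteration-best subset.
        chosen = set(iter_subset)
        for i in range(n_features):
            tau[i] *= (1.0 - rho)
            if i in chosen:
                tau[i] += 1.0 - iter_err  # lower error deposits more
            tau[i] = min(tau_max, max(tau_min, tau[i]))

    return best_subset, best_err


def toy_error(subset):
    """Stand-in for SVM evaluation: features 0 and 3 are the useful ones."""
    useful = {0, 3}
    hits = len(useful & set(subset))
    noise = len(set(subset) - useful)
    return 0.5 - 0.2 * hits + 0.01 * noise


if __name__ == "__main__":
    subset, err = aco_feature_selection(8, toy_error)
    print(sorted(subset), round(err, 3))
```

In a real experiment, `evaluate_error` would train and test an SVM restricted to the columns in `subset`, for instance by returning one minus the mean cross-validated accuracy of scikit-learn's `SVC`. That tooling choice is an assumption; the abstract does not say how the error was estimated.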

Parallel Keywords

Classification; Support vector machine; Feature selection; Ant colony optimization; Data mining

