  • Degree thesis

Developing a Diversity-Maximizing Classifier Ensemble Using a Genetic Algorithm

Using Genetic Algorithm to Optimize the Diversity of Classifier Ensemble

Advisors: 蘇朝墩, 薛友仁

Abstract


Classification is one of the core tasks of data mining. The classifiers most often proposed in the literature, such as decision trees and neural networks, are all individual classifiers. In recent years, many researchers have shown that a classifier ensemble, formed by combining several individual classifiers, can classify more accurately than any single classifier. Because an ensemble reaches its final decision by combining the outputs of its member classifiers, the diversity among those classifiers is a key factor in classification performance. Since diversity is believed to affect classification accuracy, and few previous studies have attempted to maximize it, this thesis proposes an ensemble algorithm, DECRTS, that uses a genetic algorithm to maximize the diversity among classifiers by manipulating the training samples from which each classifier is built. DECRTS is compared experimentally with representative ensemble methods from the literature on 21 UCI benchmark data sets. The results show that, among the six algorithms examined, DECRTS achieves the best average classification accuracy (82.19%), and the improvement is statistically significant, indicating that DECRTS improves classification accuracy on most of the data sets studied. The experiments also show that different ways of generating diversity can each perform best on particular data sets.

Parallel Abstract


Classification is one of the main tasks of data mining. Many classic base inducers, such as decision trees and neural networks, have been used in the literature to train classifiers; these are all individual classifiers. In the past few years, many studies have shown that a classifier ensemble, composed of more than one individual classifier, can be more effective than any of its individual members. To classify a new sample, an ensemble combines the outputs of its individual classifiers to reach a final decision, so the diversity between the classifiers is considered an important factor in classification accuracy. Because little prior work has addressed how to maximize this diversity, this thesis proposes an ensemble method, DECRTS (Diversity by Evolutionary Computing Resampling Training Subset), which uses a genetic algorithm to encourage diversity between classifiers by manipulating the training data set. An experiment on 21 data sets from the UCI Machine Learning Repository compares DECRTS with individual classifiers and other classifier ensembles. The results show that DECRTS achieves the best average accuracy (82.19%) among the methods tested and differs significantly from every other method except AdaBoost (81.99%). Moreover, the experiment shows that different ways of creating diversity can yield better performance on particular data sets.
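The DECRTS idea described above — a genetic algorithm that manipulates training subsets so that the base classifiers trained on them disagree as much as possible, with the ensemble deciding by majority vote — can be sketched as follows. The toy data set, the nearest-centroid base learner, and all GA settings here are illustrative assumptions, not the thesis's actual implementation:

```python
# Minimal sketch of the DECRTS idea: each GA individual is a set of K boolean
# masks, one training subset per base classifier; fitness is the average
# pairwise disagreement (diversity) among the classifiers trained on those
# subsets. Everything below (data, base learner, GA settings) is illustrative.
import random

random.seed(0)

# Toy two-class data set (assumption; not one of the thesis's 21 UCI sets).
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(40)] + \
    [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(40)]
y = [0] * 40 + [1] * 40

def train_centroid(mask):
    """Train a nearest-centroid base classifier on the masked training subset."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for keep, xi, yi in zip(mask, X, y):
        if keep:
            sums[yi][0] += xi[0]
            sums[yi][1] += xi[1]
            counts[yi] += 1
    cents = {c: (sums[c][0] / max(counts[c], 1), sums[c][1] / max(counts[c], 1))
             for c in (0, 1)}
    def predict(x):
        d0 = (x[0] - cents[0][0]) ** 2 + (x[1] - cents[0][1]) ** 2
        d1 = (x[0] - cents[1][0]) ** 2 + (x[1] - cents[1][1]) ** 2
        return 0 if d0 <= d1 else 1
    return predict

def diversity(classifiers):
    """Average pairwise disagreement rate over the training set."""
    preds = [[clf(x) for x in X] for clf in classifiers]
    total = pairs = 0
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            total += sum(a != b for a, b in zip(preds[i], preds[j])) / len(X)
            pairs += 1
    return total / pairs

K = 5    # classifiers per ensemble (assumed)
POP = 20 # GA population size (assumed)

def fitness(ind):
    return diversity([train_centroid(m) for m in ind])

def mutate(ind, rate=0.02):
    # bit-flip mutation over each subset-selection mask
    return [[(not b) if random.random() < rate else b for b in m] for m in ind]

pop = [[[random.random() < 0.6 for _ in X] for _ in range(K)] for _ in range(POP)]
for gen in range(30):
    # truncation selection: keep the most diverse half, refill by mutation
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:POP // 2]
    pop = survivors + [mutate(random.choice(survivors))
                       for _ in range(POP - len(survivors))]

best = max(pop, key=fitness)
clfs = [train_centroid(m) for m in best]

def ensemble_predict(x):
    """Majority vote over the evolved ensemble."""
    return 1 if sum(clf(x) for clf in clfs) * 2 > len(clfs) else 0

acc = sum(ensemble_predict(xi) == yi for xi, yi in zip(X, y)) / len(X)
print(f"diversity={fitness(best):.3f}  training accuracy={acc:.2f}")
```

Note that this sketch maximizes diversity alone; in practice a diversity-driven ensemble still needs reasonably accurate members, so a real implementation would likely balance diversity against member accuracy in the fitness function.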

