A Study on Applying Ensemble Learning Algorithms to Improve Classification Accuracy

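In recent years, data mining techniques have been applied across many fields, and mining tools and methods are continually renewed. The most common approaches use a single-classifier model or a model that combines two classifiers, while ensemble algorithms that combine multiple classifiers have recently been widely discussed. This study therefore uses Bagging, an ensemble algorithm, as its classification method: several base classifiers each classify the data, and their predictions are combined by equal-weight voting into a better model. The aim is to compare the strengths and weaknesses of a single classifier against a combination of multiple base classifiers. Four methods serve as single classifiers: the C4.5/J48 decision tree, naive Bayes, the support vector machine, and the back-propagation neural network. Bagging is then applied with each of the four as its base classifier, and with the four methods combined, giving five categories of models in total. Four data sets from UCI are used for testing and evaluation, and the experiments are run in WEKA, an open-source data mining tool. The results show that using an ensemble algorithm outperforms not using one, and that using several different base classifiers yields better accuracy.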
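The Bagging procedure described above (bootstrap resampling, one base classifier per resample, equal-weight voting) can be sketched in a few lines. This is only an illustrative sketch in plain Python, not the WEKA setup used in the study: the two base learners `Stump` and `NearestCentroid`, the helper names, and the toy data are all assumptions introduced here as stand-ins for J48, naive Bayes, the support vector machine, and the back-propagation network.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) training examples with replacement."""
    return [rng.choice(rows) for _ in rows]


class Stump:
    """Hypothetical stand-in base learner: a one-feature threshold rule."""

    def fit(self, rows):
        labels = sorted({y for _, y in rows})
        best = None
        for f in range(len(rows[0][0])):
            for x, _ in rows:
                t = x[f]
                for lo in labels:
                    for hi in labels:
                        errors = sum(
                            (hi if xi[f] > t else lo) != yi for xi, yi in rows
                        )
                        if best is None or errors < best[0]:
                            best = (errors, f, t, lo, hi)
        _, self.f, self.t, self.lo, self.hi = best
        return self

    def predict(self, x):
        return self.hi if x[self.f] > self.t else self.lo


class NearestCentroid:
    """Hypothetical stand-in base learner: predict the closest class mean."""

    def fit(self, rows):
        sums, counts = {}, Counter()
        for x, y in rows:
            counts[y] += 1
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {
            y: [s / counts[y] for s in acc] for y, acc in sums.items()
        }
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return min(self.centroids, key=lambda y: sq_dist(self.centroids[y]))


def bagging_fit(rows, base_factories, n_rounds=5, seed=0):
    """Train one model per (round, base learner) on its own bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_rounds):
        for make in base_factories:
            models.append(make().fit(bootstrap_sample(rows, rng)))
    return models


def bagging_predict(models, x):
    """Equal-weight vote: every model contributes exactly one vote."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]


# Toy two-class data (made up for illustration only).
train = [
    ((1.0, 1.2), "a"), ((0.8, 0.9), "a"), ((1.1, 0.7), "a"), ((1.3, 1.0), "a"),
    ((3.0, 3.1), "b"), ((2.9, 3.4), "b"), ((3.3, 2.8), "b"), ((3.1, 3.0), "b"),
]

# Passing several different base learners mirrors the "combined" setting.
models = bagging_fit(train, [Stump, NearestCentroid], n_rounds=5)
print(bagging_predict(models, (0.5, 0.5)))
print(bagging_predict(models, (3.5, 3.5)))
```

Passing a single factory, for example `[Stump]`, corresponds to Bagging over one kind of base classifier, while passing several corresponds to the mixed-base setting that the abstract reports as more accurate. The sketch makes no claim about reproducing those results.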

Keywords

Data Mining; Ensemble Algorithm; Bagging; Support Vector Machine; Back-Propagation Neural Network

Parallel Abstract

Data mining techniques have been applied in many major fields in recent years, and data mining tools and methods continue to evolve. The most common approaches rely on a single classifier or on a combination of two classifiers, while ensemble algorithms that combine multiple classifiers have been explored extensively of late. This research applies Bagging, an ensemble algorithm, for classification. The method trains several base classifiers and combines their outputs by equal-weight voting to form a better model. The purpose is to compare the strengths and weaknesses of a single classifier against multiple base classifiers. Four methods are adopted as single classifiers: the C4.5/J48 decision tree, Naive Bayes, Support Vector Machine, and Back-Propagation Neural Network. Bagging is then used with each of the four as its base classifier and with the four methods combined, amounting to five major categories of models. Four data sets from UCI are used for experimental testing and evaluation, and the experiments are run in WEKA, an open-source data mining tool. The results indicate that using the ensemble algorithm is better than not using it, and that using several different base classifiers results in better accuracy.

Parallel Keywords

Data Mining; Ensemble; Bagging; Support Vector Machine; Back-Propagation Network
