

Applications of Ensemble Learning Bagging Model on Imbalanced Dataset

Advisor: 陳開煇




When an enterprise performs classification on its data, the target feature usually falls into several classes, and the number of samples per class can differ greatly. Well-known examples are credit card fraud detection and the diagnosis of diabetes from clinical data. In such cases, one class is typically far more numerous than the others, which causes the prediction model to perform much better on the majority class than on the minority class. To better handle this imbalance, we experiment with Ensemble Learning to build models, with the goal of improving the accuracy of the resulting model. This thesis focuses on bagging models and their extensions. We consider the problem of tuning the parameters of the Decision Tree used as the weak learner, and we also examine the effect of adopting various sampling strategies. The goal is to identify the strategies to adopt when using bagging models so as to predict minority classes with better accuracy.
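As a rough illustration of the idea of combining bagging with a sampling strategy, the sketch below trains an ensemble in which every bag is drawn with equal numbers of minority and majority samples. It is not the method evaluated in the thesis: the data are synthetic, a one-feature decision stump stands in for the Decision Tree weak learner, and all function names (`fit_stump`, `balanced_bootstrap`, `fit_bagging`, `predict`, `minority_recall`) are hypothetical names chosen for this example.

```python
import random


def fit_stump(xs, ys):
    """Weak learner (assumption: a one-feature stump, not a full Decision Tree).
    Returns the threshold t that minimizes training errors for the rule
    'predict class 1 when x > t'."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def balanced_bootstrap(data, rng):
    """Sampling strategy (assumption: undersample the majority class).
    Draws, with replacement, as many majority samples as there are
    minority samples, so each bag is class-balanced."""
    minority = [d for d in data if d[1] == 1]
    majority = [d for d in data if d[1] == 0]
    n = len(minority)
    return ([rng.choice(minority) for _ in range(n)]
            + [rng.choice(majority) for _ in range(n)])


def fit_bagging(data, n_estimators, rng):
    """Train one stump per balanced bag; the model is the list of thresholds."""
    thresholds = []
    for _ in range(n_estimators):
        xs, ys = zip(*balanced_bootstrap(data, rng))
        thresholds.append(fit_stump(xs, ys))
    return thresholds


def predict(thresholds, x):
    """Majority vote over all stumps."""
    votes = sum(x > t for t in thresholds)
    return int(votes * 2 > len(thresholds))


def minority_recall(thresholds, data):
    """Fraction of minority-class samples that the ensemble labels as class 1."""
    positives = [x for x, y in data if y == 1]
    return sum(predict(thresholds, x) for x in positives) / len(positives)


# Synthetic imbalanced data (assumption): 950 majority vs. 50 minority samples.
rng = random.Random(0)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(950)]
        + [(rng.gauss(2.0, 1.0), 1) for _ in range(50)])

model = fit_bagging(data, n_estimators=25, rng=rng)
print("minority recall:", minority_recall(model, data))
```

Reporting recall on the minority class rather than overall accuracy is a deliberate choice here: on data this skewed, a model that always predicts the majority class would already score 95% accuracy while missing every minority sample.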


