

An Application of Machine Learning Classification Algorithms in Prediction of Binary Imbalanced Data




Handling imbalanced data when predicting the minority category is an important problem for machine learning algorithms. This study explores classification performance on binary imbalanced data using Synthetic Minority Oversampling Technique (SMOTE) resampling algorithms combined with ensemble learning techniques. Three resampling methods, namely SMOTE, Borderline-SMOTE and Support Vector Machine Synthetic Minority Oversampling Technique (SVM-SMOTE), are each integrated with ensemble learning techniques and compared in an empirical analysis of three binary imbalanced datasets. Two ensemble learning techniques, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), are selected as the classification models. The Average Precision (AP) and the Area Under the Receiver Operating Characteristic Curve (AUC) are used to evaluate the classification performance of the models. The results show that the SVM-SMOTE method improves the predictive ability for the minority category. Moreover, RF performs better than XGBoost in classifying binary imbalanced data. In summary, the hybrid model that combines the SVM-SMOTE resampling method with the RF classification model performs best in predicting binary imbalanced data and can be used to improve the classification accuracy of the minority category. The hybrid model is therefore suggested for dealing with the class imbalance problem.
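To make the resampling and evaluation steps concrete, the following is a minimal standard-library sketch of basic SMOTE interpolation and of the two metrics named above. It is an illustration only, not the study's implementation: the function names (`smote`, `average_precision`, `roc_auc`) and their parameters are hypothetical, only plain SMOTE is shown (not the Borderline or SVM variants), and the AP function assumes no tied scores. In practice one would more likely rely on a library such as imbalanced-learn or scikit-learn.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE sketch: create n_new synthetic minority points.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes `minority` is a list of at least two equal-length tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (x itself excluded by identity)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def average_precision(y_true, scores):
    """AP as the mean precision at the rank of each positive (ties ignored)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half. Assumes both classes are present.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a pipeline of the kind described, the synthetic points would be added to the training set only, a classifier such as RF or XGBoost would be fitted on the resampled data, and AP and AUC would be computed from its predicted scores on an untouched test set.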
