以虛擬多類別方式處理不平衡資料

A Virtual Multi-label Approach to Imbalanced Data Classification

Advisor: 周珮婷
The full text will be available for download on 2025/08/18.

Abstract

When classifying imbalanced data, most supervised learning methods treat the majority class as the main learning target while the model is being built. The minority class is sacrificed as a result, and classifier performance degrades. To address this problem, this study proposes a new classification method that combines Equal Kmeans clustering with virtual multi-class labels (the "virtual multi-label" approach of the title) to handle the imbalance. The proposed method is compared with commonly used remedies: the sampling methods oversampling, undersampling, and SMOTE, and the classifier-based methods SVM and One-Class SVM. The results show that the proposed method performs better as the degree of data imbalance increases and gradually outperforms the other methods.
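The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of the idea: split the majority class into groups of roughly the minority's size, treat each group as its own virtual class, train a multi-class classifier, and map every virtual class back to "majority" at prediction time. Everything in it is an assumption made for illustration: the function names are hypothetical, a sort-and-chunk split stands in for Equal Kmeans, and a nearest-centroid rule stands in for whatever classifier the thesis actually uses.

```python
import math


def equal_size_groups(points, k):
    """Split points into k groups of near-equal size.

    Stand-in for Equal Kmeans (assumption): sort the points and cut the
    sorted list into k consecutive chunks.
    """
    ordered = sorted(points)
    n = len(ordered)
    bounds = [i * n // k for i in range(k + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def fit_virtual_multiclass(majority, minority):
    """Build one centroid per virtual majority class plus one for the minority.

    The number of virtual classes is set to the imbalance ratio so that each
    virtual class is about as large as the minority class (assumption).
    Requires a non-empty minority class.
    """
    k = max(1, len(majority) // len(minority))
    groups = equal_size_groups(majority, k)
    # Every virtual class maps back to the original "majority" label.
    model = [(centroid(g), "majority") for g in groups]
    model.append((centroid(minority), "minority"))
    return model


def predict(model, x):
    """Return the original label of the nearest centroid (Euclidean distance)."""
    _, label = min(model, key=lambda entry: math.dist(entry[0], x))
    return label


# Toy data with a 9:1 imbalance: 90 majority points on a line,
# 10 minority points in a small cluster above it.
majority = [(i / 10, 0.0) for i in range(90)]
minority = [(4.0 + j / 10, 2.0) for j in range(10)]

model = fit_virtual_multiclass(majority, minority)
print(len(model))                  # 9 virtual majority classes + 1 minority class
print(predict(model, (4.4, 1.8)))  # minority
print(predict(model, (1.0, 0.1)))  # majority
```

Because every virtual class has about as many points as the minority class, the multi-class problem the classifier sees is roughly balanced, which is the motivation the abstract gives for the approach. The toy data only shows that the sketch runs; it says nothing about the results reported in the thesis.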
