  • Thesis

Deep Discriminative Features Learning and Sampling for Imbalanced Data Problem

Advisors: 曾新穆, 劉建良




The imbalanced data problem arises in many application domains and is regarded as a challenging problem in machine learning and data mining. Oversampling may lead to overfitting, while undersampling may discard representative data samples. Moreover, most resampling methods that generate synthetic data focus on the minority class without considering the data distribution of the majority classes. This paper presents an algorithm that combines feature embedding with loss functions from discriminative feature learning in deep learning to generate synthetic data samples. In contrast to previous work, the proposed method considers both majority and minority classes when learning feature embeddings and uses appropriate loss functions to make the embeddings as discriminative as possible. The proposed method is a general framework in which different feature extractors can be used for different domains. We conduct experiments on eight numerical datasets and one image dataset, all based on multiclass classification tasks. The experimental results indicate that the proposed method provides accurate and stable results. Additionally, we analyze the proposed method in detail and use a visualization technique to explain why it generates good data samples.
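
To make the idea of sampling in a learned embedding space more concrete, the following is a minimal sketch and not the procedure used in this work. It assumes the embeddings `Z` were already produced by some trained feature extractor, and it substitutes a plain SMOTE-style interpolation between same-class neighbors for the generation step. The function name `oversample_in_embedding` and the parameter `k` are hypothetical and appear only for illustration.

```python
import numpy as np


def oversample_in_embedding(Z, y, k=3, seed=0):
    """SMOTE-style interpolation in an embedding space (illustrative only).

    Z: (n, d) array of embeddings, assumed to come from a trained extractor.
    y: (n,) array of integer class labels.
    Every class with at least two samples is grown to the size of the
    largest class; classes with a single sample are left unchanged.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Z_parts, y_parts = [Z], [y]
    for c, n in zip(classes, counts):
        need = target - n
        Zc = Z[y == c]
        if need == 0 or len(Zc) < 2:
            continue
        # Pairwise distances inside the class; a point is not its own neighbor.
        d = np.linalg.norm(Zc[:, None, :] - Zc[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        kk = min(k, len(Zc) - 1)
        nbrs = np.argsort(d, axis=1)[:, :kk]
        # Pick a random base point and one of its kk nearest neighbors.
        base = rng.integers(0, len(Zc), size=need)
        pick = nbrs[base, rng.integers(0, kk, size=need)]
        # Place the synthetic point on the segment between the two.
        lam = rng.random((need, 1))
        synth = Zc[base] + lam * (Zc[pick] - Zc[base])
        Z_parts.append(synth)
        y_parts.append(np.full(need, c, dtype=y.dtype))
    return np.vstack(Z_parts), np.concatenate(y_parts)


# Toy data standing in for real embeddings: 50 majority and 5 minority points.
demo_rng = np.random.default_rng(1)
Z = np.vstack([demo_rng.normal(0, 1, (50, 2)), demo_rng.normal(5, 1, (5, 2))])
y = np.array([0] * 50 + [1] * 5)
Z_bal, y_bal = oversample_in_embedding(Z, y)
print(np.bincount(y_bal))
```

Because each synthetic point is a convex combination of two points from the same class, it stays inside that class's region of the embedding space. That only helps if the embedding separates the classes well, which is the motivation the abstract gives for learning discriminative features from both majority and minority classes.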


