Using Fuzzy Support Vector Machines to Address Imbalanced Training Data and Outlier Noise

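This thesis proposes pruning the training data and combining the result with a fuzzy support vector machine to address the problems caused by imbalanced training data. First, the training data of all classes are clustered and each sample's probability of becoming a support vector is computed; samples that are unlikely to become support vectors are randomly deleted so that the number of samples in each class approaches balance. Next, the membership degrees of the training data are computed with the fuzzy k-nearest neighbor algorithm, and outlier noise is identified and removed. Finally, the data that remain after these steps are recombined and a fuzzy support vector machine is constructed for the experiments. The experiments use the Wisconsin Breast Cancer Dataset (WBCD) from the UCI repository. The results obtained with the proposed method are compared with those of other methods, namely None (no preprocessing), SMOTE, SBC, and SUNDO, and the comparison shows that the proposed method outperforms the others.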

Keywords

Fuzzy support vector machine; imbalanced data

Parallel Abstract

This paper proposes a method that removes redundant training data while retaining the likely support vectors, and applies a fuzzy support vector machine to address the imbalanced-dataset problem. First, the training data of every class are clustered and the probability that each sample is a support vector is computed; samples unlikely to be support vectors are then randomly removed so that the class sizes become balanced. Next, the membership degree of each training sample is calculated with the fuzzy k-nearest neighbor algorithm in order to identify and remove outliers. Finally, the data remaining after these steps are recombined and used to construct a fuzzy support vector machine. The Wisconsin Breast Cancer Dataset (WBCD) from the UCI repository was used for the experiments. The results obtained with the proposed method were compared with those of several well-known techniques, namely the classical SMOTE, SBC, and SUNDO approaches. The experimental results show that the proposed approach outperforms the other approaches.

Parallel Keywords

Fuzzy Support Vector Machine; Imbalanced datasets
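
The membership and outlier-removal step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the commonly used fuzzy k-nearest neighbor initialisation, in which a sample's membership in its own class is 0.51 + 0.49 × (fraction of its k nearest neighbours that share its label). The function names, the value of `k`, and the removal `threshold` are hypothetical choices for illustration, not values taken from the thesis. The clustering and undersampling step is not shown.

```python
import math


def fuzzy_knn_memberships(points, labels, k=3):
    """Return each sample's membership degree in its own labelled class.

    Assumed rule (not confirmed by the source):
    membership = 0.51 + 0.49 * (same-class neighbours / k).
    """
    memberships = []
    for i, p in enumerate(points):
        # Rank every other sample by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours = others[:k]
        same = sum(1 for j in neighbours if labels[j] == labels[i])
        memberships.append(0.51 + 0.49 * same / k)
    return memberships


def drop_outliers(points, labels, k=3, threshold=0.6):
    """Keep (point, label, membership) triples whose membership >= threshold.

    The threshold is a hypothetical cut-off chosen for this example.
    """
    m = fuzzy_knn_memberships(points, labels, k)
    return [(p, y, w) for p, y, w in zip(points, labels, m) if w >= threshold]
```

The memberships of the retained samples could then serve as per-sample weights when training the classifier, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`. That is one way to approximate a fuzzy support vector machine and is not necessarily the formulation used in the thesis.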

