
Research on Classification Method Based on Imbalanced Data (基於不平衡資料分類方法之研究)

Advisor: 李維平
The full text will be available for download from 2024/09/02.

Abstract


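Every field runs into its own difficulties when processing data, and imbalanced data is one of the thornier problems. Current research offers under-sampling of the majority class and over-sampling of the minority class, but when either is applied carelessly, under-sampling tends to lose important information carried by the samples, and over-sampling tends to make the classifier overfit. Many studies also improve or optimize the classifier itself, yet the quality of the data largely determines the classification result, so improving the classifier alone does little to help.

This study combines SMOTE (Synthetic Minority Oversampling Technique) with NearMiss under-sampling to address data imbalance. The combined method is compared with plain oversampling and with SMOTE by building a decision tree classification model for each. The experiments show that the NMS (NearMiss-2 SMOTE) sampling method is the best sampling method on all four datasets, and that it also yields the highest classification accuracy on minority-class samples among the sampling methods compared.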

Parallel Abstract


Data processing raises different problems in every field, and imbalanced data is among the more difficult ones. Existing work either under-samples the majority class or over-samples the minority class, but if handled improperly, under-sampling easily discards important information in the samples, and over-sampling easily causes the classifier to overfit. Many studies instead improve and optimize the classifier, but the quality of the data has the greater impact on the classification results, so improving the classifier alone brings no significant benefit.

This study combines SMOTE (Synthetic Minority Oversampling Technique) and NearMiss to solve the problem of data imbalance, and compares the result with the oversampling method and the SMOTE method by building a decision tree classification model for each. The experiments show that the NMS (NearMiss-2 SMOTE) sampling method is the best sampling method in experiments on four different datasets, and that its classification accuracy on minority-class samples is also the highest among the sampling methods compared.
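To make the combination concrete, the following is a minimal pure-Python sketch of NearMiss-2 under-sampling followed by SMOTE over-sampling. The abstract does not state the order of the two steps, the target class sizes, or the neighbour counts, so the function names (`nearmiss2`, `smote`, `nms_resample`), the `target` parameter, the defaults `k=3` and `k=5`, and the toy data are all assumptions made for illustration, not the thesis's actual procedure.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearmiss2(majority, minority, n_keep, k=3):
    """NearMiss-2 rule: keep the n_keep majority points whose mean distance
    to their k farthest minority points is smallest."""
    def score(point):
        farthest = sorted((euclidean(point, m) for m in minority), reverse=True)[:k]
        return sum(farthest) / len(farthest)
    return sorted(majority, key=score)[:n_keep]


def smote(minority, n_new, k=5, rng=None):
    """SMOTE-style synthesis: interpolate between a random minority point
    and one of its k nearest minority neighbours (needs >= 2 points)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def nms_resample(majority, minority, target, seed=0):
    """Assumed pipeline: shrink the majority class to `target` points with
    NearMiss-2, then grow the minority class to `target` points with SMOTE."""
    kept = nearmiss2(majority, minority, min(target, len(majority)))
    extra = smote(minority, max(0, target - len(minority)), rng=random.Random(seed))
    return kept, list(minority) + extra


# Toy data (hypothetical): 20 majority points and 4 minority points in 2-D.
majority = [(float(i), float(i % 3)) for i in range(20)]
minority = [(0.0, 5.0), (1.0, 6.0), (2.0, 5.5), (1.5, 6.5)]
kept_majority, balanced_minority = nms_resample(majority, minority, target=10)
print(len(kept_majority), len(balanced_minority))
```

The resampled classes could then be passed to any decision tree learner for the kind of comparison the abstract describes. Keeping the original minority points and only appending synthetic ones is a design choice of this sketch.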
