PERFORMANCE MEASURES IN CLASSIFICATION PROBLEMS WITH CLASS-IMBALANCED DATA

Chinese title: 不平衡資料下的分類表現測度探討 (A Study of Classification Performance Measures under Imbalanced Data)

Abstract


A large number of classification models, and many accompanying performance measures, have been proposed in the literature. Because every problem has its own characteristics, choosing an appropriate measure for a new problem is often confusing. The choice is harder still when class sizes are imbalanced, a situation that has been viewed as one of the 10 challenges in decision-making-related research. In this paper, we review many popular classification performance criteria and focus on their properties when class sizes are imbalanced in binary classification.

Parallel Abstract


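Many classification models, along with performance measures for evaluating them, have been proposed in the literature. Because each classification problem has its own characteristics, choosing a suitable measure is often confusing. The situation is more complicated with imbalanced data, a problem that has been listed among the ten major challenges in decision-making-related research. This article reviews many common performance criteria and discusses their properties for imbalanced binary classification data.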
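
As a rough illustration of why the choice of measure matters under class imbalance, the following minimal sketch compares plain accuracy with balanced accuracy (the mean of sensitivity and specificity). The data set, the always-negative classifier, and all function names are hypothetical and chosen only for this example; they are not taken from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced sample: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A trivial classifier that always predicts the majority (negative) class.
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
bal = balanced_accuracy(*counts)
print(f"accuracy = {acc:.2f}, balanced accuracy = {bal:.2f}")
```

In this toy setting the trivial classifier is correct on 95 of 100 cases yet detects none of the positives, so accuracy looks high while balanced accuracy sits at the chance level of 0.5.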
