
適用於分類變數資料的二元不平衡資料自動分類系統

Automatic Binary Classification System for Imbalanced Data with Categorical Explanatory Variables

Advisor: 陳景祥

Abstract


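With advances in technology, many industries have adopted automated workflows, making daily life more convenient and efficient. Bringing automation into the data analysis workflow would likewise reduce the analyst's workload. Drawing on how data complexity indices affect common classification techniques, this study addresses class-imbalanced data in binary classification: three different resampling methods are used to balance the classes, with the aim of building an effective automatic binary classification system for class-imbalanced data. The results show that the proposed method can effectively select the best classification technique. Overall, logistic regression performs comparatively well on imbalanced binary classification problems.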

Parallel Abstract


As technology advances, many industries have adopted automated operations, making life easier and more efficient. Bringing automation into data analysis would likewise reduce the burden on the data analyst. In this study, the influence of data complexity indices on common classifiers is evaluated, and three different resampling methods are applied to imbalanced binary data. The results show that the proposed procedure can effectively select the best classifier. For binary classification of imbalanced data, logistic regression performs better overall.
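
The abstract does not name the three resampling methods. As a minimal sketch of what balancing a binary imbalanced data set can look like, the example below assumes plain random oversampling of the minority class. The function name `random_oversample` and the toy categorical rows are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes match in size.

    Illustrative only: random oversampling is an assumed stand-in, not
    necessarily one of the three methods used in the thesis.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    # Indices of the minority-class observations.
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    # Draw, with replacement, as many extra minority rows as needed to balance.
    extra = [rng.choice(minority_idx) for _ in range(n_major - n_minor)]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data with categorical explanatory variables: 8 majority rows, 2 minority rows.
rows = [("red", "small")] * 8 + [("blue", "large")] * 2
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

The original rows are kept in order and only minority-class duplicates are appended, so a classifier such as logistic regression can then be fitted on the balanced set.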

