
Tree-Based Ensemble Classifiers for High-Dimensional Data

Abstract


Building a classification model from thousands of available predictor variables with a relatively small sample size presents challenges for most traditional classification algorithms. When the number of samples is much smaller than the number of predictors, there can be a multiplicity of good classification models. An ensemble classifier combines multiple single classifiers to improve classification accuracy. This paper gives an overview of tree-based classifiers and compares the performance of three ensemble classifiers: random forest (RF), classification by ensembles from random partitions (CERP), and adaptive boosting (AdaBoost). Three single-tree algorithms are also evaluated: classification tree (CTree), classification rule with unbiased interaction selection and estimation (CRUISE), and quick, unbiased and efficient statistical tree (QUEST). The six tree-based classifiers are applied to five high-dimensional datasets. On all datasets, the three ensemble classifiers show much higher classification accuracy than the three single-tree algorithms, with the exception of the AdaBoost ensemble classifier on one dataset. RF and CERP are comparable in terms of accuracy. The RF and CERP bagging classifiers show higher accuracy than the AdaBoost boosting classifier. Among the three single-tree classifiers, QUEST generally shows higher accuracy than CTree and CRUISE.
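As a rough illustration of why combining classifiers can improve accuracy, the following sketch computes the accuracy of a simple majority vote over binary base classifiers. It is not the method of RF, CERP, or AdaBoost: it assumes, purely for illustration, that every base classifier is correct with the same probability `p` and that their errors are independent, which real tree ensembles only approximate. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, p: float) -> float:
    """Probability that a strict majority of the base classifiers is correct.

    Idealized assumptions (not from the paper): binary classification,
    each base classifier is correct with probability p, and errors are
    independent across classifiers.
    """
    if n_classifiers < 1 or n_classifiers % 2 == 0:
        raise ValueError("use a positive odd number of classifiers to avoid ties")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")

    # Smallest number of correct votes that forms a strict majority.
    k_min = n_classifiers // 2 + 1

    # Sum the binomial probabilities of k_min, ..., n correct votes.
    return sum(
        comb(n_classifiers, k) * p**k * (1 - p) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )


if __name__ == "__main__":
    # Ensemble accuracy for weak base classifiers (p = 0.6) as the ensemble grows.
    for n in (1, 5, 25, 101):
        print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these assumptions the vote accuracy rises with the ensemble size whenever `p` exceeds 0.5. In practice the trees in an ensemble are correlated, so the gain is smaller than this idealized calculation suggests.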

