Classification for Early Detection of High School Students Vulnerable to Poor Academic Performance

This paper analyzes student demographic information to give educational policy makers and educational institutions meaningful high-level information about their students. This information enables proactive intervention: identifying high school students who are highly vulnerable to failing and providing them with additional support. The research provides a tool to predict and compare students’ academic performance using factors that are independent of the school they attended and of their prior grades. The models identified two groups of factors that are significant determinants of a student’s academic performance: (1) factors that relate to the parents (parents’ education, resources available at home, paid extra classes, address, and the student’s willingness to pursue higher education), and (2) factors that relate to the student as an individual (study time, free time, and weekly alcohol consumption). Four models were applied: logistic regression on PCA scores, stepwise logistic regression, a decision tree, and a random forest. Analysis of the dataset showed high correlation among many of the independent variables, which was the most common drawback across the models. PCA identified orthogonal components, and the scores on those components served as the independent variables in the logistic regression model; this approach produced the best results. This model yielded a sensitivity [...]. Although it is not feasible to change the significant factors that relate to the parents, policy makers or schools can fill the gap by providing alternative teaching methodologies and forms of assessment to the students identified as vulnerable to failing. Remedial classes can also be offered. These interventions can train and encourage students to succeed despite the factors identified as having a significant effect on their grades.
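The best-performing approach described above, logistic regression fitted on PCA scores, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It runs on synthetic stand-in data because the study's dataset and settings are not given here. The number of components, the learning rate, the 0.5 threshold, and all function names are assumptions made for the example. For brevity it fits and evaluates on the same data, so the printed sensitivity is an in-sample figure.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant columns unscaled
    return (X - mu) / sigma

def pca_scores(X_std, n_components):
    """Project standardized data onto its leading principal components."""
    # Rows of Vt are orthonormal principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_std, full_matrices=False)
    return X_std @ Vt[:n_components].T

def fit_logistic(Z, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by batch gradient descent (assumed settings)."""
    A = np.column_stack([np.ones(len(Z)), Z])  # prepend an intercept column
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))
        w -= lr * (A.T @ (p - y)) / len(y)
    return w

def predict(Z, w, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold."""
    A = np.column_stack([np.ones(len(Z)), Z])
    p = 1.0 / (1.0 + np.exp(-(A @ w)))
    return (p >= threshold).astype(int)

def sensitivity(y_true, y_pred):
    """True positive rate: share of actual positives that were flagged."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn) if (tp + fn) else float("nan")

# Synthetic stand-in data: six correlated predictors driven by two latent factors.
rng = np.random.default_rng(0)
n = 400
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.3 * rng.normal(size=(n, 6))

# y = 1 marks a hypothetical "vulnerable" student.
logit = 1.5 * latent[:, 0] - 1.0 * latent[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_std = standardize(X)
Z = pca_scores(X_std, n_components=2)  # uncorrelated inputs for the classifier
w = fit_logistic(Z, y)
y_pred = predict(Z, w)

print("in-sample sensitivity:", sensitivity(y, y_pred))
```

Because the component scores are mutually uncorrelated, the regression no longer sees the collinearity present in the raw predictors. The cost is that each coefficient describes a component, not an original variable, so interpretation requires going back to the component loadings.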

Keywords

Student Performance; Data Mining; Classification
