A Comparative Study of Machine Learning Techniques for College Student Success Prediction
DOI:
https://doi.org/10.33423/jhetp.v24i1.6764
Keywords:
higher education, student success, prediction, model comparison, logistic regression, random forest
Abstract
This study compares the performance of machine learning models for predicting student persistence. It begins with a historical review of student retention research and the evolution of predictive models in the field, and highlights why predicting persistence matters for both educational institutions and individuals. It then describes a dataset obtained from ResearchGate, consisting of anonymized undergraduate student data collected between 2008 and 2018, with 37 features and 4,424 records. Ten machine learning algorithms are considered, and two popular ones, Logistic Regression and Random Forest classification, are compared in more detail. Evaluation metrics include prediction accuracy, precision, recall, and F1-score. Results show that the Random Forest model outperforms Logistic Regression in predicting student outcomes, particularly when the synthetic minority oversampling technique (SMOTE) is used to address class imbalance. Overall, the study contributes to student retention research and provides insights for developing targeted support measures to enhance student success in higher education.
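As a rough illustration of the four evaluation metrics named in the abstract, the following minimal sketch computes accuracy, precision, recall, and F1-score from binary labels in plain Python. It is not the authors' code: the function name `classification_metrics`, the label encoding, and the toy labels are assumptions made up for this example and are not drawn from the study's dataset.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels invented for illustration only (assumed encoding:
# 1 = did not persist, 0 = persisted). The minority class is 3 of 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On these toy labels the accuracy is 0.7 while recall on the minority class is only one third, which suggests why accuracy alone can mislead on imbalanced data and why precision, recall, and F1-score are typically reported alongside it.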