Prediction of Undergraduate Student’s Study Completion Status Using MissForest Imputation in Random Forest and XGBoost Models
The number of higher education graduates in Indonesia is calculated from students' completion status. However, many undergraduate students have reached the maximum length of study while their completion status remains unknown. This makes it difficult to calculate the actual number of graduates, a figure used as an indicator in higher education evaluation and as a reference for other policies. The unknown completion status of students who have reached the maximum length of study must therefore be predicted. The research compared the performance of Random Forest and Extreme Gradient Boosting (XGBoost) classification models in predicting the unknown completion status. It used a dataset containing 13,377 undergraduate students' profiles from the Higher Education Database (PDDikti), Ministry of Education, Culture, Research, and Technology. The dataset was incomplete: missing values made up 20.9% of the total data. Because missing data might bias the predictions, the research also used MissForest imputation to handle the missing values in the classification modelling and compared it with Mean/Mode and Median/Mode imputation. The results show that MissForest outperforms the other two imputation methods in both classifiers but requires the longest computation time. Furthermore, the XGBoost model with MissForest is significantly superior to the Random Forest model with MissForest. Hence, the model chosen to predict the completion status is XGBoost with MissForest imputation.
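The two baseline imputation methods named in the abstract, Mean/Mode and Median/Mode, can be illustrated with a minimal sketch. This is not the authors' code and it does not implement MissForest. It only assumes the usual meaning of the baselines: numeric columns are filled with the column mean or median, and categorical columns with the column mode. The column names and values below are hypothetical toy data, not taken from PDDikti.

```python
from statistics import mean, median, mode


def missing_proportion(columns):
    """Share of missing cells (None) across the whole table."""
    cells = [v for values in columns.values() for v in values]
    return sum(v is None for v in cells) / len(cells)


def impute_column(values, numeric, center="mean"):
    """Fill None entries with the mean/median (numeric) or the mode (categorical)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing observed, so nothing to fill with
    if numeric:
        fill = mean(observed) if center == "mean" else median(observed)
    else:
        fill = mode(observed)
    return [fill if v is None else v for v in values]


def impute_table(columns, numeric_cols, center="mean"):
    """Apply Mean/Mode (center='mean') or Median/Mode (center='median') imputation."""
    return {
        name: impute_column(values, name in numeric_cols, center)
        for name, values in columns.items()
    }


# Hypothetical toy data: None marks a missing value.
students = {
    "gpa": [3.0, None, 2.0, 4.0],
    "semesters": [8, 10, None, 14],
    "status": ["active", None, "active", "leave"],
}

filled_median = impute_table(students, {"gpa", "semesters"}, center="median")
filled_mean = impute_table(students, {"gpa", "semesters"}, center="mean")
```

MissForest differs from these baselines in that it predicts each missing value from the other columns with a random forest and repeats the process iteratively, which is consistent with the longer computation time reported in the abstract.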
Intan Nirmala (author) / Hari Wijayanto (author) / Khairil Anwar Notodiputro (author)
2022
Article (Journal)
Electronic Resource
Metadata by DOAJ is licensed under CC BY-SA 1.0