PERFORMANCE COMPARISON OF CLASSIFICATION METHODS FOR PREDICTING STUDENT DROPOUT AND ACADEMIC SUCCESS
Abstract
Dropping out of school and academic success are two important outcomes in education. This study compares the performance of classification methods for predicting student dropout and academic success. The classification methods used are Random Forest, AdaBoost, Decision Tree, Logistic Regression, and XGBoost. The dataset, drawn from a university, contains 4424 samples with 36 features and 3 classes. The results show that the Random Forest classifier performs best with an accuracy of 76%, followed by XGBoost (76%), AdaBoost (74%), Logistic Regression (74%), and Decision Tree (71%). The Random Forest classifier can therefore be used to predict student dropout and academic success more accurately. However, it should be noted that although all of the classification methods in this study improved through the use of the ADASYN technique and parameter tuning, they still struggle to accurately identify cases in one of the minority classes. Future work should therefore optimize parameters more carefully and consider other approaches that could further improve model performance, such as incorporating additional information that may be present in the dataset.
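The comparison described in the abstract can be sketched as a minimal scikit-learn pipeline. Note the assumptions: the data below is synthetic (generated to mimic the study's 4424-sample, 36-feature, 3-class shape, not the actual university dataset), the hyperparameters are illustrative rather than the authors' tuned values, scikit-learn's GradientBoostingClassifier stands in for XGBoost, and the ADASYN resampling step from imbalanced-learn is omitted for self-containment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the study's shape: 4424 samples, 36 features,
# 3 imbalanced classes (dropout / enrolled / graduate).
X, y = make_classification(n_samples=4424, n_features=36, n_informative=12,
                           n_classes=3, weights=[0.50, 0.32, 0.18],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# The five methods compared in the study (GradientBoosting replaces XGBoost).
models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and score held-out accuracy.
scores = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2f}")
```

In the study, resampling the training split with ADASYN (from the imbalanced-learn toolbox cited below) and tuning hyperparameters preceded this comparison; on the real dataset those steps improved all five models but did not fully resolve errors on one minority class.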
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
References
[1] M. V. Martins, D. Tolledo, J. Machado, L. M. T. Baptista, and V. Realinho, 2021, Early Prediction of student's Performance in Higher Education: A Case Study, Trends and Applications in Information Systems and Technologies, vol. 1, pp. 166–175.
[2] G. Lemaître, F. Nogueira, and C. K. Aridas, 2017, Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning, Journal of Machine Learning Research, vol. 18, no. 17, pp. 1–5, [Online]. Available: http://jmlr.org/papers/v18/16-365.html
[3] H. He, Y. Bai, E. A. Garcia, and S. Li, 2008, ADASYN: Adaptive synthetic sampling approach for imbalanced learning, in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), IEEE, pp. 1322–1328. doi: 10.1109/IJCNN.2008.4633969.
[4] F. Pedregosa et al., 2011, Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, vol. 12, pp. 2825–2830.
[5] K. Lu, 2022, Logistic Regression in Biomedical Study, 2022 International Conference on Biotechnology, Life Science and Medical Engineering (BLSME 2022), [Online]. Available: https://api.semanticscholar.org/CorpusID:248935866
[6] J. Friedman, T. Hastie, and R. Tibshirani, 2000, Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors), The Annals of Statistics, vol. 28, no. 2, doi: 10.1214/aos/1016218223.
[7] M. Zanchak, V. Vysotska, and S. Albota, 2021, The Sarcasm Detection in News Headlines Based on Machine Learning Technology, in 2021 IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT), IEEE, pp. 131–137. doi: 10.1109/CSIT52700.2021.9648710.
[8] Y. Yue, L. Jia, H. Zhai, M. Kong, and M. Li, 2020, CFS-DT: a Combined Feature Selection and Decision Tree based Method for Octane Number Prediction, in 2020 4th Annual International Conference on Data Science and Business Analytics (ICDSBA), IEEE, pp. 100–103. doi: 10.1109/ICDSBA51020.2020.00033.
[9] J. R. Quinlan, 1986, Induction of decision trees, Mach Learn, vol. 1, no. 1, pp. 81–106, doi: 10.1007/BF00116251.
[10] E. Momeni, M. R. Sahebi, and A. Mohammadzadeh, 2020, Classification Of High-Resolution Satellite Images Using Fuzzy Logics Into Decision Tree, Malaysian Journal of Geosciences, vol. 4, no. 1, pp. 07–12, doi: 10.26480/mjg.01.2020.07.12.
[11] L. Wang and Y. Zhang, 2020, Clustering Reduction Method Analysis of Rough Set and Decision Tree based on Weight Matrix Analysis, IOP Conf Ser Mater Sci Eng, vol. 750, no. 1, p. 012205, doi: 10.1088/1757-899X/750/1/012205.
[12] N. Nakaryakova, S. Rusakov, and O. Rusakova, 2020, Prediction Of The Risk Group (By Academic Performance) Among First Course Students By Using The Decision Tree Method, Applied Mathematics and Control Sciences, no. 4, pp. 121–136, doi: 10.15593/2499-9873/2020.4.08.
[13] S. Abdullah and G. Prasetyo, 2020, Easy Ensemble with Random Forest To Handle Imbalanced Data In Classification, Journal of Fundamental Mathematics and Applications (JFMA), vol. 3, no. 1, pp. 39–46, doi: 10.14710/jfma.v3i1.7415.
[14] L. Breiman, 2001, Random Forests, Mach Learn, vol. 45, no. 1, pp. 5–32, doi: 10.1023/A:1010933404324.
[15] C. Han and H. Jia, 2022, Multi-Modal Representation Learning with Self-Adaptive Thresholds for Commodity Verification, in China Conference on Knowledge Graph and Semantic Computing.
[16] Z. Zheng and Y. Yang, 2021, Adaptive Boosting for Domain Adaptation: Toward Robust Predictions in Scene Segmentation, IEEE Transactions on Image Processing, vol. 31, pp. 5371–5382.
[17] N. A. Akbar, A. Sunyoto, M. Rudyanto Arief, and W. Caesarendra, 2020, Improvement of decision tree classifier accuracy for healthcare insurance fraud prediction by using Extreme Gradient Boosting algorithm, in 2020 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), IEEE, pp. 110–114.
[18] M. Alqahtani, H. Mathkour, and M. M. Ben Ismail, 2020, IoT Botnet Attack Detection Based on Optimized Extreme Gradient Boosting and Feature Selection, Sensors, vol. 20, no. 21, p. 6336, doi: 10.3390/s20216336.
[19] Z. Yan and H. Wen, 2020, Electricity Theft Detection Base on Extreme Gradient Boosting in AMI, in 2020 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), IEEE, pp. 1–6.
[20] T. Chen and C. Guestrin, 2016, XGBoost, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: ACM, pp. 785–794.
[21] D. Johannßen, C. Biemann, S. Remus, T. Baumann, and D. Scheffer, 2020, GermEval 2020 Task 1 on the Classification and Regression of Cognitive and Motivational Style from Text: Companion Paper.