CLUSTERING SCHOOL DROPOUT DATA USING THE K-MEANS AND DBSCAN ALGORITHMS

Mochamad Hammam Tegar Utomo

Abstract

Human resource development and national progress depend on primary education. Although primary school enrollment has increased rapidly, dropout at this level remains a major problem in Indonesia. This study applies the K-Means and DBSCAN clustering algorithms to data on the number of school dropouts in each city and district in Indonesia. The dataset was published by the Ministry of Education and Culture in 2023 and contains the following variables: city/district, number of elementary school dropouts, number of junior high school dropouts, number of high school dropouts, and number of vocational school dropouts. The best result was obtained with the K-Means algorithm at K=2, which reached a silhouette value of 0.722. The clustering assigns 40 regions to a high-dropout cluster and 474 regions to a low-dropout cluster. The sizable difference in values between the two clusters may indicate an education gap between regions. This research is expected to provide important information for stakeholders.
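
The evaluation described in the abstract (K-Means with K=2, scored by the silhouette value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the data are synthetic (90 "low" and 10 "high" regions with four dropout counts each, drawn from made-up ranges), the function names `farthest_point_init`, `kmeans`, and `silhouette` are hypothetical, and a simple deterministic seeding is used in place of whatever initialization the study used. Feature scaling and the DBSCAN comparison are omitted.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in for the dataset: four dropout counts per region
# (elementary, junior high, high school, vocational). Values are invented.
rng = random.Random(42)
low = [tuple(rng.uniform(0, 40) for _ in range(4)) for _ in range(90)]
high = [tuple(rng.uniform(150, 250) for _ in range(4)) for _ in range(10)]
regions = low + high

labels = kmeans(regions, k=2)
score = silhouette(regions, labels)
sizes = sorted(labels.count(c) for c in set(labels))
print("cluster sizes:", sizes)
print("silhouette: %.3f" % score)
```

A silhouette value close to 1 means points sit much nearer to their own cluster than to the other one, which is how a value such as the reported 0.722 is read. On real data, the same score would be computed for each candidate K and for the DBSCAN result before picking the best configuration.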
