CLASSIFICATION OF TWEETS CONTAINING HATE SPEECH WITH XGBOOST AND LOGISTIC REGRESSION
Abstract
Social media has become a common platform for expressing opinions, both positive and negative. The large number of users also increases the opportunity for hate speech to appear, including on Twitter, where users convey feelings and opinions through tweets. Hate speech is an act of communication that includes provocation, incitement, or insults directed at individuals or groups on the basis of factors such as ethnicity, religion, race, nationality, and others. Sentiment analysis can be used to extract information from text and classify it. In this research, sentiment analysis is the process of classifying text documents into two classes, negative and positive sentiment. We compare two classification methods, Logistic Regression and Extreme Gradient Boosting (XGBoost), using 31,962 training samples and 17,197 test samples. The best Logistic Regression model reached an accuracy of 95.74%, while XGBoost reached a similarly high 94.97%. Based on these accuracy results, we conclude that Logistic Regression is the more effective of the two methods for classifying hate speech in Twitter text.
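As a rough illustration of the kind of pipeline the abstract describes, the following sketch trains a binary logistic regression classifier on bag-of-words features and reports accuracy on held-out tweets. Everything in it is an assumption made for illustration: the toy tweets and labels are invented, the tokenizer, learning rate, epoch count, and regularization strength are arbitrary, and the helper names (`tokenize`, `vectorize`, `train_logreg`, `accuracy`) are hypothetical. It does not reproduce the study's dataset, preprocessing, or XGBoost model.

```python
import math
import re
from collections import Counter

# Hypothetical toy data (NOT the study's dataset): 1 = hate speech, 0 = not.
TRAIN = [
    ("i hate those people they are disgusting", 1),
    ("those people are vermin and should leave", 1),
    ("disgusting vermin i hate them all", 1),
    ("what a lovely sunny day with friends", 0),
    ("i love this song so much", 0),
    ("friends and family make a lovely day", 0),
]
TEST = [
    ("i hate them they are vermin", 1),
    ("love a sunny day with family", 0),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    """Map every token seen in the training texts to a feature index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Sparse bag-of-words vector {feature index: count}; unknown tokens are dropped."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(rows, labels, n_features, lr=0.5, epochs=200, l2=0.01):
    """Fit weights with stochastic gradient descent on the log loss.

    The L2 penalty is applied only to features present in the current row.
    """
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(b + sum(w[i] * v for i, v in x.items()))
            err = p - y  # gradient of the log loss with respect to the logit
            for i, v in x.items():
                w[i] -= lr * (err * v + l2 * w[i])
            b -= lr * err
    return w, b


def predict(x, w, b):
    """Return 1 when the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(b + sum(w[i] * v for i, v in x.items())) >= 0.5 else 0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


vocab = build_vocab(text for text, _ in TRAIN)
X_train = [vectorize(text, vocab) for text, _ in TRAIN]
y_train = [label for _, label in TRAIN]
w, b = train_logreg(X_train, y_train, len(vocab))

X_test = [vectorize(text, vocab) for text, _ in TEST]
y_test = [label for _, label in TEST]
y_pred = [predict(x, w, b) for x in X_test]
print(f"test accuracy = {accuracy(y_test, y_pred):.2%}")
```

Accuracy here is simply the share of test tweets whose predicted class matches the label, which is the metric the abstract reports for both models. A comparison like the one in the study would train a second classifier on the same features and evaluate it with the same `accuracy` function.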
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.