COMMENT TOXICITY CLASSIFICATION USING THE NAIVE BAYES AND DECISION TREE ALGORITHMS

David Ciang

Published: Apr 20, 2023

Keywords:

Naïve Bayes Algorithm, Decision Tree Algorithm, Accuracy, TF-IDF, Toxicity

Abstract

This study develops a comment toxicity classification model for the online environment using the Naive Bayes and Decision Tree algorithms. The dataset consists of online comments, which are preprocessed through text cleaning and normalization, followed by feature extraction using methods such as TF-IDF. Both classifiers are trained on this dataset and evaluated with standard metrics: accuracy, precision, recall, and F1-score. A comparative analysis of Naive Bayes and Decision Tree then examines how effectively each algorithm identifies toxicity in online comments. The findings provide a foundation for content moderation solutions that can adapt to the dynamic nature of human interaction online, and they contribute to understanding how classification algorithms perform when addressing toxicity in online interactions. The results can thereby support more efficient and effective content moderation systems and help improve user safety and comfort in the online environment.
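The pipeline summarized above (preprocessing, TF-IDF features, a trained classifier, and metric-based evaluation) can be illustrated with a minimal sketch. It is not the study's implementation, which is not reproduced on this page. The comments and labels are hypothetical toy data, the function and class names are illustrative, and the smoothed IDF formula and Laplace smoothing (`alpha`) are assumptions. Only the Naive Bayes half is sketched; a Decision Tree would consume the same TF-IDF vectors and be scored with the same metrics function.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (stand-in for cleaning/normalization)."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(token_docs):
    """Smoothed inverse document frequency (assumed variant): log((1 + N) / (1 + df)) + 1."""
    n_docs = len(token_docs)
    doc_freq = Counter(term for doc in token_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Map a token list to {term: tf * idf}; terms unseen during training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


class MultinomialNB:
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""

    def fit(self, vectors, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        vocab = {t for vec in vectors for t in vec}
        weight = {c: defaultdict(float) for c in self.classes}
        for vec, y in zip(vectors, labels):
            for t, w in vec.items():
                weight[y][t] += w  # accumulate per-class feature mass
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.log_like = {}
        for c in self.classes:
            denom = sum(weight[c].values()) + alpha * len(vocab)
            self.log_like[c] = {
                t: math.log((weight[c].get(t, 0.0) + alpha) / denom) for t in vocab
            }
        return self

    def predict(self, vec):
        def score(c):
            known = ((t, w) for t, w in vec.items() if t in self.log_like[c])
            return self.log_prior[c] + sum(w * self.log_like[c][t] for t, w in known)
        return max(self.classes, key=score)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the positive (toxic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: 1 = toxic, 0 = non-toxic.
train_texts = [
    "you are an idiot and a loser",
    "shut up you stupid idiot",
    "what a stupid pathetic loser",
    "thanks for the helpful explanation",
    "great article thanks for sharing",
    "really helpful and clear article",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["you stupid loser", "thanks great explanation"]
test_labels = [1, 0]

train_tokens = [tokenize(t) for t in train_texts]
idf = fit_idf(train_tokens)
model = MultinomialNB().fit([tfidf(d, idf) for d in train_tokens], train_labels)
predictions = [model.predict(tfidf(tokenize(t), idf)) for t in test_texts]
scores = binary_metrics(test_labels, predictions)
print(predictions, scores)
```

On a real dataset the comments would be split into training and held-out sets before fitting, and the same `binary_metrics` call would be applied to each classifier's predictions so that the comparison rests on identical test data.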

Issue

Vol. 18 No. 1 (2023): Jurnal Komputer dan Informatika

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Open access
