WEBSITE CATEGORY CLASSIFICATION USING THE NAÏVE BAYES AND SVM ALGORITHMS
Abstract
With the explosion of information in the 21st century, information overload has become a major challenge on the internet. Although search engines help users judge the value of websites by topic, finding specific information remains difficult. This study proposes a practical approach: automatically creating data records that summarize the content of each website so that the site can be categorized. Text classification (TC) is the task of automatically assigning documents to specific categories. The study focuses on two text classification methods, Naïve Bayes and Support Vector Machine (SVM), with the aim of achieving good accuracy. It also reviews the use of other methods, such as K-Nearest Neighbors (KNN) and Random Forest, in web phishing classification and sentiment analysis. In the experiment, Naïve Bayes and SVM were evaluated on classifying website categories. The data was split into training (70%) and testing (30%) sets, and the resulting accuracy was around 88% for Naïve Bayes and 85% for SVM. These results give a clearer picture of how the two classification methods perform in website categorization.
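The evaluation setup described in the abstract (a 70/30 train/test split followed by an accuracy measurement) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy website summaries, the category labels, and every function name below are hypothetical. Only a multinomial Naïve Bayes classifier with Laplace smoothing is shown, written with the Python standard library; an SVM would normally come from a machine learning library and is omitted here.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy data: (website summary text, category). Not from the study.
SAMPLES = [
    ("buy shoes online discount cart", "shopping"),
    ("cheap laptop deals checkout cart", "shopping"),
    ("order clothes sale free shipping", "shopping"),
    ("discount coupon store buy now", "shopping"),
    ("sale price cart checkout store", "shopping"),
    ("breaking news election government report", "news"),
    ("latest headlines world politics report", "news"),
    ("weather forecast news today update", "news"),
    ("government policy headlines economy update", "news"),
    ("election results politics world today", "news"),
]


def split_70_30(samples, seed=0):
    """Shuffle a copy of the samples and split it 70% train / 30% test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def train_naive_bayes(train):
    """Count documents per category and word occurrences per category."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in train:
        doc_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the category with the highest log posterior."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior plus smoothed log likelihood of each known word.
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test):
    """Fraction of test samples whose predicted category matches the label."""
    correct = sum(predict(model, text) == label for text, label in test)
    return correct / len(test)


train, test = split_70_30(SAMPLES)
model = train_naive_bayes(train)
print(f"train={len(train)} test={len(test)} accuracy={accuracy(model, test):.2f}")
```

With only ten toy samples the printed accuracy says nothing about real performance; the sketch only shows where figures such as the reported 88% and 85% come from, namely the share of held-out test items that receive the correct category.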
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.