SENTIMENT CLASSIFICATION OF IMDB MOVIE REVIEWS USING THE N-GRAM METHOD AND LOGISTIC REGRESSION
Abstract
Sentiment analysis is an application of Natural Language Processing (NLP) used to identify a person's opinion or emotion toward an object, product, or service. This study performs sentiment analysis on movie reviews from the IMDb dataset, with the goal of classifying each review as positive or negative. N-Gram is used as the text feature extraction technique and Logistic Regression as the classification algorithm. The process begins with text preprocessing, which comprises case folding, stopword removal, and lemmatization using WordNetLemmatizer. The data are then represented using TF-IDF (Term Frequency–Inverse Document Frequency) combined with N-Grams (1–3) to capture the context of consecutive words. Test results show that the resulting model achieves an accuracy of 83%, with good performance in detecting both positive and negative sentiment. Nevertheless, the model remains limited in understanding the context of negation phrases such as “not bad” or “no good”.
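The feature extraction steps named in the abstract (case folding, stopword removal, N-Grams of length 1 to 3, TF-IDF weighting) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the stopword list, the function names `preprocess`, `ngrams`, and `tfidf`, and the two sample reviews are all hypothetical, and lemmatization and the Logistic Regression classifier are left out to keep the example short. The IDF shown is one common smoothed variant without length normalization, chosen here as an assumption. In practice, libraries such as scikit-learn offer the same idea through `TfidfVectorizer` with its `ngram_range` parameter.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
# It contains the negators "not" and "no", as many standard lists do.
STOPWORDS = {"the", "is", "a", "this", "was", "not", "no"}


def preprocess(text, remove_stopwords=True):
    """Case-fold, tokenize on letters, and optionally drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def ngrams(tokens, n_min=1, n_max=3):
    """Return all contiguous n-grams for n in [n_min, n_max] as strings."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs):
    """Compute smoothed, unnormalized TF-IDF weights per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for doc in docs for g in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            g: count * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
            for g, count in tf.items()
        })
    return weights


reviews = ["The movie was not bad", "The movie was bad"]

# Features with negators kept versus removed by the stopword filter.
kept = [ngrams(preprocess(r, remove_stopwords=False)) for r in reviews]
dropped = [ngrams(preprocess(r)) for r in reviews]

print("not bad" in kept[0])      # the bigram exists when negators are kept
print(dropped[0] == dropped[1])  # the two reviews share identical features
print(tfidf(dropped)[0])
```

With this toy stopword list, the two reviews become indistinguishable once "not" is filtered out, whereas keeping it preserves the bigram "not bad" as a separate feature. Whether this effect contributes to the negation limitation reported above depends on the stopword list actually used, which the abstract does not specify.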
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This work is licensed under a Jurnal Komunikasi Creative Commons Attribution-ShareAlike 4.0 International License.