Abstractive Text Summarization of Indonesian-Language News Using Retrieval-Augmented Generation

Antonius Sakti Wiradinata; Viny Christanti Mawardi

doi:10.24912/jiksi.v13i1.32861

Published: Jan 29, 2025

Keywords:

Abstractive Text Summarization, embedding, information, Retrieval-Augmented Generation, ROUGE, web scraping

Abstract

This research applies Abstractive Text Summarization (ATS) to Indonesian-language news using the Retrieval-Augmented Generation (RAG) method. As news becomes accessible through a growing number of digital platforms, users often struggle to identify relevant information among the large volume of articles available. RAG combines retrieval and generation techniques to produce coherent and informative news summaries. In this study, news articles from the CNN and CNBC sites were collected via web scraping to form a dataset. The data were processed through several stages: preprocessing, embedding, information retrieval, and summary generation. Summary quality was evaluated using the ROUGE metric. The results show that the system performs well on precision, with a ROUGE-1 Precision of 0.7432 and a ROUGE-2 Precision of 0.6174. However, lower ROUGE Recall values indicate that some important information is not fully captured in the summaries. These results indicate that RAG-based ATS is effective in helping users obtain core information concisely, but that it needs improvement in capturing the full context of each news article.
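The gap between ROUGE precision and ROUGE recall reported in the abstract can be illustrated with a minimal sketch. The code below is not the authors' evaluation code: it is a simplified ROUGE-N that assumes lowercase whitespace tokenization and no stemming, and the function names `ngrams` and `rouge_n` as well as the two example sentences are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) for a simplified ROUGE-N.

    Assumption: lowercase whitespace tokenization, no stemming or
    stopword handling, single reference.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both texts (multiset intersection).
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy sentences, invented for this example (not taken from the dataset).
reference = "the government raises fuel prices next week because subsidies have grown"
summary = "the government raises fuel prices"

for n in (1, 2):
    p, r, f = rouge_n(summary, reference, n)
    print(f"ROUGE-{n}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

In this toy case every n-gram of the short summary also appears in the reference, so precision is 1.0, while recall is only about 0.45 for ROUGE-1 and 0.40 for ROUGE-2 because most of the reference is left out. This is the same pattern the abstract describes: a summary can be accurate in what it says and still omit important content.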

Issue

Vol. 13 No. 1 (2025): Jurnal Ilmu Komputer dan Sistem Informasi

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

This work is licensed under a Jurnal Komunikasi Creative Commons Attribution-ShareAlike 4.0 International License.
