Abstractive Text Summarization of Indonesian-Language News Using Retrieval-Augmented Generation

Antonius Sakti Wiradinata
Viny Christanti Mawardi

Abstract

This research applies abstractive text summarization (ATS) to Indonesian-language news using the Retrieval-Augmented Generation (RAG) method. As more news becomes available across digital platforms, users often struggle to identify relevant information among the large volume of articles. RAG combines retrieval and generation techniques to produce coherent, informative news summaries. News articles were collected from the CNN and CNBC sites via web scraping to form the dataset. The data are processed in several stages: preprocessing, embedding, information retrieval, and summary generation. Summary quality was evaluated with the ROUGE metric. The system performs well on precision, with a ROUGE-1 precision of 0.7432 and a ROUGE-2 precision of 0.6174. However, the lower ROUGE recall indicates that some important information is not fully captured in the summaries. These results indicate that RAG-based ATS is effective in helping users obtain core information concisely, but that capturing the full context of a news article still needs improvement.
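To make the precision and recall figures concrete, the following is a minimal sketch of how ROUGE-N precision and recall can be computed from n-gram overlap. It is not the evaluation code used in this research: the function names `ngrams` and `rouge_n`, the lowercase whitespace tokenization, and the English example sentences are all assumptions chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) for ROUGE-N.

    Assumption: plain lowercase whitespace tokenization, which is
    simpler than the tokenizers used by standard ROUGE packages.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: a short summary that copies only the start
# of a longer reference summary.
reference = "the government raised fuel prices on monday after weeks of debate"
candidate = "the government raised fuel prices"

p1, r1, _ = rouge_n(candidate, reference, n=1)
p2, r2, _ = rouge_n(candidate, reference, n=2)
print(f"ROUGE-1 precision={p1:.4f} recall={r1:.4f}")
print(f"ROUGE-2 precision={p2:.4f} recall={r2:.4f}")
```

In this example every word of the candidate appears in the reference, so precision is 1.0, while recall is well below 0.5 because most of the reference is not covered. This is the same pattern the abstract reports: high precision with lower recall suggests summaries that are accurate but omit part of the source content.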
