OPTICAL CHARACTER RECOGNITION USING UIPATH AND CERTIFICATE DATA MATCHING WITH THE LEVENSHTEIN DISTANCE ALGORITHM

Cynthia Natalie
Viny Christanti Mawardi
Manatap Dolok Lauro Sitorus

Abstract

Collecting certificates is one of the graduation requirements at Tarumanagara University, which makes certificates an important means of improving student competency. Certificates can be obtained from seminars, workshops, courses, and similar activities. The Certificate Information Extraction design was built with UiPath Studio, which uses the VB.NET programming language, together with the Levenshtein Distance algorithm. The design aims to help the study program validate student certificate files with better accuracy by applying the Levenshtein Distance algorithm. It takes student certificates as input and passes them through text preprocessing, which consists of parsing, case folding, tokenizing (lexical analysis), and stopword removal. After preprocessing, the Levenshtein Distance algorithm computes the minimum edit distance between two texts using a two-dimensional matrix, and the result is used to determine the validity of a student certificate. With the Levenshtein Distance algorithm, the design achieved a best word accuracy of 83.52% and an RPA running time of 94.7 ms.
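
The preprocessing and matching steps described in the abstract can be illustrated with a minimal sketch. It is written in Python for readability, not in the VB.NET used inside UiPath Studio, and it is not the authors' implementation. The stopword list, the function names (`preprocess`, `levenshtein`, `similarity`), and the normalization of distance into a similarity score are assumptions made for illustration only.

```python
import re

# Hypothetical stopword list; the article does not publish the one it used.
STOPWORDS = {"the", "of", "and", "a", "an", "to", "in", "for"}


def preprocess(text):
    """Case folding, tokenizing, and stopword removal on OCR output."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b, using a 2D matrix."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[rows - 1][cols - 1]


def similarity(a, b):
    """Assumed normalization: 1.0 means identical, 0.0 means fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Example: an OCR misread ("l" instead of "i") compared with reference data.
ocr_tokens = preprocess("Certificate of the Semlnar")
ref_tokens = preprocess("Certificate of the Seminar")
scores = [similarity(o, r) for o, r in zip(ocr_tokens, ref_tokens)]
```

In a sketch like this, a certificate field could be accepted when its similarity score exceeds a chosen threshold; the threshold itself would have to be tuned on real certificate data and is not specified in the abstract.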

Article Details

Section
Articles
