Sign Language Image Recognition Based on Sistem Isyarat Bahasa Indonesia Using the Vision Transformer Method

Agus Budi Dharmawan
Renaldy

Abstract

Sign language is a form of communication used by deaf people. In Indonesia, the formal sign language is the Sistem Isyarat Bahasa Indonesia (SIBI), which is based on American Sign Language. However, automatic sign language recognition still faces several challenges, including the complexity of hand gestures, individual variation in how signs are performed, and the need for real-time interpretation, all of which make accurate and efficient recognition essential. To address these issues, the Vision Transformer (ViT), an artificial neural network architecture designed for image processing and computer vision tasks, can be applied, given its strength in capturing important image features and its ability to handle complex computer vision tasks. Training the Vision Transformer model yielded a training accuracy of 100% and a validation accuracy of 92.30%.
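The patch-based front end that ViT applies to an input image can be sketched as follows. This is a minimal NumPy illustration with assumed dimensions (224×224 RGB input, 16×16 patches, embedding size 64), not the authors' implementation: the image is split into fixed-size patches, each patch is flattened and linearly projected, and a learnable class token plus positional embeddings are added before the sequence enters the Transformer encoder.

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an (H, W, C) image into flattened (patch_size*patch_size*C) patches."""
    h, w, c = image.shape
    return (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)          # group the two patch-grid axes first
             .reshape(-1, patch_size * patch_size * c)
    )

def embed(patches, proj, cls_token, pos_embed):
    """Linearly project patches, prepend the class token, add positional embeddings."""
    tokens = patches @ proj                      # (N, D) patch embeddings
    tokens = np.vstack([cls_token, tokens])      # (N+1, D) with [class] token first
    return tokens + pos_embed                    # positional information added

# Hypothetical dimensions for illustration only.
rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))
patches = patchify(image)                        # 14 * 14 = 196 patches of length 768
D = 64                                           # assumed embedding dimension
proj = rng.standard_normal((patches.shape[1], D)) * 0.02
cls_token = rng.standard_normal((1, D))
pos_embed = rng.standard_normal((patches.shape[0] + 1, D)) * 0.02
tokens = embed(patches, proj, cls_token, pos_embed)
print(tokens.shape)  # (197, 64)
```

In the full model, the resulting token sequence passes through stacked self-attention layers, and the final state of the class token is fed to a classification head that predicts the SIBI sign.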


References

M. Sholawati, K. Auliasari, and FX. Ariwibisono, “Pengembangan Aplikasi Pengenalan Bahasa Isyarat Abjad SIBI Menggunakan Metode Convolutional Neural Network (CNN),” JATI (Jurnal Mahasiswa Teknik Informatika), vol. 6, no. 1, pp. 134–144, Mar. 2022, doi: https://doi.org/10.36040/jati.v6i1.4507.

M. Bagus, S. Bakti, and Y. M. Pranoto, “Pengenalan Angka Sistem Isyarat Bahasa Indonesia Dengan Menggunakan Metode Convolutional Neural Network,” Prosiding SEMNAS INOTEK (Seminar Nasional Inovasi Teknologi), vol. 3, no. 1, pp. 011–016, 2019, doi: https://doi.org/10.29407/inotek.v3i1.504.

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” Jun. 2021. [Online]. Available: https://arxiv.org/pdf/2010.11929.pdf

Y. Liu, Y. Zhang, Y. Wang, F. Hou, J. Yuan, J. Tian, Y. Zhang, Z. Shi, J. Fan, and Z. He, “A Survey of Visual Transformers,” Dec. 2022. Accessed: Jun. 03, 2023. [Online]. Available: https://arxiv.org/pdf/2111.06091.pdf

D. R. Kothadiya, C. M. Bhatt, T. Saba, A. Rehman, and S. A. Bahaj, “SIGNFORMER: DeepVision Transformer for Sign Language Recognition,” IEEE Access, vol. 11, pp. 4730–4739, 2023, doi: https://doi.org/10.1109/access.2022.3231130.

M. Marais, D. Brown, J. Connan, and A. Boby, “Spatiotemporal Convolutions and Video Vision Transformers for Signer-Independent Sign Language Recognition,” 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Aug. 2023, doi: https://doi.org/10.1109/icabcd59051.2023.10220534.

C. K. Tan, K. M. Lim, R. K. Y. Chang, C. P. Lee, and A. Alqahtani, “HGR-ViT: Hand Gesture Recognition with Vision Transformer,” Sensors, vol. 23, no. 12, p. 5555, Jan. 2023, doi: https://doi.org/10.3390/s23125555.

“Kamus SIBI,” pmpk.kemdikbud.go.id. [Online]. Available: https://pmpk.kemdikbud.go.id/sibi/profil

O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed, and H. Arshad, “State-of-the-art in artificial neural network applications: A survey,” Heliyon, vol. 4, no. 11, p. e00938, Nov. 2018, doi: https://doi.org/10.1016/j.heliyon.2018.e00938.

A. Arnab, M. Dehghani, G. Heigold, C. Sun, M. Lucic, and C. Schmid, “ViViT: A Video Vision Transformer,” Mar. 2021, doi: https://doi.org/10.48550/arxiv.2103.15691.

S. Mekruksavanich and A. Jitpattanakul, “LSTM Networks Using Smartphone Data for Sensor-Based Human Activity Recognition in Smart Homes,” Sensors, vol. 21, no. 5, p. 1636, Feb. 2021, doi: https://doi.org/10.3390/s21051636.