PAPER CITATION EXTRACTION FOR DETECTING CITATIONS IN INDONESIAN-LANGUAGE SCIENTIFIC PAPERS

Kevin Kevin, Viny Christanti, Prof. Dr. Ir. Dali S. Naga, MMSI

Abstract


The main focus of this study is to develop a system that extracts citations from Indonesian scientific papers with a high level of accuracy. The system is based on ParsCit, with adjusted features and retrained models. In addition to ParsCit's initial features, we add new features suited to Indonesian text, along with new training data consisting of labeled Indonesian headers and citations. We apply Conditional Random Fields (CRF), a probabilistic method, to label the tokens of reference strings in scientific papers. The CRF learns the characteristics of each entity from the new Indonesian data and builds a model from them. This model can then be applied to unseen data, and it was tested on Indonesian scientific papers. The test results show that CRF works well for Indonesian papers: the system labeled them with an average accuracy of 98% for headers and 94% for citations.
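As a rough illustration of the token-labeling setup described above, the sketch below turns a reference string into per-token feature dictionaries of the kind a CRF tagger typically consumes. It is a minimal sketch under stated assumptions, not the feature set used in the study: the function names, the feature names, the Indonesian month lexicon, and the sample reference are all invented for illustration.

```python
import re

# Hypothetical lexicon feature: Indonesian month names (an assumption for
# illustration, not taken from the study's feature set).
INDONESIAN_MONTHS = {
    "januari", "februari", "maret", "april", "mei", "juni",
    "juli", "agustus", "september", "oktober", "november", "desember",
}


def token_features(tokens, i):
    """Build an illustrative feature dict for the token at position i."""
    tok = tokens[i]
    # Strip surrounding punctuation so "2015." is still recognized as a year.
    core = tok.strip(".,;:()\"'")
    return {
        "lower": core.lower(),
        "is_capitalized": core[:1].isupper(),
        "is_year": bool(re.fullmatch(r"(19|20)\d{2}", core)),
        "is_initial": bool(re.fullmatch(r"[A-Z]", core)),
        "is_month_id": core.lower() in INDONESIAN_MONTHS,
        "ends_with_period": tok.endswith("."),
        # Neighboring tokens give the sequence model local context.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def reference_to_features(reference):
    """Split a reference string on whitespace and featurize every token."""
    tokens = reference.split()
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Made-up reference string, used only to exercise the functions.
    sample = "Santoso, B. 2015. Pengolahan Bahasa Alami."
    for token, feats in zip(*reference_to_features(sample)):
        print(token, feats)
```

In a setup like this, each feature dictionary would be paired with a gold label such as author, date, or title during training, and the trained CRF would then assign labels to the tokens of unseen reference strings.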

 

Key words

Conditional Random Field, Fakultas Teknologi Informasi Universitas Tarumanagara, Citation Parsing, Indonesian Scientific Papers, Entity Recognition, ParsCit
