K-MEANS CLUSTERING FOR AN INDONESIAN-LANGUAGE QUESTION ANSWERING SYSTEM IN THE HEALTH DOMAIN

Steven Muliadi, Viny Christanti

Abstract


A Question Answering (QA) system answers questions based on a collection of unstructured text documents written in natural (human) language. In general, a QA system consists of four stages: question analysis, document selection, passage retrieval, and answer extraction. In this study, we added two processes: document clustering and passage clustering. K-Means is used for clustering. Naive Bayes Classification is used for document or passage selection. Passages are built with Dynamic Passage Partitioning. Document selection is done with Lucene. The experiments were conducted using 100 questions from 1000 Indonesian health documents. Test results show that the system without clustering achieves the best accuracy, 63%. The system produces its best results when using the 5 most relevant documents, the 5 highest-scoring passages, and the 10 answers with the closest distance.

Keywords: K-Means Clustering, Dynamic Passage Partitioning, Health, Information Retrieval, Naive Bayes Classification, Question Answering

