USING A THESAURUS TO PERFORM SINGLE-LINKAGE CLUSTERING ON THE SOCIAL MEDIA PLATFORM TWITTER

Sylvia Wulandari

Abstract


Clustering is a text operation that groups documents by content similarity in order to reveal relationships between news items or topics. In this study, single-linkage clustering is used to group tweets by content and to generate a topic for each cluster. Manhattan distance is used to calculate the distance between words. A thesaurus is also used in the clustering process: data are joined according to the tweets and the distance of the closest synonym. The experiments were performed on four datasets with different threshold values. The accuracy of the system is evaluated using purity, which compares the system output against the references. The results show that purity is higher with a thesaurus than without one, because clusters are joined when words in the tweets have synonyms. The best clustering accuracy, 80.16%, was obtained on the first dataset with a threshold value of 0.0003.
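The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the toy thesaurus, the bag-of-words tweet vectors, the raw-count distance threshold, and all function names (`cluster_tweets`, `single_linkage`, `purity`) are assumptions made for illustration only. In particular, the abstract does not specify how tweets are vectorized or how distances are scaled, so the threshold used here is not comparable to the reported value of 0.0003.

```python
from collections import Counter

# Hypothetical toy thesaurus mapping a word to one canonical synonym.
THESAURUS = {"automobile": "car", "film": "movie"}


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def vectorize(tweets, thesaurus=None):
    """Turn tokenized tweets into term-count vectors (an assumed representation).

    If a thesaurus is given, each word is first replaced by its synonym.
    """
    mapping = thesaurus or {}
    docs = [[mapping.get(w, w) for w in tweet] for tweet in tweets]
    vocab = sorted({w for doc in docs for w in doc})
    return [[Counter(doc)[w] for w in vocab] for doc in docs]


def single_linkage(vectors, threshold):
    """Agglomerative clustering; cluster distance is the closest pair of members.

    Merging stops once the smallest inter-cluster distance exceeds threshold.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(manhattan(vectors[p], vectors[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cluster_tweets(tweets, threshold, thesaurus=None):
    return single_linkage(vectorize(tweets, thesaurus), threshold)


def purity(clusters, labels):
    """Fraction of items that carry the majority reference label of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(Counter(labels[i] for i in c).most_common(1)[0][1]
                   for c in clusters)
    return majority / total


tweets = [["new", "car"], ["new", "automobile"],
          ["great", "movie"], ["great", "film"]]
labels = ["auto", "auto", "cinema", "cinema"]  # hypothetical reference topics

with_thesaurus = cluster_tweets(tweets, threshold=0, thesaurus=THESAURUS)
without_thesaurus = cluster_tweets(tweets, threshold=0)
print(with_thesaurus, purity(with_thesaurus, labels))
print(without_thesaurus, purity(without_thesaurus, labels))
```

In this toy example, synonym replacement makes the two car tweets and the two movie tweets identical, so they merge even at a threshold of zero; without the thesaurus each tweet stays in its own cluster. Note that purity alone does not penalize a large number of small clusters, so it is best read alongside the number of clusters produced.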

Keywords: Clustering, Hierarchical Clustering, Manhattan Distance, Single-Linkage Clustering, Thesaurus, Tweets

