Main Article Content
The main focus of this study is to develop a system that aggregates articles from Indonesian online newspapers and automatically clusters them by topic. The system uses content extraction to obtain the main content of each article and Hierarchical Agglomerative Clustering to group articles by topic, with the Dice Similarity Coefficient as the similarity measure. To determine the cutting point, we cut the dendrogram where the gap between two successive combination similarities is largest. Additionally, we add a threshold that limits the cutting area in order to improve the clustering result. We use the Standard Boolean Model for the search feature and Silhouette to evaluate the clustering results. Test results on 998 articles show that limiting the cutting area with 0.1 and 0.5 produces the highest average silhouette value, 0.264.
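The clustering step described above can be illustrated with a minimal Python sketch. It is not the system's implementation, and several details are assumptions the abstract does not state: articles are represented as sets of tokens, clusters are linked by group-average similarity, and the 0.1 and 0.5 limits are read as a window on the combination similarity of the last merge kept. The names `dice`, `hac`, `cut_index`, `cluster`, and the toy `DOCS` data are hypothetical.

```python
from itertools import combinations


def dice(a: set, b: set) -> float:
    """Dice Similarity Coefficient between two token sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def hac(docs):
    """Agglomerative clustering; returns merges as (similarity, left, right).

    Assumption: group-average linkage (the abstract does not name one).
    """
    def link(a, b):
        total = sum(dice(docs[i], docs[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [frozenset([i]) for i in range(len(docs))]
    merges = []
    while len(clusters) > 1:
        # Merge the most similar pair of clusters.
        a, b = max(combinations(clusters, 2), key=lambda p: link(*p))
        merges.append((link(a, b), a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def cut_index(sims, lo=0.0, hi=1.0):
    """Number of merges to keep: cut at the largest gap between successive
    combination similarities, considering only gaps that start at a merge
    whose similarity lies in [lo, hi] (assumed reading of the threshold)."""
    gaps = [(sims[k] - sims[k + 1], k)
            for k in range(len(sims) - 1) if lo <= sims[k] <= hi]
    if not gaps:  # no candidate inside the window: fall back to all gaps
        gaps = [(sims[k] - sims[k + 1], k) for k in range(len(sims) - 1)]
    return max(gaps)[1] + 1 if gaps else len(sims)


def cluster(docs, lo=0.0, hi=1.0):
    """Replay the merges up to the cutting point and return the clusters."""
    merges = hac(docs)
    keep = cut_index([s for s, _, _ in merges], lo, hi)
    clusters = {frozenset([i]) for i in range(len(docs))}
    for _, a, b in merges[:keep]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


# Toy data (hypothetical): three politics articles and two football articles.
DOCS = [
    {"election", "president", "vote", "campaign"},
    {"election", "vote", "campaign", "party"},
    {"football", "goal", "league", "match"},
    {"football", "league", "match", "coach"},
    {"election", "president", "party", "debate"},
]

if __name__ == "__main__":
    print(cluster(DOCS, lo=0.1, hi=0.5))
```

On this toy data the combination similarities are 0.75, 0.75, 0.5 and 0.0, so the largest gap inside the 0.1 to 0.5 window falls after the third merge and the dendrogram is cut into two topic clusters. Moving the window changes which gap is eligible, which is the effect the threshold is meant to control.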
This work is licensed under a Jurnal Komunikasi Creative Commons Attribution-ShareAlike 4.0 International License.