Benchmarking in cluster analysis: a study on Spectral Clustering, DBSCAN, and K-means
This repository contains the code, data sets, and appendix used in our paper:
Nivedha Murugesan, Irene Cho, and Cristina Tortora (2020) Benchmarking in cluster analysis: a study on Spectral Clustering, DBSCAN, and K-means. In Studies in Classification, Data Analysis, and Knowledge Organization, accepted.
Abstract: We perform a benchmark study to identify the advantages and drawbacks of Spectral Clustering and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and we compare both methods with classic K-means clustering. The methods are applied to five simulated and three real data sets, and the resulting clusterings are compared using external and internal indices as well as run times. Although no single method performs best on all data sets, we find that DBSCAN should generally be reserved for non-convex data with well-separated clusters or for data with many outliers. Spectral Clustering has better overall performance, but its results are less stable than those of K-means and its run time is longer.
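External indices compare a clustering against known true labels. One widely used external index is the adjusted Rand index (ARI). This README does not say which external indices the paper reports, so the choice of ARI here is an assumption made for illustration. The following is a minimal, dependency-free Python sketch of ARI. The function name adjusted_rand_index is hypothetical and does not come from the repository's code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same points.

    Hypothetical helper for illustration. A value of 1.0 means identical
    partitions (up to relabeling). Values near 0.0 mean agreement no
    better than chance.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        # No pairs of points to compare; treat as perfect agreement.
        return 1.0
    # Contingency table: number of points in each (true, predicted) pair.
    pairs = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)   # true cluster sizes
    col_sums = Counter(labels_pred)   # predicted cluster sizes

    # Pairs of points grouped together in both partitions.
    index = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    # Expected index under random labeling with the same cluster sizes.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions put every point in one cluster.
        return 1.0
    return (index - expected) / (max_index - expected)


truth = [0, 0, 0, 1, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 1, 0, 0, 0]))            # 1.0: same grouping, different label names
print(round(adjusted_rand_index(truth, [0, 0, 1, 1, 2, 2]), 3))  # 0.242: partial agreement
```

This sketch treats every distinct label as its own cluster, including a noise label such as -1 that DBSCAN implementations often assign to outliers. How the paper handles noise points when computing its indices is not stated here.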