Abstract
<jats:p>The Semi-supervised Density Peak (SDenPeak) algorithm is known to be efficient and simple for clustering tasks. It improves clustering performance by incorporating pairwise constraints, namely must-link and cannot-link constraints, which guide the grouping process by imposing similarity and dissimilarity between data points. A key factor in clustering accuracy is the choice of similarity measure, because different measures capture different structural properties of the data. Since no similarity measure is universally best, selecting a suitable one is difficult and depends on the nature of the data. This study systematically evaluates the effect of six similarity measures on the performance of the SDenPeak algorithm: Euclidean Distance, Cosine Similarity, City Block (Manhattan) Distance, Minkowski Distance, Earth Mover’s Distance (EMD), and Rapid Computation of the Maximal Information Coefficient (RapidMIC) Distance. Extensive experiments on real-world datasets assess the clustering accuracy and structural consistency obtained with each measure. The findings provide a comparison of the effectiveness of these similarity measures and illustrate their applicability to different data distributions, offering a useful guide to achieving the best clustering performance in semi-supervised models.</jats:p>