
Abstract

The Semi-supervised Density Peak (SDenPeak) algorithm is known to be efficient and simple for clustering tasks. It improves clustering performance by adding pairwise constraints (must-link and cannot-link) that guide the grouping process by imposing similarity and dissimilarity between data points. A key factor in clustering accuracy is the choice of similarity measure, because different measures capture different structural attributes of the data. Since no similarity measure is universally best, choosing a suitable one is difficult and depends on the nature of the data. This study systematically evaluates the effect of six measures on SDenPeak performance: Euclidean Distance, Cosine Similarity, City Block (Manhattan) Distance, Minkowski Distance, Earth Mover’s Distance (EMD), and Rapid Computation of the Maximal Information Coefficient (RapidMIC) Distance. Extensive experiments on real-world datasets assess the clustering accuracy and structural consistency obtained with each measure. The findings provide comparative information on the effectiveness of the similarity measures and illustrate their applicability to different data distributions, offering a useful guide to achieving the best clustering performance in semi-supervised models.
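
To make the difference between measures concrete, the following minimal Python sketch computes four of the six measures named above (Euclidean, City Block, Minkowski, and a cosine-based distance) for a pair of feature vectors. It is illustrative only and is not the study's implementation: the function names, the sample vectors, and the conversion of cosine similarity to a distance as 1 minus the similarity are assumptions made here. EMD and RapidMIC are omitted because they need more machinery than a short example allows.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def euclidean(x, y):
    """Euclidean distance: the Minkowski distance with p = 2."""
    return minkowski(x, y, 2)

def manhattan(x, y):
    """City Block (Manhattan) distance: the Minkowski distance with p = 1."""
    return minkowski(x, y, 1)

def cosine_distance(x, y):
    """1 - cosine similarity (assumed conversion; vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

# Hypothetical sample points: y is a scaled copy of x.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

print("euclidean:", euclidean(x, y))
print("manhattan:", manhattan(x, y))
print("minkowski (p=3):", minkowski(x, y, 3))
print("cosine distance:", cosine_distance(x, y))
```

Because `y` points in the same direction as `x`, the cosine distance is approximately zero while the Euclidean and City Block distances are not. This is one simple way in which the choice of measure changes which points a density-based method would treat as neighbors.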


Keywords

similarity, clustering, measures, distance, data
