Computing Reviews

Improved analysis of complete-linkage clustering
Gro&bgr;wendt A., Röglin H. Algorithmica78(4):1131-1150,2017.Type:Article
Date Reviewed: 07/31/18

The authors consider the problem of clustering n points in ℝd into k clusters, where the metric on ℝd has yet to be specified. They consider three possible measures: minimal diameter, minimal radius, and minimal discrete radius (the center has to be one of the cluster points). If we consider a cluster of the three vertices of an equilateral triangle of size 1, this cluster has diameter 1, radius 1/√3, and discrete radius 1. Hence the three measures are different, and we really have three different, albeit related, problems.

The “complete linkage” process starts out with each point in its own cluster, and then repeatedly merges pairs of clusters so as to increase the measure as little as possible: essentially a greedy method. This leads to a hierarchical clustering C1,…,Cn, where each Ck is obtained from the Ck+1 clustering by merging two clusters. As the authors state, “it is easy to find examples where no hierarchical clustering C = (C1,…,Cn) exists such that Ck is an optimal k-clustering for every k.” There are two ways to look at this: either we say that hierarchical clusterings aren’t necessarily the correct route to go down, or we ask for hierarchical clusterings that are nearly optimal, that is, α-approximate if for each k, diam(Ck) is within a factor of α of the diameter of the optimal k-clustering. The authors take this second route.

Dasgupta and Long show that there are 8-approximate hierarchical clusters for the first two problems, for any metric [1]. Conversely, they present an “artificial” metric on ℝ2 such that the complete linkage method is only Ω(log k)-approximate. Ackermann et al. show that, for the first and third problems, there is a lower bound of Ω(p√log k)-approximate for the ℓp metric, but assuming d=Ω(k), so really Ω(p√log d)-approximate [2].

The authors’ main result shows that for fixed d, and any metric on ℝd that is induced by a norm, complete linkage is O(1)-approximate for all three problems. The apparent contradiction with [2] is resolved by the fact that the implied constant in O(1) depends on d. The proof builds heavily on [2], but also has a distinct novelty: the “clustering intersection graph” (actually a hypergraph).


1)

Dasgupta, S.; Long, P. M. Performance guarantees for hierarchical clustering. Journal of Computer and System Sciences 70, 4(2005), 555–569.


2)

Ackermann, M. R.; Blömer, J.; Kuntze, D.; Sohler, C. Analysis of agglomerative clustering. Algorithmica 69, 1(2014), 184–215.

Reviewer:  J. H. Davenport Review #: CR146179 (1811-0601)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™
Terms of Use
| Privacy Policy