Computing Reviews
Improved analysis of complete-linkage clustering
Großwendt A., Röglin H. Algorithmica 78(4): 1131-1150, 2017. Type: Article
Date Reviewed: Jul 31 2018

The authors consider the problem of clustering n points in ℝ^d into k clusters, where the metric on ℝ^d has yet to be specified. They consider three possible measures: minimal diameter, minimal radius, and minimal discrete radius (the center has to be one of the cluster points). If we take the cluster consisting of the three vertices of an equilateral triangle of side 1, this cluster has diameter 1, radius 1/√3, and discrete radius 1. Hence the three measures are different, and we really have three different, albeit related, problems.
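To make the three measures concrete, here is a small Python sketch (my own illustration, not code from the paper or the review) that checks the equilateral-triangle figures above under the Euclidean metric.

# Illustrative sketch (not from the paper): the three cluster measures
# for the vertices of an equilateral triangle of side 1 under the l2 metric.
import itertools
import math

points = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]

# Diameter: the largest pairwise distance within the cluster.
diameter = max(math.dist(p, q) for p, q in itertools.combinations(points, 2))

# Radius: the smallest r such that a ball of radius r around some center in R^2
# covers the cluster; for the equilateral triangle the centroid is the optimal center.
centroid = tuple(sum(c) / len(points) for c in zip(*points))
radius = max(math.dist(centroid, p) for p in points)

# Discrete radius: as above, but the center must be one of the cluster points.
discrete_radius = min(max(math.dist(c, p) for p in points) for c in points)

print(diameter, radius, discrete_radius)  # ≈ 1, ≈ 0.577 (= 1/√3), ≈ 1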

The “complete linkage” process starts out with each point in its own cluster, and then repeatedly merges pairs of clusters so as to increase the measure as little as possible: essentially a greedy method. This leads to a hierarchical clustering C_1,…,C_n, where each C_k is obtained from the C_{k+1} clustering by merging two clusters. As the authors state, “it is easy to find examples where no hierarchical clustering C = (C_1,…,C_n) exists such that C_k is an optimal k-clustering for every k.” There are two ways to look at this: either we say that hierarchical clusterings aren’t necessarily the correct route to go down, or we ask for hierarchical clusterings that are nearly optimal, that is, α-approximate if for each k, diam(C_k) is within a factor of α of the diameter of the optimal k-clustering. The authors take this second route.
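As an illustration of this greedy merging process, here is a minimal Python sketch (again my own, not the authors' code; the names complete_linkage, diameter, and hierarchy are mine), using the diameter measure and the Euclidean metric.

# Minimal sketch of complete linkage under the diameter measure: start with
# singleton clusters and repeatedly merge the pair of clusters whose union
# has the smallest diameter, recording the hierarchy C_n, ..., C_1.
import itertools
import math

def diameter(cluster):
    if len(cluster) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in itertools.combinations(cluster, 2))

def complete_linkage(points):
    clusters = [[p] for p in points]
    hierarchy = {len(clusters): [list(c) for c in clusters]}  # C_n
    while len(clusters) > 1:
        # Greedy step: pick the merge that increases the measure the least.
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: diameter(clusters[ij[0]] + clusters[ij[1]]))
        clusters[i] += clusters[j]
        del clusters[j]          # i < j, so this index is still valid
        hierarchy[len(clusters)] = [list(c) for c in clusters]  # C_k
    return hierarchy             # hierarchy[k] is the k-clustering C_k

# Example: the 2-clustering produced by the greedy merges.
print(complete_linkage([(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.1)])[2])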

Dasgupta and Long show that there are 8-approximate hierarchical clusterings for the first two problems, for any metric [1]. On the other hand, they present an “artificial” metric on ℝ^2 such that the complete linkage method is only Ω(log k)-approximate. Ackermann et al. show that, for the first and third problems, there is a lower bound of Ω((log k)^{1/p}) on the approximation factor for the ℓ_p metric, but their construction assumes d = Ω(k), so it really only shows that complete linkage is Ω((log d)^{1/p})-approximate [2].

The authors’ main result shows that for fixed d, and any metric on ℝ^d that is induced by a norm, complete linkage is O(1)-approximate for all three problems. The apparent contradiction with [2] is resolved by the fact that the implied constant in O(1) depends on d. The proof builds heavily on [2], but also has a distinct novelty: the “clustering intersection graph” (actually a hypergraph).

Reviewer: J. H. Davenport. Review #: CR146179 (1811-0601)
1) Dasgupta, S.; Long, P. M. Performance guarantees for hierarchical clustering. Journal of Computer and System Sciences 70(4): 555–569, 2005.
2) Ackermann, M. R.; Blömer, J.; Kuntze, D.; Sohler, C. Analysis of agglomerative clustering. Algorithmica 69(1): 184–215, 2014.
Clustering (I.5.3)
Approximation (G.1.2)
Other reviews under "Clustering":
On the convergence of “A self-supervised vowel recognition system”
Pathak A., Pal S. Pattern Recognition 20(2): 237-244, 1987. Type: Article
Date reviewed: Aug 1 1988
Conceptual clustering of structured objects: a goal-oriented approach
Stepp R., Michalski R. (ed) Artificial Intelligence 28(1): 43-69, 1986. Type: Article
Date reviewed: Sep 1 1986
The enhanced LBG algorithm
Patané G., Russo M. Neural Networks 14(9): 1219-1237, 2001. Type: Article
Date reviewed: Apr 2 2003
