Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Performance guarantees for hierarchical clustering
Dasgupta S., Long P. Journal of Computer and System Sciences70 (4):555-569,2005.Type:Article
Date Reviewed: Nov 30 2005

Under hierarchical clustering, a dataset of n points is recursively partitioned into 2, 3, ... k clusters. Typically, at each recursion step, a cluster is divided into two or more clusters based on a cost function (such as the popular k-means). In this context, two important issues arise in hierarchical clustering: Which is the most appropriate cost function? How can the overall cost of producing k clusters be estimated or approximated?

This paper proposes the maximum radius of clusters as the cost function, and introduces an approximation algorithm for hierarchical clustering. This approximation algorithm favors the choice of the point furthest from a cluster center so as to make it a new cluster center. It then assigns points to clusters by an appropriate numbering scheme and an associated parent function, which preserves the different levels of granularity. The idea proposed here is sufficiently justified by theorems and lemmas that are proven in the paper. Moreover, the paper highlights interesting open questions, and the references used are adequate and closely related to the paper’s content.

This paper is of high interest for researchers working in data mining, since the proposed approximation algorithm refers to a generic dataset of points, and it shows performance guarantees for every k clustering. Scientists with a strong mathematical background will be able to follow the paper. It could also serve as reading material on data mining in graduate courses.

Reviewer:  Athena Vakali Review #: CR132111 (0606-0628)
Bookmark and Share
  Featured Reviewer  
 
Clustering (H.3.3 ... )
 
 
Parameter Learning (I.2.6 ... )
 
 
Performance Evaluation (Efficiency And Effectiveness) (H.3.4 ... )
 
 
Systems And Software (H.3.4 )
 
Would you recommend this review?
yes
no
Other reviews under "Clustering": Date
Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases
Can F. (ed), Ozkarahan E. ACM Transactions on Database Systems 15(3): 483-517, 1990. Type: Article
Dec 1 1992
A parallel algorithm for record clustering
Omiecinski E., Scheuermann P. ACM Transactions on Database Systems 15(3): 599-624, 1990. Type: Article
Nov 1 1992
Organization of clustered files for consecutive retrieval
Deogun J., Raghavan V., Tsou T. ACM Transactions on Database Systems 9(4): 646-671, 1984. Type: Article
Jun 1 1985
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy