Under hierarchical clustering, a dataset of n points is recursively partitioned into 2, 3, ..., k clusters. Typically, at each recursion step, a cluster is divided into two or more clusters according to a cost function (such as the popular k-means objective). In this context, two important questions arise: Which cost function is most appropriate? How can the overall cost of producing k clusters be estimated or approximated?
This paper proposes the maximum cluster radius as the cost function and introduces an approximation algorithm for hierarchical clustering. The algorithm repeatedly selects the point farthest from the existing cluster centers and promotes it to a new center. It then assigns points to clusters via a numbering scheme and an associated parent function, which together preserve the different levels of granularity. The proposed idea is justified by theorems and lemmas proven in the paper. Moreover, the paper highlights interesting open questions, and the references are adequate and closely related to the paper's content.
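The center-selection step described above can be sketched as a farthest-first traversal. The following is a minimal illustrative sketch, assuming Euclidean points; the paper's numbering scheme and parent function, which organize the centers into a hierarchy, are omitted here.

```python
import math

def farthest_first(points, k):
    """Farthest-first traversal: start from an arbitrary center, then
    repeatedly promote the point farthest from all chosen centers.
    Returns the k centers in the order they were chosen."""
    centers = [points[0]]
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    for _ in range(1, k):
        # the point farthest from every existing center becomes the next center
        i = max(range(len(points)), key=lambda j: dist[j])
        centers.append(points[i])
        # update each point's distance to its nearest center
        for j, p in enumerate(points):
            dist[j] = min(dist[j], math.dist(p, points[i]))
    return centers

centers = farthest_first([(0, 0), (10, 0), (0, 10), (1, 1)], 3)
print(centers)  # the outlying points are chosen as centers
```

Choosing centers this way greedily bounds the maximum cluster radius, which is why it pairs naturally with the maximum-radius cost function the paper proposes.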
This paper is of high interest to researchers working in data mining, since the proposed approximation algorithm applies to a generic dataset of points and provides performance guarantees for every k-clustering. Scientists with a strong mathematical background will be able to follow the paper. It could also serve as reading material for graduate courses on data mining.