Hierarchical clustering is an approach to grouping similar data points. The method begins with individual data points (clusters), which are then successively merged according to some predefined metric until a single hierarchically constructed cluster is obtained. Typical implementations of hierarchical clustering are memory and computation-time intensive. Parallel implementations of such algorithms are therefore desirable. The novel aspect of this paper is its parallel implementation of hierarchical clustering on a distributed-memory, multiple instruction stream, multiple data stream (MIMD) architecture (Beowulf). Parallel implementations on single instruction, multiple data (SIMD) and parallel random access memory (PRAM) architectures have been previously published.
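The sequential merge loop described above can be sketched as follows. This is a minimal, naive single-linkage illustration, not the paper's implementation; the linkage criterion and the O(N^3) search strategy are assumptions for brevity.

```python
import math

def hierarchical_cluster(points, target_clusters=1):
    """Naive agglomerative clustering sketch (single-linkage assumed).

    Each point starts as its own cluster; the two closest clusters
    are repeatedly merged until `target_clusters` remain.
    """
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-linkage: minimum pairwise distance between clusters.
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # merge cluster j into cluster i
    return clusters
```

Even this toy version makes the reviewers' point visible: the pairwise-distance search dominates the cost, which is what motivates a parallel implementation.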
In the presented approach, all points (N) are stored on all processors (P); however, each processor works on only N/P points. Although such data decomposition for matrix operations is commonly used in parallel linear algebra (for example, ScaLAPACK), this is perhaps its first documented use for hierarchical clustering. The parallel implementation is tested on approximately 10,000 data points. The speedup is surprisingly linear, albeit with only 31 percent parallel efficiency, and the linearity will probably break down for larger processor counts. A comparison of the parallel efficiency of this method against other parallel implementations would be very useful. The authors also do not discuss load balancing as the number of remaining clusters decreases.
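The decomposition described above, with all N points replicated on every processor but each processor responsible for only N/P of them, can be sketched as a simple block partition. The function name and the contiguous-block assignment are assumptions for illustration; the paper may assign points differently.

```python
def local_rows(n_points, n_procs, rank):
    """Contiguous block of point indices owned by processor `rank`.

    All points are replicated on every processor; each rank computes
    distances only for the roughly N/P points in its own block.
    Blocks differ in size by at most one when P does not divide N.
    """
    base, extra = divmod(n_points, n_procs)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)
```

A static partition like this also illustrates the load-balancing concern raised above: as clusters merge and disappear, some ranks may be left with little remaining work unless the clusters are redistributed.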
Given the increasing use of Beowulfs as an inexpensive alternative to supercomputers, the parallel approach presented here has value, despite its low parallel efficiency. A closer look at redistributing clusters among processors for efficient load balancing may lead to some improvement.