Computing Reviews
A novel parallelization approach for hierarchical clustering
Du Z., Lin F. Parallel Computing 31(5): 523-527, 2005. Type: Article
Date Reviewed: Jan 24 2006

Hierarchical clustering is an approach to grouping similar data points. The method begins with individual data points as singleton clusters, which are then successively merged according to some predefined metric until a single, hierarchically structured cluster remains. Typical implementations of hierarchical clustering are memory and computation-time intensive, so parallel implementations are desirable. The novel aspect of this paper is its parallel implementation of hierarchical clustering on a distributed-memory, multiple instruction stream multiple data stream (MIMD) architecture (a Beowulf cluster). Parallel implementations on single instruction stream multiple data stream (SIMD) and parallel random access machine (PRAM) architectures have been published previously.
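
To make the merge loop concrete, the following minimal Python sketch (not taken from the paper) implements serial agglomerative clustering, using single-linkage distance as an illustrative merge metric; the repeated pairwise-distance scans show where the memory and computation-time costs come from.

    import numpy as np

    def agglomerative(points):
        # Start with each point as its own singleton cluster.
        clusters = {i: [i] for i in range(len(points))}
        merges = []
        while len(clusters) > 1:
            ids = list(clusters)
            best = None
            # Scan all cluster pairs for the smallest single-linkage distance.
            for a in range(len(ids)):
                for b in range(a + 1, len(ids)):
                    d = min(np.linalg.norm(points[i] - points[j])
                            for i in clusters[ids[a]]
                            for j in clusters[ids[b]])
                    if best is None or d < best[0]:
                        best = (d, ids[a], ids[b])
            _, keep, absorb = best
            clusters[keep].extend(clusters.pop(absorb))   # merge the closest pair
            merges.append((keep, absorb))
        return merges                                     # the merge history defines the hierarchy

    if __name__ == "__main__":
        pts = np.random.default_rng(0).random((30, 2))
        print(agglomerative(pts))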

In the presented approach, all N points are stored on all P processors; however, each processor works on only N/P points. Although such data decomposition for matrix operations is commonly used in parallel linear algebra (for example, ScaLAPACK), this is perhaps its first documented use for hierarchical clustering. The parallel implementation is tested on approximately 10,000 data points. The speedup is surprisingly linear, albeit at only 31 percent parallel efficiency, and the linearity will probably break down as the number of processors grows. A comparison of the parallel efficiency of this method against other parallel implementations would be very useful. The authors also do not discuss load balancing as the number of remaining clusters decreases.
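
As a rough illustration of that decomposition (a sketch only, assuming Python and mpi4py rather than the authors' code), each rank below keeps a full copy of the points, searches only its own block of roughly N/P rows for the closest pair, and a collective gather selects the global minimum for the next merge.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    # Every rank generates the same N points (same seed), mirroring the
    # "all points stored on all processors" layout described above.
    points = np.random.default_rng(42).random((10_000, 2))
    N = len(points)

    # Each rank is responsible for a contiguous block of roughly N/P points.
    lo = rank * N // nprocs
    hi = (rank + 1) * N // nprocs

    # Local search: for each owned point, find its nearest neighbor overall.
    best = (np.inf, -1, -1)
    for i in range(lo, hi):
        d = np.linalg.norm(points - points[i], axis=1)
        d[i] = np.inf                      # ignore the self-distance
        j = int(np.argmin(d))
        if d[j] < best[0]:
            best = (float(d[j]), i, j)

    # Global step: every rank learns the closest pair to merge first.
    global_best = min(comm.allgather(best))
    if rank == 0:
        print("first merge candidate:", global_best)

Run with, for example, mpiexec -n 8 python sketch.py; as noted above, keeping the remaining clusters evenly spread across ranks after many merges is the load-balancing question the authors leave open.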

Given the increasing use of Beowulf clusters as an inexpensive alternative to supercomputers, the parallel approach presented here has value, despite its low parallel efficiency. A closer look at redistributing clusters among processors for better load balancing may lead to some improvement.

Reviewer: Anupam Sharma | Review #: CR132345 (0608-0850)
Parallel Algorithms (G.1.0)
Algorithms (I.5.3)
Other reviews under "Parallel Algorithms":
Parallel algorithms in computational science
Heermann D., Burkitt A., Springer-Verlag New York, Inc., New York, NY, 1991. Type: Book (9780387534183)
Apr 1 1992
A parallel shortest augmenting path algorithm for the assignment problem
Balas E., Miller D., Pekny J., Toth P. (ed) Journal of the ACM 38(4): 985-1004, 1991. Type: Article
Sep 1 1992
An O(n log n) minimal spanning tree algorithm for n points in the plane
Chang R., Lee R. BIT 26(1): 7-16, 1986. Type: Article
Nov 1 1987
