Hierarchical clustering is an approach to grouping similar data points. The method begins with individual data points (clusters), which are then successively merged according to some predefined metric until a single hierarchically constructed cluster is obtained. Typical implementations of hierarchical clustering are memory and computation-time intensive. Parallel implementations of such algorithms are therefore desirable. The novel aspect of this paper is its parallel implementation of hierarchical clustering on a distributed-memory, multiple instruction stream, multiple data stream (MIMD) architecture (Beowulf). Parallel implementations on single instruction, multiple data (SIMD) and parallel random access memory (PRAM) architectures have been previously published.
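The sequential merge loop described above can be sketched as follows. This is a minimal, naive single-linkage illustration, not the paper's implementation; the linkage criterion and the O(N^3) search strategy are assumptions for brevity.

```python
import math

def hierarchical_cluster(points, target_clusters=1):
    """Naive agglomerative clustering sketch (single-linkage assumed).

    Each point starts as its own cluster; the two closest clusters
    are repeatedly merged until `target_clusters` remain.
    """
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-linkage: minimum pairwise distance between clusters.
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # merge cluster j into cluster i
    return clusters
```

Even this toy version makes the reviewers' point visible: the pairwise-distance search dominates the cost, which is what motivates a parallel implementation.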
In the presented approach, all points (N) are stored on all processors (P); however, each processor works on only N/P points. Although such data decomposition for matrix operations is commonly used in parallel linear algebra (for example, ScaLAPACK), this is perhaps its first documented use for hierarchical clustering. The parallel implementation is tested on approximately 10,000 data points. The speedup is surprisingly linear, albeit with only 31 percent parallel efficiency, and the linearity will probably break down for larger processor counts. A comparison of the parallel efficiency of this method against other parallel implementations would be very useful. The authors also do not discuss load balancing as the number of remaining clusters decreases.
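The decomposition described above, with all N points replicated on every processor but each processor responsible for only N/P of them, can be sketched as a simple block partition. The function name and the contiguous-block assignment are assumptions for illustration; the paper may assign points differently.

```python
def local_rows(n_points, n_procs, rank):
    """Contiguous block of point indices owned by processor `rank`.

    All points are replicated on every processor; each rank computes
    distances only for the roughly N/P points in its own block.
    Blocks differ in size by at most one when P does not divide N.
    """
    base, extra = divmod(n_points, n_procs)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)
```

A static partition like this also illustrates the load-balancing concern raised above: as clusters merge and disappear, some ranks may be left with little remaining work unless the clusters are redistributed.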
Given the increasing use of Beowulfs as an inexpensive alternative to supercomputers, the parallel approach presented here has value, despite its low parallel efficiency. A closer look at redistributing clusters among processors for efficient load balancing may lead to some improvement.