Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
A parallel algorithm for record clustering
Omiecinski E., Scheuermann P. ACM Transactions on Database Systems15 (3):599-624,1990.Type:Article
Date Reviewed: Nov 1 1992

We start with a large file and a collection of queries to be processed against that file. We assume that the frequency of occurrence of each query and the records that satisfy it are known. The goal is to cluster the records in such a way as to minimize the total number of pages accessed in order to satisfy the collection of queries. In this paper, the authors present a heuristic algorithm whose output is a list of clusters containing the records satisfying different subsets of queries. The general problem is NP-hard, so this algorithm is heuristic and is only near-optimal.

Building on earlier work, the authors use what they call a P-tree, a binary tree whose nodes at level i represent the clusters formed after the first i queries using a precedence order. Because the formation of this tree requires little and regular communication among nodes, it is suitable for execution on a SIMD computer. Although the authors’ model of a SIMD machine is somewhat unusual--the processors are only assumed to have communication paths that link them into a linear array--the model is subsumed by most extant designs and thus is not unrealistic.

The authors have done a number of synthetic experiments testing their algorithm against an adaptive clustering algorithm. They present the conclusion that their method is competitive.

Reviewer:  D. A. Buell Review #: CR115118
Bookmark and Share
 
Clustering (H.3.3 ... )
 
 
Sequencing And Scheduling (F.2.2 ... )
 
 
Single-Instruction-Stream, Multiple-Data-Stream Processors (SIMD) (C.1.2 ... )
 
 
Information Storage (H.3.2 )
 
 
Physical Design (H.2.2 )
 
Would you recommend this review?
yes
no
Other reviews under "Clustering": Date
Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases
Can F. (ed), Ozkarahan E. ACM Transactions on Database Systems 15(3): 483-517, 1990. Type: Article
Dec 1 1992
Organization of clustered files for consecutive retrieval
Deogun J., Raghavan V., Tsou T. ACM Transactions on Database Systems 9(4): 646-671, 1984. Type: Article
Jun 1 1985
Clustering analysis for graphs with multiweighted edges: a unified approach to the threshold problem
Roy J. Journal of the American Society for Information Science 38(3): 156-161, 1987. Type: Article
Apr 1 1988
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy