Computing Reviews
Scalable community-driven data sharing in e-science grids
Scholl T., Bauer B., Gufler B., Kuntschke R., Reiser A., Kemper A. Future Generation Computer Systems 25(3): 290-300, 2009. Type: Article
Date Reviewed: May 19 2009

Scientific research generates extremely large amounts of data, running into petabytes per year and often spread across distributed locations. At the same time, much research around the world relies on data that was collected elsewhere. This raises the problem of organizing such data and providing effective access to it. The increasingly collaborative nature of research makes the task even more daunting.

This paper extends the existing framework for data sharing and load balancing in this context by adding locality-aware allocation of data, which exploits the spatial locality usually observed in data requests.

The HiSbase system, which implements this technique, uses z-quadtrees to capture spatial locality and to provide efficient access. Z-quadtrees extend quadtrees with histograms, distributed hash tables, and space-filling curves for linearization. A system called Pastry provides the distributed hash table. The histograms, which capture locality, are obtained by training on sample data. The system targets communities with “fairly stable data distributions,” which makes static training feasible.
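To make the linearization step concrete, here is a minimal sketch of a Z-order (Morton) key for quadtree cells. It is an illustration, not the HiSbase implementation: the function names, the unit-square domain, and the uniform split of the key space across nodes are all assumptions made for the example. As described above, the paper's system derives its partitioning from histograms rather than splitting the key space evenly.

```python
from typing import Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) key for grid cell (x, y)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits fill the even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits fill the odd positions
    return key


def cell_of(px: float, py: float, depth: int) -> Tuple[int, int]:
    """Map a point in the unit square to its grid cell at a given quadtree depth.

    Assumption: the data space is normalized to [0, 1) x [0, 1).
    """
    side = 1 << depth  # the grid has side x side cells at this depth
    return min(int(px * side), side - 1), min(int(py * side), side - 1)


def node_for_key(key: int, depth: int, num_nodes: int) -> int:
    """Assign contiguous, equal-sized key ranges to nodes.

    Hypothetical uniform split; a histogram-driven scheme would instead
    place range boundaries according to the observed data distribution.
    """
    total_keys = 1 << (2 * depth)
    return key * num_nodes // total_keys
```

At depth 2, the four cells of the lower-left quadrant, (0, 0), (1, 0), (0, 1), and (1, 1), receive keys 0 through 3. With four nodes and the uniform split assumed here, they all land on node 0, which is the sense in which spatially close data stays together after linearization.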

The paper covers a wide range of storage-related issues, including rate of change of distribution, data skew, implementation of z-quadtrees, node management, data distribution, reliability of nodes, and evaluation parameters. Performance analysis is reported for some large datasets running into millions of records.

The paper reads well and will be of interest to those working with distributed databases, particularly with spatial data, or using data grids.

The paper lacks focus: it addresses a gamut of issues instead of concentrating on some core elements, which makes it quite sketchy and leaves a number of questions unanswered. It also assumes a lot of familiarity with the work in progress, as implied by the lack of adequate background descriptions. Most of the figures are cited without adequate explanation, which makes them difficult to understand.

Reviewer:  M Sasikumar Review #: CR136851 (1001-0077)
Data Sharing (H.3.5 ... )
Distributed Databases (H.2.4 ... )
Query Processing (H.2.4 ... )
Database Applications (H.2.8 )
 
