Computing Reviews
Analytic queries over geospatial time-series data using distributed hash tables
Malensek M., Pallickara S., Pallickara S. IEEE Transactions on Knowledge and Data Engineering 28(6): 1408-1422, 2016. Type: Article
Date Reviewed: Jul 21 2016

This paper explores the advantages and complications of using a novel indexing scheme for managing big data and geospatial attributes. The indexing system is built with the goal of an optimal compromise among scalability, expressiveness, and performance. Its implementation conflates the domains of indexing, content storage, and caching to facilitate queries of an exploratory (learning about the domain) or predictive (making forecasts) nature. The experimental sample consists of meteorological observations from the National Oceanic and Atmospheric Administration (NOAA) comprising one petabyte of data in 20 billion files.

The authors go to great lengths to explain how to maintain current aggregate statistics within the cached indexing information, and how the query model needs to be tailored to what the indexing scheme offers. They also provide benchmark timings for index maintenance under selected operations.
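To make the idea of keeping aggregate statistics inside the index concrete, here is a minimal sketch. It is not the authors' implementation: the grid-cell key, the SHA-1 based node placement, and the names `RunningStats`, `cell_key`, `node_for`, `Node`, and `query` are all assumptions introduced for illustration. It uses Welford's online method so that each index entry can be updated per observation and partial results from different nodes can be merged at query time.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean, and sum of squared deviations (Welford's method)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Update the summary in place with one new observation.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two partial summaries without revisiting raw data.
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance; zero when fewer than two observations exist.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def cell_key(lat: float, lon: float, day: str, cell_deg: float = 1.0):
    # Hypothetical index key: a coarse latitude/longitude grid cell plus a day.
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg), day)


def node_for(key, num_nodes: int) -> int:
    # Assumed placement rule: hash the key and take it modulo the node count.
    digest = hashlib.sha1(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class Node:
    """One storage node holding cached per-cell summaries."""

    def __init__(self) -> None:
        self.index: dict = {}

    def insert(self, key, value: float) -> None:
        self.index.setdefault(key, RunningStats()).add(value)


def query(nodes, predicate) -> RunningStats:
    # Answer an aggregate query from cached summaries only,
    # merging the partial results held by each node.
    total = RunningStats()
    for node in nodes:
        for key, stats in node.index.items():
            if predicate(key):
                total = total.merge(stats)
    return total


# Usage: route a few made-up temperature readings, then query one latitude band.
nodes = [Node() for _ in range(4)]
observations = [
    (40.5, -105.2, "2016-07-01", 10.0),
    (40.7, -105.9, "2016-07-01", 12.0),
    (40.1, -105.5, "2016-07-01", 14.0),
    (35.2, -90.1, "2016-07-01", 20.0),
]
for lat, lon, day, temp in observations:
    key = cell_key(lat, lon, day)
    nodes[node_for(key, len(nodes))].insert(key, temp)

band_40 = query(nodes, lambda key: key[0] == 40)
everything = query(nodes, lambda key: True)
```

The design point is that a query touches only the summaries, so its cost grows with the number of matching index cells rather than with the number of stored files. The price is the one the review notes: the query model is limited to what the cached summaries can express.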

The conflation of an indexing scheme based on distributed hashing with semantic information may appear questionable from a traditional perspective, but could be the correct approach for solutions tailored to certain big data domains. The approach is domain specific (for instance, the data points associated with a location share a common meaning: all are weather observations); its application to more disparate (for example, unrelated) aggregations of data will have limits. Big data, however, often sacrifices generality for performance, and this indexing scheme follows that pattern. Query execution necessarily happens in a distributed evaluation environment like MapReduce, which requires custom code and specialized expertise. Notwithstanding the risk of an apples-to-oranges comparison, the reader would have benefited from a comparative performance assessment against more traditional environments, perhaps on a reduced dataset; none is provided.

Reviewer:  A. Squassabia Review #: CR144614 (1611-0836)
Data Models (H.2.1 ... )
Distributed Databases (H.2.4 ... )
Hash-Table Representations (E.2 ... )
Spatial Databases And GIS (H.2.8 ... )
