Computing Reviews

Analytic queries over geospatial time-series data using distributed hash tables
Malensek M., Pallickara S., Pallickara S. IEEE Transactions on Knowledge and Data Engineering 28(6): 1408-1422, 2016. Type: Article
Date Reviewed: 07/21/16

This paper explores the advantages and complications of using a novel indexing scheme for managing big data and geospatial attributes. The indexing system is built with the goal of an optimal compromise among scalability, expressiveness, and performance. Its implementation conflates the domains of indexing, content storage, and caching to facilitate queries of an exploratory (learning about the domain) or predictive (making forecasts) nature. The experimental sample consists of meteorological observations from the National Oceanic and Atmospheric Administration (NOAA) comprising one petabyte of data in 20 billion files.

The authors go to great lengths to explain how to keep aggregate statistics current within the cached indexing information, and how the query model must be tailored to what the indexing scheme offers. The paper also provides benchmark timings for maintaining the index under selected operations.
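To make the idea of aggregate statistics kept inside an index more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the grid-cell key (`cell_key`), the in-memory `index` dictionary, and the `observe` helper are hypothetical names chosen for illustration, and Welford's online algorithm stands in for whatever summary-maintenance method the paper actually uses. The sketch shows only that a per-bucket summary can be updated one observation at a time, and that two summaries can be merged without revisiting the raw data.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean, and sum of squared deviations, updated incrementally."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's online update: earlier observations are never re-read.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two summaries, e.g. from two nodes or two adjacent cells.
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def cell_key(lat: float, lon: float, day: str, cell_deg: float = 1.0) -> tuple:
    # Hypothetical bucket: a fixed-size latitude/longitude cell plus a calendar day.
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg), day)


# Hypothetical in-memory stand-in for the cached index: bucket -> summary.
index: dict = {}


def observe(lat: float, lon: float, day: str, value: float) -> None:
    # Route one observation to its bucket and update the summary in place.
    index.setdefault(cell_key(lat, lon, day), RunningStats()).add(value)
```

Under these assumptions, a query for the mean or variance of a bucket reads only the cached summary, which is the property that makes such statistics cheap to serve; the cost is that the query model is limited to what the summaries can answer.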

The conflation of an indexing scheme based on distributed hashing with semantic information may appear questionable from a traditional perspective, but could be the correct approach for solutions tailored to certain big data domains. This approach is domain-specific (for instance, data points associated with a location have commonality of meaning: all are weather observations); its application to more disparate (for example, unrelated) aggregations of data will have limits. Big data, however, often sacrifices generality for performance, and this indexing scheme follows that pattern. Query execution necessarily happens in a distributed evaluation environment like MapReduce, which requires custom code and specialized expertise. Notwithstanding the risk of an apples-to-oranges comparison, the reader would have benefited from a comparative performance assessment against more traditional environments, perhaps on a reduced dataset; none is provided.

Reviewer:  A. Squassabia Review #: CR144614 (1611-0836)
