Computing Reviews
Decomposing federated queries in presence of replicated fragments
Montoya G., Skaf-Molli H., Molli P., Vidal M. Journal of Web Semantics 42: 1-18, 2017. Type: Article
Date Reviewed: Oct 25 2017

Linked data allows data stored in heterogeneous, autonomous information sources to be integrated, which makes the information more valuable than what isolated sources could offer: aggregating related data can yield knowledge that no individual source holds. Linked data sources expose resource description framework (RDF) data through SPARQL endpoints, which can be queried with the SPARQL query language.

The problem of decomposing queries in distributed environments derives from the information integration problem, so it is not new. However, the increasing relevance of the linked open data (LOD) cloud poses new challenges to the information integration community: the data model is different, and federation is taken to its extreme because data sources are completely autonomous from the agent in charge of query distribution.

This paper tackles query distribution in LOD data sources when there are replicated data fragments. The particularity of the replication problem in the LOD context is that data fragmentation and replication cannot be designed in advance to obtain better performance when querying the data sources. Moreover, the availability of sources is unpredictable.

The paper addresses the query decomposition problem with fragment replication (QDP-FR) and offers a solution called LILAC (SPARQL query decomposition against federations of replicated data sources). Its main components are four algorithms: a decompose algorithm, a reduceunions algorithm, a reducebgps algorithm, and an increaseselectivity algorithm. Together, they locate the relevant sources (that is, they select nonredundant sets of fragments and candidate endpoints) and join the relevant fragments obtained from the different sources.
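To make the idea of selecting a nonredundant set of fragments and candidate endpoints concrete, here is a minimal sketch. It is not LILAC and not any of its four algorithms: it swaps in a plain greedy covering heuristic, and the function name `select_endpoints`, the fragment labels, and the endpoint labels are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Set


def select_endpoints(needed: Set[str], replicas: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign each needed fragment to exactly one endpoint that replicates it.

    Illustrative greedy heuristic (an assumption, not the paper's method):
    repeatedly pick the endpoint that covers the most still-unassigned
    fragments, so no fragment is fetched from two replicas.
    """
    assignment: Dict[str, str] = {}
    remaining = set(needed)
    while remaining:
        # Iterate endpoints in sorted order so ties are broken deterministically.
        best = max(
            sorted(replicas),
            key=lambda endpoint: len(replicas[endpoint] & remaining),
            default=None,
        )
        covered = replicas[best] & remaining if best is not None else set()
        if not covered:
            raise ValueError(f"no endpoint offers fragments: {sorted(remaining)}")
        for fragment in covered:
            assignment[fragment] = best
        remaining -= covered
    return assignment


# Hypothetical federation: fragment f2 is replicated at e1 and e2, f3 at e2 and e3.
replicas = {"e1": {"f1", "f2"}, "e2": {"f2", "f3"}, "e3": {"f3"}}
plan = select_endpoints({"f1", "f2", "f3"}, replicas)
print(plan)
```

In this toy federation every fragment ends up with a single source, so a replicated fragment is never retrieved twice. A real decomposer would also have to decide where joins are evaluated, which this sketch ignores.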

The paper presents the problem and formalizes it, and then proposes the LILAC solution. The algorithms that constitute LILAC are formalized, their complexity is analyzed, and proofs of theorems are presented. A validation with experiments on four real datasets and one synthetic dataset is included. In these experiments, the performance of LILAC in two query engines, FedX and ANAPSID, is compared with that of competing approaches. Performance is measured in terms of execution time, answer completeness, and number of transferred tuples.
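As a rough illustration of two of these metrics, the sketch below computes answer completeness and the number of transferred tuples for a toy query. The definitions used here are assumptions made for the example (completeness as the fraction of expected answers that were returned, transferred tuples as the total count of tuples shipped by all endpoints); the paper's exact definitions may differ, and the helper names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List


def answer_completeness(returned: Iterable[Hashable], expected: Iterable[Hashable]) -> float:
    """Fraction of the expected answers that appear in the returned answers.

    Assumed definition for illustration only.
    """
    expected_set = set(expected)
    if not expected_set:
        return 1.0  # nothing was expected, so nothing is missing
    return len(set(returned) & expected_set) / len(expected_set)


def transferred_tuples(per_endpoint: Dict[str, List[tuple]]) -> int:
    """Total number of tuples shipped from all endpoints to the query engine."""
    return sum(len(tuples) for tuples in per_endpoint.values())


# Toy data: two of four expected answers came back; five tuples were shipped.
completeness = answer_completeness(["a1", "a2"], ["a1", "a2", "a3", "a4"])
shipped = transferred_tuples({"e1": [("s", "p", "o")] * 3, "e2": [("s", "p", "o")] * 2})
print(completeness, shipped)
```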

This is a sound paper, of interest to the community of researchers working with information integration in linked data environments. I particularly appreciated the “Related Work” section, which proves to be an admirable effort in comparing the problem with other known problems (and solutions), such as distributed databases, data fragmentation, and data replication.

Reviewer: Mercedes Martínez González. Review #: CR145616 (1712-0817)
Query Processing (H.2.4 ... )
World Wide Web (WWW) (H.3.4 ... )
