Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Complexity and expressive power of weakly well-designed SPARQL
Kaminski M., Kostylev E. Theory of Computing Systems62 (4):772-809,2018.Type:Article
Date Reviewed: Sep 10 2018

The resource description framework (RDF) is a data model standard for the web that represents linked data as subject-predicate-object triples. Such triples can be naturally represented as labeled directed graphs. SPARQL is a query language for RDF graphs that plays an important role in the semantic web. The subject of intensive theoretical investigation over the last decade, its query engines have been actively developed for both academic and industrial applications.

An example of a SPARQL query that retrieves IDs (?i) and names (?n) from a given graph:

SELECT ?i, ?n
WHERE (?i, rdf:type, foaf:person) AND (?i, foaf:name, ?n).

If the graph does not contain information about a particular person’s name, the query does not return anything about that person. However, one might still want to retrieve his/her ID. For this purpose, optional queries can be used, for example:

SELECT ?i, ?n
WHERE (?i, rdf:type, foaf:person) OPT (?i, foaf:name, ?n),

where OPT indicates that name retrieval is optional. That is, if the graph contains corresponding information, the person’s ID will be returned together with his/her name; if not, only the ID will be returned and the variable ?n will be left undefined in the answer.

Optional queries are very natural and useful, taking into account “the fundamental incompleteness of the web.” However, they are “computationally expensive--query answering is PSPACE-complete.” In the search for computationally more efficient fragments of optional queries, Pérez et al. introduced a well-designed fragment of SPARQL [1]. This fragment requires that “each variable in the optional argument of an OPT expression should either appear in the mandatory argument ... or appear nowhere outside of the argument.” Evaluation of well-designed queries is coNP-complete (when all the variables are selected). This class enjoys other nice properties, too.

A natural question arises: how representative is the class of well-designed queries? Picalausa and Vansummeren’s analysis of practical SPARQL queries shows that they are common, but not exclusive: about half of all queries with OPT are not well designed [2].

This observation motivates another question: can one relax the restrictions of the well-designed fragment to accommodate more of the queries arising in practice, while preserving good computation properties? The authors answer this question positively. They identify a new “class of weakly well-designed queries that subsumes well-designed queries.” Weakly well-designed queries relax the variable occurrence restriction of well-designed queries so that “variables from the optional side of an OPT subpattern that [do not appear in] the mandatory side [may] occur in certain positions outside of the subpattern.” Examples of weakly well-designed (but not well-designed) queries:

(1)
SELECT ?i, ?n
WHERE (?i, rdf:type, foaf:person) OPT (?i, foaf:name, ?n) OPT (?i, v_card:name, ?n),

in which the variable ?n appears in two unrelated optional parts; and

(2)
SELECT ?i, ?n
WHERE (?i, rdf:type, foaf:person) OPT (?i, foaf:name, ?n) FILTER (¬bound(?n) ∨ ¬(?n = Ana)),

in which the variable ?n from the optional part appears in the constraint specified by FILTER.

The major result of the paper states that the query evaluation complexity for both weakly well-designed queries and well-designed queries is the same: coNP-complete. Moreover, extending the queries by UNION and projection does not affect evaluation complexity. Besides, it turns out that weakly well-designed queries are very common in practice, for instance, they make up 99 percent of all DBpedia queries (https://wiki.dbpedia.org/).

While the technical part of the paper is quite involved, the explanations are clear and the well-chosen examples are helpful.

Reviewer:  Temur Kutsia Review #: CR146237 (1812-0647)
1) Pérez, J.; Arenas, M.; Gutierrez, C. Semantics and complexity of SPARQL. ACM Transactions on Database Systems 34, 3 (2009), Article No. 16 .
2) Picalausa, F.; Vansummeren, S. What are real SPARQL queries like?. In Proceedings of the International Workshop on Semantic Web Information Management ACM, 2011, Article No. 7.
Bookmark and Share
 
Query Languages (H.2.3 ... )
 
 
General (F.0 )
 
Would you recommend this review?
yes
no
Other reviews under "Query Languages": Date

Zhang Z., Mendelzon A.Type: Article
Jan 1 1986
Negation in rule-based database languages
Bidoit N. Theoretical Computer Science 78(1): 3-83, 1991. Type: Article
Oct 1 1992
A query language for retrieving information from hierarchical text structures
MacLeod I. The Computer Journal 34(3): 254-264, 1991. Type: Article
Sep 1 1992
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy