The relational model, which is the foundation of current database management systems (DBMSs), describes information using sets of tuples, so duplicates are not allowed. Nevertheless, relational DBMSs have often been designed to allow duplicates in relations in order to save the cost of duplicate elimination, which is required to implement set operations. Moreover, in order to meet the requirements of some recent applications, new generation database systems have been proposed based on novel data models that explicitly support structures like bags, that is, collections of elements that, unlike sets, may contain duplicates. It follows that a theoretical investigation of languages for manipulating such structures is now an important and challenging issue.
Grumbach and Milo study the expressive power and the complexity of algebraic languages for manipulating nested bags, that is, complex objects constructed using tuple and bag constructs. The algebra they propose is an extension of the algebra for nested relations, which are constructed using tuple and set constructs. In particular, this algebra includes extensions of relational algebra operators (such as selection, projection, and Cartesian product) plus specific operators to create and destroy levels of bag nesting, similar to the nest and unnest operators of the nested relational algebra. One further operator is defined for duplicate elimination, but the authors show that, unlike in formal languages for complex objects without bags, this operator is indeed redundant in this algebra.
The main result of this paper is that algebras for bags are tractable only when the level of nesting is bounded. More specifically, the authors show that when only simple bags and bags of bags are allowed, the complexity of the evaluation of a query expressed in bag algebra is polynomial with respect to the size of the input databases. Conversely, starting with the third level of nesting, the expressive power increases dramatically and the algebra can express queries of hyper-exponential complexity. The authors also investigate the relationship between the complexity of queries and the algebraic operations used.
Although the paper is theoretical and not accessible to practitioners, it is well written and thus easy to read for anyone with a theoretical bent and some interest in the area. It provides further insights into the important issue of the expressive power of complex object database languages, which is crucial in the design of practical query languages for new generation database systems.