Searching to retrieve specific records from databases can be slowed by complicating factors. One source of complication addressed in this paper is the query selection criteria when multiple dimensions are involved, a complication made worse when the dimensions use nonkey data. A classic multiple dimension search often used in the literature, and used in this paper, is for a two-dimension retrieval, from a database about hotels, of the set of hotels that are closest to the beach and lowest in price: a so-called skyline retrieval.
This paper starts by reporting, based on a survey of the literature, an evaluation and comparison of prominent algorithms for doing skyline searches. The six algorithms reported on are the divide-and-conquer, the block nested loop, the sort first skyline, the bit map, the index, and the nearest neighbor. The comparison emphasizes progressiveness, absence of false misses, absence of false hits, fairness, incorporation of retrieval references, and universality. Building on the nearest neighbor algorithm, the paper proposes an algorithm it claims is better than the six compared algorithms, one it terms the branch-and-bound algorithm.
The paper explains the branch-and-bound algorithm, and offers an extensive formal proof of its correctness. The paper then briefly examines the effect of additions to and deletions from the searched databases. Following that, the paper identifies six “novel variations” on skyline search to document the superiority of the branch-and-bound algorithm: constrained skylines, ranked skylines, group-by skylines, dynamic skylines, enumerating and K-dominating queries, and skybands.
Only the branch-and-bound is compatible with all six variations, and, on top of that, the paper introduces a new variation, approximate skybands, on which it also shines. The paper then provides an experimental evaluation of the efficiency of the branch-and-bound algorithm compared to the nearest neighbor algorithm (generally, the best of the six prominent algorithms for most skyline searches). For a non-novel search, the branch-and-bound algorithm makes a better showing on each of three rating factors: dimensionality, cardinality, and progressive behavior. On each of the six novel variations, the paper’s branch-and-bound algorithm also winds up working from about as well as to much better than the nearest neighbor algorithm. As the paper points out, saving skylines relevant for frequently asked types of queries could offer important savings for active database users. To me, the branch-and-bound algorithm’s claimed desirable properties suggest that data administrators should investigate the algorithm’s applicability for their own databases.