A two-sided orthogonal range counting query over a finite set S of n points in R2 asks, given a query point q = (qx, qy) ∈ R2, to report the number of points (px, py) ∈ S such that px ≥ qx and py ≥ qy.
The authors present a data structure and algorithm for answering such two-sided orthogonal range counting queries, and then consider cases where the query point q is chosen from a probability measure D over R2, denoting such a query by Q(q,D).
The paper introduces a new data structure called a biased range tree, denoted by T(S,D), which is a comparison tree built in terms of S and D, and augmented with data structures called catalogues. A catalogue associates subsets of S with a node of T(S,D). The memory and preprocessing requirements of a biased range tree are O(n log{n}).
The authors prove the following optimality result for biased range trees: Let t(D,T(S,D)) denote the expected time to answer a query Q(q,D) using T(S,D), and let &mgr;(D,S) be the expected time to answer a query Q(q,D) using the optimal decision tree for S and D. Then, t(D,T(S,D)) is within a constant factor of &mgr;(S,D).
The paper has shortcomings regarding both the strength of the results and the presentation.
The results presented have three limitations:
- (1) A biased range tree is a distribution-sensitive data structure and requires that the distribution be known in advance; a change in the distribution requires rebuilding the biased range tree.
- (2) In the worst case, the depth of T(S,D) is O(log(n)), so the worst-case query time is O(log(n)), which is no better than the time given “by Bentley’s range trees. Range trees use O(n log n) space, [...] can be constructed in O(n log(n)) time,” can answer two-sided (as well as three-sided) range counting queries in O(log(n)) using fractional cascading, and do not depend on the query distribution.
- (3) While the proposed approach can be extended to three-sided and four-sided orthogonal range counting queries, it is not optimal for these queries.
Three examples of issues with the presentation follow:
- (1) The authors reference fractional cascading and k-d trees several times, but do not explain them in sufficient detail, even though that would have greatly helped the reader to understand the proposed biased range tree.
- (2) A rectangle is denoted by both R and r in different parts of section 2, and then the r symbol is used to denote a region in section 3.2.
- (3) The same notation, dT(q), is used for both the expected depth of a node (in section 4) and for an instance of the depth (in section 3.5).
In conclusion, this paper presents an interesting distribution-sensitive “data structure for answering two-sided orthogonal range counting queries,” when the query point is chosen from a known probability measure D. However, the approach is not optimal for three-sided queries, the worst-case query time is the same as that of Bentley’s range trees, and the approach requires knowing the query distribution in advance.