Computing Reviews, the leading online review service for computing literature.

Search

Understanding Bloom filter intersection for lazy address-set disambiguation
Jeffrey M., Steffan J. SPAA 2011 (Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, San Jose, CA, Jun 4-6, 2011)345-354.2011.Type:Proceedings

Date Reviewed: Aug 19 2011

Bloom filters have recently received a significant amount of interest due to their many applications in the large data and analytics spaces. The original use of the Bloom filter (by Bloom and in the years that followed) was in the form of a data structure that could answer queries of the form “Does x exist in the given set S?” Bloom filters can answer such set membership queries very efficiently with a low false positive rate, and with no false negatives. This paper tries to present probabilistic analysis of usage of Bloom filters for set intersection questions (queries of the form “Do the sets S1 and S2 overlap?”) instead of set membership questions. This paradigm shift in the usage of Bloom filters is significant, and has happened even though the probabilistic models and the analysis were presented mainly for set membership queries. The authors consider both the partitioned Bloom filters (in which each hash function maps to a separate partition within the range) and unpartitioned Bloom filters (in which all hash functions map to the entire range). They present some very interesting results. For example, they show that, compared to unpartitioned Bloom filters, partitioned Bloom filters have a lower probability of false positives when one considers set intersection questions. In the introductory section, the authors also present a very thorough overview of the state of the art in the field of parallel processing and concurrency. Though this is interesting, it is not central to the study of Bloom filters. Parallel processing is a broad field, and not all challenges relevant to parallel processing apply to Bloom filters. The empirical results section presents the probability values by Bloom filter length, but stops short of actually recommending the number of hash functions and the filter length depending on the sizes of the sets. This could be considered a slightly negative aspect of this otherwise sound paper. Written in a style that is easy to understand, and using basic concepts in probability, the paper does a good job of deriving the results, and the presented theorems are short and crisp.

Reviewer: Amrinder Arora	Review #: CR139376 (1203-0295)

Computations On Discrete Structures (F.2.2 ... )

Information Filtering (H.3.3 ... )

Would you recommend this review?

yes

Other reviews under "Computations On Discrete Structures":	Date

NP-hard problems in hierarchical-tree clustering Křivánek M., Morávek J. Acta Informatica 23(3): 311-323, 1986. Type: Article	Jul 1 1987

A separator theorem for graphs of bounded genus Gilbert J., Hutchinson J., Tarjan R. (ed) Journal of Algorithms 5(3): 391-407, 1984. Type: Article	May 1 1985

On the computational complexity of path cover problems Ntafos S., Gonzalez T. Journal of Computer and System Sciences 29(2): 225-242, 1984. Type: Article	Jun 1 1986

more...

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud^®
Terms of Use | Privacy Policy