Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Understanding Bloom filter intersection for lazy address-set disambiguation
Jeffrey M., Steffan J.  SPAA 2011 (Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, San Jose, CA, Jun 4-6, 2011)345-354.2011.Type:Proceedings
Date Reviewed: Aug 19 2011

Bloom filters have recently received a significant amount of interest due to their many applications in the large data and analytics spaces. The original use of the Bloom filter (by Bloom and in the years that followed) was in the form of a data structure that could answer queries of the form “Does x exist in the given set S?” Bloom filters can answer such set membership queries very efficiently with a low false positive rate, and with no false negatives.

This paper tries to present probabilistic analysis of usage of Bloom filters for set intersection questions (queries of the form “Do the sets S1 and S2 overlap?”) instead of set membership questions. This paradigm shift in the usage of Bloom filters is significant, and has happened even though the probabilistic models and the analysis were presented mainly for set membership queries.

The authors consider both the partitioned Bloom filters (in which each hash function maps to a separate partition within the range) and unpartitioned Bloom filters (in which all hash functions map to the entire range). They present some very interesting results. For example, they show that, compared to unpartitioned Bloom filters, partitioned Bloom filters have a lower probability of false positives when one considers set intersection questions.

In the introductory section, the authors also present a very thorough overview of the state of the art in the field of parallel processing and concurrency. Though this is interesting, it is not central to the study of Bloom filters. Parallel processing is a broad field, and not all challenges relevant to parallel processing apply to Bloom filters.

The empirical results section presents the probability values by Bloom filter length, but stops short of actually recommending the number of hash functions and the filter length depending on the sizes of the sets. This could be considered a slightly negative aspect of this otherwise sound paper.

Written in a style that is easy to understand, and using basic concepts in probability, the paper does a good job of deriving the results, and the presented theorems are short and crisp.

Reviewer:  Amrinder Arora Review #: CR139376 (1203-0295)
Bookmark and Share
  Reviewer Selected
 
 
Computations On Discrete Structures (F.2.2 ... )
 
 
Information Filtering (H.3.3 ... )
 
Would you recommend this review?
yes
no
Other reviews under "Computations On Discrete Structures": Date
NP-hard problems in hierarchical-tree clustering
Křivánek M., Morávek J. Acta Informatica 23(3): 311-323, 1986. Type: Article
Jul 1 1987
A separator theorem for graphs of bounded genus
Gilbert J., Hutchinson J., Tarjan R. (ed) Journal of Algorithms 5(3): 391-407, 1984. Type: Article
May 1 1985
On the computational complexity of path cover problems
Ntafos S., Gonzalez T. Journal of Computer and System Sciences 29(2): 225-242, 1984. Type: Article
Jun 1 1986
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy