Computing Reviews
Answering enumeration queries with the crowd
Trushkowsky B., Kraska T., Franklin M., Sarkar P. Communications of the ACM 59(1): 118-127, 2016. Type: Article
Date Reviewed: May 3 2016

Human crowds are valuable for supplementing, in real time, the results of queries answered solely from relational database management systems (RDBMSs). But when should the collection of crowd responses, which are meant to augment database query results, be stopped so that the answer is reliable? Trushkowsky and colleagues offer statistical tools that users and developers of RDBMSs can use to weigh the time and cost of crowdsourcing against the accuracy and completeness of query responses.

To ensure that all behavioral groups are represented, the size of a query result (its cardinality) is useful for computing the share of each surveyed interest group. The authors compellingly introduce a power-law distribution data model that helps to overcome sampling problems attributable to cultural and regional biases and to respondents' varying skill with web search strategies.

The paper introduces and evaluates a metric for estimating the stable, convergent cardinality of “human intelligence tasks from Amazon’s Mechanical Turk (HIT-AMT).” The authors introduce algorithms that help to minimize the influence of individuals who might dominate and bias query responses. They also present classes of distributions that describe the coverage and variance of user responses to crowdsourced queries. Experiments with several thousand HIT queries on AMT, over United Nations and US data, show some significant improvement over well-known studies.
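
As a rough illustration of what estimating a result's cardinality from repeated crowd answers can look like, the sketch below uses the bias-corrected Chao1 species-richness estimator. This is a standard estimator swapped in for illustration only; the review does not specify the authors' metric, and the function name and sample answers are hypothetical.

```python
from collections import Counter


def chao1_estimate(answers):
    """Bias-corrected Chao1 estimate of the number of distinct items.

    Illustrative only: a textbook estimator, not necessarily the
    metric evaluated in the reviewed paper.
    """
    counts = Counter(answers)                 # item -> times it was reported
    freq_of_freq = Counter(counts.values())   # k -> number of items reported k times
    observed = len(counts)                    # distinct items seen so far
    f1 = freq_of_freq.get(1, 0)               # items reported exactly once
    f2 = freq_of_freq.get(2, 0)               # items reported exactly twice
    # Many singletons relative to doubletons suggest many unseen items.
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical crowd answers to an enumeration query.
answers = ["France", "Japan", "France", "Peru", "Japan", "Chad", "Fiji"]
print(chao1_estimate(answers))  # 5 observed + 1 estimated unseen = 6.0
```

When the estimate stops growing as more answers arrive, the observed set is probably close to complete, which is the kind of stopping signal the review describes.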

List walking occurs when the total size of a query result is underpredicted because multiple survey responses are heavily skewed or similar. The authors propose and validate a heuristic, based on binomial probabilities, to detect and overcome list walking. The heuristic successfully detected severe list walks in the United Nations database.

The authors present algorithms for computing the cost-benefit tradeoffs of generating precise accounts and estimates of query responses derived from traditional and real-time crowdsourced RDBMSs. There is no doubt that users should be empowered to contribute to and reason about query results in all relational database searches and retrievals. Although the paper sheds new light on applications of the well-known power laws [1] and the binomial distribution, I encourage all statisticians and database specialists to read it and to address the outstanding questions raised by the authors. Is there a clear distinction between operations such as SELECT, JOIN, and PROJECT versus relational operators on real query results? What impacts do human behaviors have on the sampling process in crowdsourced queries?

Reviewer:  Amos Olagunju Review #: CR144381 (1607-0526)
[1] Gunther, R.; Levitin, L.; Shapiro, B.; Wagner, P. Zipf's law and the effect of ranking on probability distributions. International Journal of Theoretical Physics 35, 2 (1996), 395–417.
Information Theory (H.1.1 ... )
Human Factors (H.1.2 ... )
Natural Language Processing (I.2.7 )
