Many data management and analytics tasks, notably entity resolution, sentiment analysis, and image recognition, cannot always be completed by automated software processes alone; they also require human cognition. Human computation capabilities can be harnessed through crowdsourcing platforms. This paper “surveys and synthesizes a [broad range] of existing studies on crowdsourced data management” and then “outlines key factors that [should] be considered to improve crowdsourced data management.”
A major focus of the paper is three key problems in crowdsourced data management: quality control, cost control, and latency control. Quality control covers ways to prevent low-quality results, “such as eliminating low-quality workers.” Cost control addresses how to ensure that completing the crowdsourcing tasks costs no more than necessary; one approach is to use pruning algorithms to eliminate unnecessary tasks. Latency control discusses strategies, such as pricing, for meeting established time constraints.
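To make the quality control idea concrete, here is a minimal Python sketch of one common approach: score each worker on a few tasks with known answers, drop workers who fall below an accuracy threshold, and take a majority vote over the remaining answers. It is an illustration only, not the method of the surveyed paper; the function names, the 0.6 threshold, and the sample data are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def worker_accuracy(answers, gold):
    """Fraction of gold (known-answer) tasks each worker got right.

    answers: list of (worker, task, label) tuples.
    gold: dict mapping task -> true label (hypothetical test questions).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker, task, label in answers:
        if task in gold:
            total[worker] += 1
            correct[worker] += int(label == gold[task])
    return {w: correct[w] / total[w] for w in total}

def aggregate(answers, gold, min_accuracy=0.6):
    """Majority vote per non-gold task, ignoring low-accuracy workers.

    Assumption: workers who answered no gold task are kept.
    """
    acc = worker_accuracy(answers, gold)
    votes = defaultdict(Counter)
    for worker, task, label in answers:
        if task in gold:
            continue  # gold tasks are only used for scoring workers
        if acc.get(worker, 1.0) >= min_accuracy:
            votes[task][label] += 1
    return {task: c.most_common(1)[0][0] for task, c in votes.items()}

# Made-up sample data: g1 and g2 are gold tasks, t1 and t2 are real tasks.
gold = {"g1": "yes", "g2": "no"}
answers = [
    ("w1", "g1", "yes"), ("w2", "g1", "yes"), ("w3", "g1", "no"),
    ("w1", "g2", "no"),  ("w2", "g2", "no"),  ("w3", "g2", "yes"),
    ("w1", "t1", "yes"), ("w2", "t1", "yes"), ("w3", "t1", "no"),
    ("w1", "t2", "no"),  ("w3", "t2", "yes"),
]
result = aggregate(answers, gold)
```

In this sample, worker w3 misses both gold tasks and is excluded, so task t2 is decided by w1 alone instead of ending in a tie.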
The paper gives considerable attention to crowdsourced operators that have been proposed to improve real-world applications, including filtering, find, and search operators. “Crowdsourcing systems that integrate [crowdsourced] relational database management systems ... to process computer-hard queries” are discussed. Two crowdsourcing platforms, Amazon Mechanical Turk and CrowdFlower, are examined.
The paper is thorough, clear, and detailed. Readers who follow crowdsourced data management should find it a valuable reference.