Crowdsourcing involves the production of crowd-generated data. These might be data collected during crises, such as a large set of location-based tweets during a firestorm or severe flooding, or numerous SMS reports to a common system within a war zone. They also include the results of marketplace tasks, such as those easily accessed on Amazon’s Mechanical Turk (MTurk) (http://mturk.com). Human intelligence is necessary to clean, label, or tabulate data in preparation for effective data mining. This work is done through human intelligence tasks (HITs), or, as displayed on the MTurk logo, “artificial artificial intelligence.”
The authors provide a compelling and informative description of the processes involved in the collection and preparation of crowdsourced data for effective deployment. I like the focus on developing response systems for emergency or crisis situations. They also discuss the process for mining crowd sentiment, for example in commercial applications such as original t-shirt design (http://www.threadless.com/).
I was especially interested in the four sections on preparing data for mining, mining data from crowdsourcing, crowdsourced data for response coordination, and when not to crowdsource. These are central to the paper and clearly present the processes for using the data effectively to potentially save lives and respond with improved accuracy in disaster scenarios.
This paper would be of interest to those working on data projects or systems development where accurate categorization, tabulation, and noise reduction are required. It would also be useful in determining whether crowdsourcing is appropriate for a particular project. As the authors note, “The wisdom of the crowd allows for more accurate categorization than any other machine learning algorithm.”