Truth discovery in noisy social sensing data is a challenging task. This paper presents a maximum likelihood estimation algorithm that computes the probability that a given measurement is true, based on sensing data collected by human participants. The authors describe numerous application scenarios, such as geotagging campaigns in which people use sensors to collect data of mutual interest.
The central focus of the paper is on assigning true or false values to observations given only the reported measurements, without any a priori knowledge about the sources sending the sensor data. The proposed expectation maximization (EM) algorithm finds the maximum likelihood estimate of the parameters of a statistical model with incomplete data. The EM algorithm takes an observation matrix of social sensing data as input and yields the maximum likelihood estimate of each participant’s reliability, together with the correctness (true or false) of each observed variable.
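To make the input and output described above concrete, the following is a minimal Python sketch of an EM iteration over a binary observation matrix. It is not the authors' implementation. It assumes a simple model in which each participant i reports a variable with probability `a[i]` when the variable is true and `b[i]` when it is false, with a shared prior `d` that a variable is true. The function name `em_truth_discovery`, the symbols `a`, `b`, `d`, and `z`, the starting values, and the fixed iteration count are all illustrative assumptions.

```python
import math


def em_truth_discovery(sc, iters=50, eps=1e-6):
    """Illustrative EM sketch under an assumed model (not the paper's code).

    sc[i][j] is 1 if participant i reported variable j as true, else 0.
    Returns (z, a, b, d), where z[j] is the estimated probability that
    variable j is true.
    """
    m, n = len(sc), len(sc[0])

    def clip(p):
        # Keep probabilities away from 0 and 1 so the logs stay finite.
        return min(max(p, eps), 1.0 - eps)

    # Assumed starting point: reports are likelier for true variables.
    a = [0.6] * m  # P(participant i reports j | j is true)
    b = [0.3] * m  # P(participant i reports j | j is false)
    d = 0.5        # prior probability that a variable is true
    z = [0.5] * n

    for _ in range(iters):
        # E-step: posterior probability that each variable is true,
        # computed in log space for numerical stability.
        for j in range(n):
            log_true, log_false = math.log(d), math.log(1.0 - d)
            for i in range(m):
                if sc[i][j]:
                    log_true += math.log(a[i])
                    log_false += math.log(b[i])
                else:
                    log_true += math.log(1.0 - a[i])
                    log_false += math.log(1.0 - b[i])
            diff = min(log_false - log_true, 700.0)  # avoid exp overflow
            z[j] = 1.0 / (1.0 + math.exp(diff))

        # M-step: re-estimate each participant's parameters and the prior.
        total_true = max(sum(z), eps)
        total_false = max(n - sum(z), eps)
        for i in range(m):
            hit = sum(z[j] for j in range(n) if sc[i][j])
            a[i] = clip(hit / total_true)
            b[i] = clip((sum(sc[i]) - hit) / total_false)
        d = clip(sum(z) / n)

    return z, a, b, d
```

Calling `em_truth_discovery(sc)` on a small matrix with one row per participant and one column per variable returns the per-variable probabilities `z` alongside the per-participant estimates `a` and `b`. Thresholding `z` at 0.5 gives a true or false label for each variable under these assumptions.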
For the evaluation, the authors compare the above approach with Bayesian interpretation and three other fact-finder schemes, varying in turn the number of participants and the number of observations per participant. The EM algorithm achieves the highest estimation accuracy of all the approaches compared. Furthermore, a geotagging case study provides highly relevant evidence: participants visiting a park were asked to geotag and report the location of litter, and could misidentify either the litter or its location. In this study, EM finds litter patterns with the greatest accuracy. Another example considers ten events during Hurricane Irene reported by the media via Twitter. Here, the EM algorithm demonstrates its value by correctly reporting all ten events from a large volume of noisy data (600,000 analyzed tweets).
The main contribution of this paper is its proof of the accuracy of the EM algorithm for identifying reliable information from social sensing data.