Portable sensor stations can be transported to sites of interest, making it possible to observe ecological phenomena at any desired spatial granularity. These networks also operate at fine time resolution and therefore generate very large amounts of data that need automated cleaning and analysis (for example, remedial action when a sensor malfunctions or goes offline). In this paper, the authors consider one such network, the SensorScope Station at the École Polytechnique Fédérale de Lausanne in Switzerland, and focus in particular on air temperature data. The proposed data cleaning is based on a machine learning approach. In addition, the authors propose an adaptive quality control system that exploits both spatial and temporal relationships among the multiple sensors at a site.
After section 1’s brief introduction to data analysis problems and challenges in multi-sensor networks, the authors provide a detailed introduction to the SensorScope system and data in sections 2 and 3. This is followed by a discussion of the hybrid Bayesian networks used to model the air temperature data. Such models suit the task well because they can contain both continuous and discrete variables. Despite their power, however, they are static and cannot represent the dynamics of transitions from one time slice to the next. To incorporate the temporal aspects, the authors augment the Bayesian model with a Markovian lag variable for each true temperature variable, turning it into a dynamic Bayesian network, which is a spatiotemporal model. This is detailed in sections 4 and 5 with great rigor. The learned spatial models are then validated using a series of leave-one-out prediction tests, and the efficacy of the dynamic Bayesian network model is evaluated against purely spatial and purely temporal quality control models on real data from SensorScope; the results justify the pursuit of a spatiotemporal model. In addition, section 6 evaluates the model's performance in terms of type I and type II errors through noise-injection experiments. Following section 7’s thorough discussion of related work, the final section (8) presents concluding remarks and future research directions.
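To make the ideas in this summary concrete, the following is a minimal Python sketch, not the authors' model. It assumes a hypothetical two-state sensor variable (working or broken, the discrete part) whose reading is Gaussian around a predicted temperature (the continuous part), with a much larger variance when the sensor is broken. The prediction blends the last accepted reading (a one-step Markov lag) with a neighboring sensor's reading (the spatial part). All function names, priors, variances, and the synthetic data are illustrative assumptions; the final lines mimic a noise-injection experiment by corrupting readings at known positions and counting type I and type II errors.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fault_posterior(observed, predicted, prior_fault=0.05,
                    std_working=0.5, std_broken=10.0):
    """Posterior probability that the sensor is broken, by Bayes' rule.

    Assumed model: the reading is Gaussian around the predicted value,
    with a narrow spread if working and a wide spread if broken.
    """
    like_working = gaussian_pdf(observed, predicted, std_working)
    like_broken = gaussian_pdf(observed, predicted, std_broken)
    numerator = prior_fault * like_broken
    return numerator / (numerator + (1.0 - prior_fault) * like_working)


def run_quality_control(readings, neighbor, threshold=0.5, lag_weight=0.5):
    """Flag suspicious readings using a temporal lag and a spatial neighbor."""
    flags = []
    last_good = readings[0]  # assumes the first reading is trustworthy
    for observed, nearby in zip(readings, neighbor):
        predicted = lag_weight * last_good + (1.0 - lag_weight) * nearby
        is_fault = fault_posterior(observed, predicted) > threshold
        flags.append(is_fault)
        if not is_fault:
            last_good = observed  # only accepted readings feed the lag
    return flags


def error_rates(flags, truth):
    """Return (type I rate, type II rate) given flags and injected faults."""
    false_pos = sum(f and not t for f, t in zip(flags, truth))
    false_neg = sum(t and not f for f, t in zip(flags, truth))
    n_clean = sum(not t for t in truth)
    n_fault = sum(truth)
    type_i = false_pos / n_clean if n_clean else 0.0
    type_ii = false_neg / n_fault if n_fault else 0.0
    return type_i, type_ii


# Synthetic two-day temperature curve sampled every 30 minutes (assumed data).
rng = random.Random(0)
true_temp = [15.0 + 5.0 * math.sin(2.0 * math.pi * t / 48) for t in range(96)]
neighbor = [v + rng.gauss(0.0, 0.2) for v in true_temp]
readings = [v + rng.gauss(0.0, 0.2) for v in true_temp]

# Noise injection: add a 15-degree spike at every tenth sample.
injected = [t % 10 == 5 for t in range(96)]
readings = [r + 15.0 if bad else r for r, bad in zip(readings, injected)]

flags = run_quality_control(readings, neighbor)
type_i, type_ii = error_rates(flags, injected)
print(f"type I rate: {type_i:.3f}, type II rate: {type_ii:.3f}")
```

Setting `lag_weight` to 1.0 or 0.0 reduces the predictor to a purely temporal or purely spatial one, which gives a rough feel for the kind of comparison the paper reports, although the authors' networks are learned from data rather than fixed by hand as here.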
There are several interesting points in this paper: the use of hybrid Bayesian networks to model the spatial data; the use of dynamic Bayesian networks for complete (spatiotemporal) data modeling; and, perhaps most important, the formulation of an automated quality control process for environmental monitoring sensor networks. Even though these models have been applied successfully to air temperature sensor data, they might face unexpected challenges when applied to other kinds of data, such as wind velocity or soil moisture. However, it is highly likely that they can meet these challenges, as their structural flexibility allows other types of correlations to be incorporated.