Forecasting air pollution levels can enable important public health actions such as traffic modification, self-protection steps by individuals at risk, and measures such as burn bans. The authors use this paper to document an impressive survey of the recent literature in the field. Through a careful examination of works published from 2011 through 2021, they identify and present several high-level characteristics of the research field. Briefly: most studies are connected to countries with high levels of pollution; the number of papers published annually is increasing; air quality index (AQI) and PM2.5 (the concentration of particulates smaller than 2.5 microns in diameter) are the most commonly predicted quantities; and in many cases more sophisticated algorithms yield better predictions.
The authors begin by identifying 781 papers using keyword search. After filtering for language (English) and verifying that each paper's topic is the prediction of future air quality using machine learning, they retain 155 papers for detailed review. Most were authored by researchers from China and India; this is likely connected to the high levels of pollution in many cities in these nations.
For review, the publications are grouped by the type of machine learning algorithm used. Traditional regression and time-series variants such as ARIMA, multilayer perceptron (MLP) models, long short-term memory (LSTM) recurrent neural networks (RNNs), decision trees and random forests, and other categories of regression (as opposed to classification) methods are all considered. Specific papers are briefly discussed to illustrate novel approaches. For example, the authors mention a study that applied a Kalman filter in the learning phase to enhance the performance of a classic fully connected neural network for time-based prediction.
The authors note that several potential predictors are frequently used. They consider predictor variables as being one of three types: pollutant variables (measured concentrations of certain pollutants), weather variables such as wind and precipitation, and other variables including season, location, and time of day. They find that, in general, pollutant variables are better predictors than weather variables, and that all pollutants used are roughly equally predictive of AQI. Temperature, humidity, and wind speed are often useful in prediction, as are month and hour of the day. It should be noted that these conclusions are dependent on the methodologies used for variable selection in the individual studies.
While the surveying and summarizing of this large body of work is impressively done, there are a few places that could be improved. A survey paper should give the reader some sense of the performance that researchers in the field are able to achieve. There is no such statement. In fact, very few accuracy numbers are quoted in the entire paper. Some quantitative evaluation of the surveyed results is needed. Additionally, the important issue of how each paper treats time is not addressed. There is no documentation of whether air quality prediction is done one day ahead of time (using data from days N-2, N-1, and N to predict AQI on day N+1) or via some other prediction interval. It would be helpful to know how different researchers handle this important topic.
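To make the prediction-interval question concrete, the following is a minimal sketch of the one-day-ahead framing described above, in which readings from days N-2, N-1, and N form the input and the reading on day N+1 is the target. It is not taken from the surveyed paper or from any study it covers; the function name `make_windows`, the parameters `lookback` and `horizon`, and the AQI values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], lookback: int = 3, horizon: int = 1
) -> List[Tuple[List[float], float]]:
    """Turn a daily series into (features, target) pairs.

    features: `lookback` consecutive readings ending on day N
    target:   the reading `horizon` days after day N
    (Hypothetical helper; names and defaults are assumptions.)
    """
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be positive")
    samples = []
    # Index of the last window start that still leaves room for the target.
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        features = list(series[start:start + lookback])
        target = series[start + lookback + horizon - 1]
        samples.append((features, target))
    return samples


# Hypothetical daily AQI readings, for illustration only.
daily_aqi = [50.0, 55.0, 60.0, 58.0, 62.0]

# One-day-ahead: days N-2, N-1, N predict day N+1.
one_day_ahead = make_windows(daily_aqi, lookback=3, horizon=1)

# Two-day-ahead: the same inputs predict day N+2 instead.
two_day_ahead = make_windows(daily_aqi, lookback=3, horizon=2)
```

Under this framing, a study's treatment of time reduces to two reported numbers, the lookback and the horizon, which is the kind of detail a survey could tabulate for each paper it reviews.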
Air quality forecasting is an application area for machine learning that directly impacts people’s lives and is well suited to the technology. The authors have done a good job of surveying the literature. This paper is important for understanding the scope of existing work and the most promising directions for new research.