This is one of the most enlightening books about the hidden risks of applying machine learning techniques, and particularly algorithms, in decision making. The book unveils the many potential sources of algorithmic bias, raising serious questions about the merits of machine learning algorithms and tools.
In fact, the book puts machine learning algorithms, which are meant to improve on traditional statistical modeling techniques, into their proper context. That context is framed by the many sources of bias, ranging from human cognitive and real-world biases, to the biases of the data scientist acting as statistical modeler, to biases introduced by the data and the type of algorithm. The field faces tough challenges ahead, for instance, how to make sure that an algorithm is not biased, or how to de-bias one that is.
Above all, however, the toughest challenge ahead for the field is how to halt the decline of trust in artificial intelligence (AI) and machine learning, a decline rooted in the inability to explain data quality, the genesis of the statistical model, and the way machine-learning–based decisions should be taken into consideration and (sometimes) questioned.
The book does not assume deep mathematical knowledge or a statistics background. Wherever possible, statistical terms (for example, confidence level, linear or nonlinear model) are given a brief and simple introduction, accompanied by well-chosen examples. The book is mainly organized into four parts, with the first one introducing the reader to the world of biases, the model development process, and machine learning. It is clear from the beginning that bias is a deeply anchored feature of the human brain and hard to keep out of any artifact, including statistical models that analyze the real world. Machine learning is viewed as an effective tool for developing more accurate statistical models, which would otherwise be impossible to build with human labor alone; however, bias can easily be introduced and, worse, go unnoticed if everything is treated as a black box.
The second part of the book is devoted to a careful categorization of all sorts of biases and their respective sources: real world, data scientists and modelers, data, and algorithms. Thankfully, not all perspectives are doom and gloom; the author devotes the last two chapters to what should be done from the user and data scientist perspectives.
The book is highly recommended to business analysts, economists, and data scientists, as well as social scientists and psychologists. It can also be used in postgraduate and undergraduate computer science courses, in data science degree programs, and anywhere algorithms are taught as part of a curriculum.