The widespread success of machine learning has increased both the academic attention given to understanding and refining its methods, and the commercial demand for practitioners. These two trends are often in tension. The formal statistics that underlies these methods is complex and typically addressed to other theoreticians, making it difficult for software practitioners to learn and exploit new techniques on practical problems. Raschka’s volume is an extended tutorial on machine learning written for practicing programmers, but with more theoretical background than many cookbook introductions to statistical topics. It also contains abundant pointers to other materials, including both detailed theoretical discussions and software packages supporting the techniques it introduces.

As the title suggests, the book illustrates these ideas using Python and Python packages, notably scikit-learn. For each new concept, the book walks through a full Python implementation and demonstrates its operation on test data. It thus supports a learner willing to load the examples (available online from the publisher) and actively explore their behavior. However, the chapters contain no exercises that would suggest directions for such exploration.

The introductory chapter distinguishes three categories of machine learning (supervised, unsupervised, and reinforcement learning), defines a machine learning pipeline (data preprocessing, model selection and training, and model evaluation), and covers some mechanics of using Python.

The overall structure of the remaining chapters follows the three categories in the introduction, starting with (and giving the most attention to) supervised learning.

Within the supervised learning material, the next six chapters develop the pipeline from the introduction. Chapters 2 and 3 deal with specific models. The current emphasis in machine learning is on deep learning, a collection of training techniques that have revitalized the venerable neural network model. Chapter 2 introduces the neural network concept and implements first a perceptron learning algorithm, then the Adaline algorithm as a context for discussing cost functions and the ubiquitous technique of gradient descent. To broaden the scope of tools, chapter 3 reviews some of the methods available in scikit-learn, including logistic regression, support vector machines, decision trees, and *k*-nearest neighbors learning.
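
To make the link between Adaline, cost functions, and gradient descent concrete, here is a minimal sketch of my own, not code from the book. It assumes batch gradient descent on a sum-of-squared-errors cost with a linear activation, uses only the standard library, and runs on hypothetical toy data; the function names and the learning rate are illustrative choices.

```python
def train_adaline(samples, targets, learning_rate=0.01, epochs=50):
    """Fit weights by batch gradient descent on a sum-of-squared-errors cost."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    costs = []
    for _ in range(epochs):
        # Linear activation: the raw net input, not the thresholded class label.
        outputs = [bias + sum(w * x for w, x in zip(weights, row)) for row in samples]
        errors = [t - o for t, o in zip(targets, outputs)]
        # Step along the negative gradient of J(w) = 0.5 * sum(error ** 2).
        for j in range(n_features):
            weights[j] += learning_rate * sum(e * row[j] for e, row in zip(errors, samples))
        bias += learning_rate * sum(errors)
        # Record the cost measured before this epoch's update.
        costs.append(0.5 * sum(e * e for e in errors))
    return weights, bias, costs


def predict(weights, bias, row):
    """Threshold the net input at zero to obtain a class label of -1 or +1."""
    net_input = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if net_input >= 0.0 else -1


# Hypothetical toy data: two linearly separable clusters labelled -1 and +1.
samples = [[-2.0, -1.0], [-1.5, -2.0], [-1.0, -1.5], [1.0, 1.5], [1.5, 2.0], [2.0, 1.0]]
targets = [-1, -1, -1, 1, 1, 1]
weights, bias, costs = train_adaline(samples, targets)
predictions = [predict(weights, bias, row) for row in samples]
```

Plotting `costs` against the epoch number is a simple way to check that the learning rate is small enough for the cost to decrease.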

Chapters 4 and 5 discuss data preprocessing. Chapter 4 reviews handling missing data and feature selection, and chapter 5 considers methods for dimensionality reduction, introducing principal component analysis (PCA), linear discriminant analysis, and kernel PCA for nonlinear data.

Chapter 6 introduces methods for evaluating and tuning models. Chapter 7 combines the themes of method selection and evaluation with a review of ensemble learning.

The next two chapters illustrate the application of these methods to more realistic problems. Chapter 8 performs sentiment analysis on the Internet Movie Database (IMDb) movie review dataset, a much larger and more complex dataset than the toy data used in the previous chapters. It builds a logistic regression classifier for this dataset based on the bag-of-words model, and shows how to process the data incrementally without loading it all into memory. Chapter 9 addresses deployment issues by showing how to use SQLite and Flask to deploy a model as a web application, and illustrates it with the IMDb classifier.
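
The idea of incremental processing can be sketched in a few lines. The following is an illustrative reconstruction, not the book's code: it assumes a bag-of-words representation built with `collections.Counter`, a logistic regression model updated one document at a time by stochastic gradient descent, and a generator of made-up reviews standing in for a large file read line by line. All names and data here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Count lowercase word tokens, ignoring order (the bag-of-words model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def stream_reviews(passes=200):
    """Stand-in for reading a large file lazily; yields (text, label) pairs."""
    reviews = [
        ("a wonderful and moving film", 1),
        ("terrible plot and wooden acting", 0),
        ("wonderful acting and a great story", 1),
        ("a boring and terrible film", 0),
    ]
    for _ in range(passes):
        yield from reviews


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_incrementally(stream, learning_rate=0.1):
    """Update logistic regression weights one document at a time."""
    weights = defaultdict(float)  # one weight per word seen so far
    bias = 0.0
    for text, label in stream:
        features = bag_of_words(text)
        z = bias + sum(weights[word] * count for word, count in features.items())
        error = label - sigmoid(z)
        # Gradient step on the log-likelihood of this single document.
        for word, count in features.items():
            weights[word] += learning_rate * error * count
        bias += learning_rate * error
    return weights, bias


def positive_probability(weights, bias, text):
    """Estimated probability that a review is positive."""
    features = bag_of_words(text)
    z = bias + sum(weights.get(word, 0.0) * count for word, count in features.items())
    return sigmoid(z)


weights, bias = train_incrementally(stream_reviews())
```

Because only one document is held at a time, memory use depends on the vocabulary size rather than on the number of reviews.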

Chapter 10 turns to regression analysis. It applies both linear regression and polynomial regression to the housing dataset from the University of California at Irvine.

Chapter 11 turns to unsupervised learning, another of the categories from the introduction, with a discussion of *k*-means, hierarchical, and density clustering (applied to toy data).

The last two chapters return to the neural networks with which the book began. Chapter 12 motivates multi-level neural networks by demonstrating their operation on the MNIST dataset of handwritten digits, while chapter 13 discusses how such computations can be parallelized on a graphics processing unit using the Theano package.

The book has an extensive index, but no integrated bibliography.

For a reader with the initiative to experiment with the example implementations, this volume would be an ideal self-teaching guide. It would also be helpful in a classroom setting, provided the instructor supplements it with specific exercises and student projects. In addition, it is a rich source of pointers both to the more theoretical machine learning literature and to datasets and Python packages that are useful in modern machine learning applications.
