Barber’s aim for this book is to introduce Bayesian reasoning and machine learning to students “without a firm background in statistics, calculus, or linear algebra.” To achieve this goal, the author uses graphs to illustrate the interdependence of variables, as opposed to simply showing the equations relating the variables, and includes a large number of examples and a comprehensive set of MATLAB tools and demo programs. The MATLAB material is available online.

The book has five parts, on inference in probabilistic models, learning in probabilistic models, machine learning, dynamical models, and approximate inference. The author provides a table showing how the material can be used for courses in graphical models (mostly Parts 1 and 2), machine learning, approximate inference, time series, and probabilistic modeling. Each part begins with a chapter outlining the contents of the part, and each chapter begins with an introduction to its contents. This leads to an occasional use of a term prior to its definition, but the index resolves these issues quite easily. Each chapter is organized so that the more challenging material appears at the end, allowing the reader to skip ahead to the next chapter if so inclined. Exercises appear at the end of each chapter.

A number of consistent patterns underlie most of the material in the book, and even when not explicitly mentioned, they serve to unify it. One is approximation: the author is careful to discuss techniques for finding approximate solutions to formally intractable problems. For example, in computations on dependency graphs that contain loops, long loops have small effects, so the trick is to find an appropriate maximal spanning tree and then use tree-based methods. A large part of the book is devoted to approximation algorithms.
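To make the spanning-tree idea concrete, here is a minimal Python sketch of extracting a maximum-weight spanning tree with Kruskal's algorithm. It is not taken from the book, whose code is in MATLAB. The function name, the edge list, and the edge weights are hypothetical; the weights stand in for some measure of how strongly two variables depend on each other (mutual information is one common choice, but that is an assumption here).

```python
def maximum_spanning_tree(num_nodes, weighted_edges):
    """Kruskal's algorithm, taking the heaviest edges first.

    weighted_edges is a list of (weight, u, v) tuples with nodes
    numbered 0 .. num_nodes - 1. Returns the kept edges as (u, v, weight).
    """
    parent = list(range(num_nodes))  # union-find forest, one root per node

    def find(node):
        # Follow parent links to the root, shortening the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for weight, u, v in sorted(weighted_edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so no loop forms
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical dependency strengths between four variables.
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.1, 0, 2), (0.5, 2, 3)]
tree = maximum_spanning_tree(4, edges)
```

In this toy graph the weak edge between variables 0 and 2 closes a loop and is dropped, leaving a tree on which exact tree-based inference could then be run.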

Another recurring theme is likelihood maximization, the process of finding the most likely explanation for an observed outcome. Determining the parameters of a hidden Markov model is a typical example, and the expectation maximization algorithm is frequently used in this context. The book discusses a wide range of algorithms that can be used to maximize likelihood. Each algorithm is motivated, illustrated with examples, and then nicely displayed in the text, accompanied in many cases by MATLAB code.
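As a toy illustration of likelihood maximization (far simpler than the hidden Markov model case, and not drawn from the book), the following Python sketch searches a grid for the coin bias that best explains a set of observed tosses. The function names and the counts of heads and tails are hypothetical, and a grid search is used only for transparency; for this model the maximum is also available in closed form as heads / (heads + tails).

```python
import math


def bernoulli_log_likelihood(theta, heads, tails):
    """Log-likelihood of a coin with bias theta given observed counts."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def grid_mle(heads, tails, steps=999):
    """Return the grid point in (0, 1) with the highest log-likelihood."""
    # Interior points only, so that log(0) is never evaluated.
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda t: bernoulli_log_likelihood(t, heads, tails))


# Hypothetical data: 7 heads and 3 tails.
estimate = grid_mle(heads=7, tails=3)
```

With these counts the search settles on the grid point matching the closed-form answer, 7 / 10.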

The numerous examples range from toy examples used to introduce specific concepts to larger-scale practical examples of techniques with real applications, such as face recognition. Many are illustrated with gray-scale or color diagrams. I particularly liked the example of “localizing the burglar” in the discussion of hidden Markov models because the author’s pictures really allowed me to “see” that algorithm at work.

Part 1 describes the fundamentals of Bayesian inference, with particular emphasis on finding tractable models in cases where the initial model suggests infeasible computation. This part serves to introduce the author’s graphical approach.

Part 2 describes how statistical methods can be used in learning, including concepts and algorithms concerning maximizing likelihood. In Part 3, the author introduces machine learning concepts and describes the models that are used in machine learning. Topics include nearest neighbor classification, linear dimension reduction, and linear models. Part 4, on dynamical models, discusses the situation where the underlying parameters of the system are subject to change. The discussions cover Markov models and switching linear systems. Part 5 takes up the important issue of producing good samples from a preassigned distribution and applications to inference.

This is a very comprehensive textbook that can also serve as a reference for techniques of Bayesian reasoning and machine learning. It offers an extensive list of references and provides numerous pointers to them in the body of the text.

Will this book be accessible to students without a firm background in statistics, calculus, and linear algebra? The necessary statistical material is reviewed in the body of the book, and there is an appendix on calculus and linear algebra. The book makes extensive use of calculus, particularly Lagrange multipliers for optimization, and of linear algebra for the material on dimension reduction. A student without a firm background in these topics will need either to take material on trust or to work quite hard with a sympathetic instructor. That being said, I highly recommend this book both to those who will encounter its contents for the first time and to anyone seeking a handy reference for Bayesian reasoning and machine learning.