Recently, I found myself giving an impromptu book review to someone in the bookshop near my office. I noticed that a man was browsing through a book on machine learning that I had purchased a few months earlier. I won’t mention the name of the book here, but I did a terribly un-English thing and struck up a conversation with the potential buyer. I told him that I wouldn’t recommend that particular book. “So, what would you recommend then?” was his reply.
There is certainly no shortage of books to choose from on the subject of machine learning. Over the past few years, machine learning has gained traction in both academia and industry, thanks in part to the rise of big data, distributed computing architectures, and more powerful graphics processing units. This surge of interest has been matched by a growing number of books aiming to provide a one-stop guide to the field. Introduction to statistical machine learning by Masashi Sugiyama is another addition to this increasingly crowded landscape; but what, if anything, sets it apart from the excellent titles already available?
Sugiyama’s book is reassuringly weighty, clocking in at 500 pages. I am always wary of books that are advertised as textbooks on machine learning but run to only 150 or so pages. Bigger is not always better, but the length of the book does give a good indication of the breadth of topics covered. The book is split into five parts, with Part 1 a standard introduction and Part 2 given over to mathematical preliminaries. This is undeniably useful if the text is to be used as a single reference volume (for example, as a course textbook), but it does mean that you must wait until page 113 for the discussion of machine learning algorithms to begin.
The description of machine learning methodologies is split into three sections: generative models, discriminative models, and further topics. Splitting the discussion of supervised machine learning into generative and discriminative models is a useful approach, as it allows the relative strengths and weaknesses of each to be clearly delineated. The generative models chapters are built upon the foundation of maximum likelihood estimation (MLE), with the first few chapters explaining MLE and its properties and the later chapters presenting Bayesian approaches. The discriminative models chapters are concerned with function approximation methods for supervised learning and follow the now-standard path from linear models through least squares and regularized methods before looking at more powerful approaches such as support vector machines and structured classification. Both sections follow a logical ordering, with topics introduced in earlier chapters and then built upon and expanded in later chapters. They also go into a good level of detail, and the chapters are written in a succinct style, which means you will often need to spend time unpicking what has been presented to fully follow the exposition. The final part, “Further Topics,” covers methods that do not fit neatly into the generative/discriminative dichotomy of supervised learning algorithms, such as clustering, dimensionality reduction, and multi-task learning.
To summarize, on the positive side, this book covers a wide range of topics and is organized logically, with smatterings of MATLAB examples throughout. But what about the negative side? My criticisms of this book center on two things that it lacks: coherence and consistency. The preface admits that the book is an amalgam of chapters taken from books written by the author in Japanese (along with some new chapters). In theory, this is an excellent idea; the author has published widely in machine learning within Japan, and bringing his work together as a whole in English would be a real contribution to the field. However, at times this amalgamation of sources leads to a lack of coherence between chapters. If you sit down and read this book from A to Z, it has the feeling of a collection of lecture notes, with no thread linking the chapters together. One of the reasons for this is the second “C”: consistency. Not all of the chapters are laid out in the same way. All start with an introduction, sometimes brief, before diving into the topic in question. Some include summary or conclusion sections at the end; some don’t. Some chapters provide references to the key topics for further reading; some don’t. The upshot is that the book feels unfinished or not properly edited. Some chapters end abruptly with no concluding remarks. It would have been nice for every chapter to close with a brief summary placing the topic within the wider setting of the book. This small addition would have greatly improved the book’s coherence, as it would allow readers to review what they have just read, place it within the wider context of machine learning, and then smoothly transition to the next chapter.
After reading this book, I found myself thinking of the encounter I had in the bookshop. Would this book be one that I would recommend? If you already have a good grasp of machine learning and you want to go deeper into many of the subjects, then I would certainly recommend looking at this book as a reference text. The lack of coherence and consistency would not be such detrimental factors if this book were used as a reference source that you could dip into. However, if you want to buy only one book on machine learning, then my answer to the question would be no, I wouldn’t recommend this book. If you want one book--and only one book--on the subject of machine learning, then I would recommend Bishop’s Pattern recognition and machine learning for those wanting a more theoretical approach, or Marsland’s Machine learning: an algorithmic perspective for those wanting a more practical approach. This is, coincidentally, the exact advice I gave to the man in the bookshop.