The term “boosting” in the title of this definitive book refers to the authors’ approach to machine learning. Boosting means “creating a highly accurate predictor by combining many weak and inaccurate ‘rules of thumb’” to support such tasks as assigning unfamiliar entities to appropriate categories. Though the book is scholarly and authoritative in the (positive) extreme, there is no hint of turgid scholasticism. The authors eschew esoteric statements made for their own sake, offering an abundance of ingenious, dazzling mathematics (including proofs) instead.
The book introduces the general topic of machine learning “from the ground up,” which provides enough foundation for understanding the content that explains and elaborates boosting. Readers need only undergraduate probability, calculus, and linear algebra, so the target audience can include students and non-machine-learning specialists. The exercises at the end of each chapter are, I am sure, meant for students and researchers alike. The writing is efficiently expressive and clear, with the subject matter very well sequenced. I found no typographical errors. The authors season the text with well-timed and quite pithy humor: “How is it that a committee of blockheads can somehow arrive at highly reasoned decisions, despite the weak judgment of the individual members?” They answer their own question by claiming that “boosting solves hard machine-learning problems by forming a very smart committee [a collection of weak and inaccurate rules of thumb] of grossly incompetent but carefully selected members.” The generic set of algorithms, which the authors also refer to as a “meta-algorithm,” is called AdaBoost, with “Ada” standing for “adaptive” and not for the computer language: a classification conundrum for search engines that I am quite certain AdaBoost can handle! (The authors also present a natural language parsing challenge to AdaBoost in Groucho Marx’s famous joke that “fruit flies like a banana,” which I, as a card-carrying spoiler, would a priori disambiguate via a hyphen: “fruit-flies.” This, of course, is not the point, and besides, hyphenation went out of style long ago.)
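To make the meta-algorithm concrete for readers meeting it here for the first time, the following is a minimal sketch of AdaBoost in Python, with exhaustively searched decision stumps standing in as the weak learners. The stump representation and helper names are my own illustrative choices, not the book’s notation; treat this as a sketch of the technique rather than a reproduction of the authors’ pseudocode.

```python
import numpy as np

def train_stump(X, y, w):
    """Pick the threshold/polarity stump with the lowest weighted error."""
    best = None
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thresh) > 0, 1, -1)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thresh, polarity)
    return best

def adaboost(X, y, rounds=50):
    """AdaBoost with labels y in {-1, +1}: reweight the examples so each
    new weak learner must focus on the mistakes of its predecessors."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        err, j, thresh, pol = train_stump(X, y, w)
        err = max(err, 1e-10)                  # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)  # this weak learner's vote
        pred = np.where(pol * (X[:, j] - thresh) > 0, 1, -1)
        w *= np.exp(-alpha * y * pred)         # upweight misclassified points
        w /= w.sum()                           # renormalize to a distribution
        ensemble.append((alpha, j, thresh, pol))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted committee vote."""
    score = sum(a * np.where(p * (X[:, j] - t) > 0, 1, -1)
                for a, j, t, p in ensemble)
    return np.sign(score)
```

The reweighting line is the heart of the method: each round, misclassified examples gain weight, so the next “blockhead” added to the committee is forced to attend to exactly the cases its predecessors got wrong.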
I chose to read the appendix of background mathematical information first, and I recommend it, especially to students. It is enormously helpful: precise, accurate, and complete with respect to the content of the book. I most appreciated its rapid but effective buildup to such notions as compactness, convexity, and the central limit theorem. This appendix by itself could serve as an excellent review for the final exam in a class on the calculus of several variables.
Early in the book (p. 10), the authors provide an example of a strong classifier that results from the combination of three weak classifiers. This example is impressive and motivating, with training error dropping exponentially with the number of weak classifiers (a fact that is proved in the book). Since AdaBoost is, loosely speaking, a species of fitting theory to data, it is subject to the universal hazard of overfitting (via a plethora of parameters), with which many of us have had to deal in various scientific activities. The authors’ quantification of Occam’s razor in showing AdaBoost to be resistant to overfitting is, in my opinion, a tour de force. “Simplicity is not at all simple to define,” the authors note; the tradeoff is between the simplicity of the combined classifier and its fit to the training data. Another tour de force is their demonstration (in chapter 5) that even after the training error has leveled off, and indeed reached zero, the “margin” (that is, the confidence level) on the training examples continues to increase as boosting rounds are added. Larger margins on the training set guarantee a better bound on the generalization error.
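For the mathematically curious, the two claims just cited can be stated compactly. Below is the standard form of AdaBoost’s training-error bound, together with a loosely stated margin-based generalization bound (constants and logarithmic factors suppressed); these match the flavor of the results in the book, though the exact statements and notation there may differ.

```latex
% Training error after T rounds, where the t-th weak learner has
% weighted error \epsilon_t = 1/2 - \gamma_t (an "edge" of \gamma_t):
\widehat{\mathrm{err}}(H)
  \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}
  \;=\;   \prod_{t=1}^{T} \sqrt{1-4\gamma_t^{2}}
  \;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big).

% Margin bound, stated loosely: for m training examples, base
% classifiers of VC dimension d, and any margin threshold \theta > 0,
\mathrm{err}(H)
  \;\le\; \Pr_{S}\!\big[\,y\,f(x) \le \theta\,\big]
        \;+\; \tilde{O}\!\left(\sqrt{\frac{d}{m\,\theta^{2}}}\right),
% a bound independent of the number of rounds T; hence there is no
% penalty for continuing to boost once the margins are large.
```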
The essence of game theory is covered very well (chapter 6), and the AdaBoost connection is made with the observation that the booster and the weak learner are “playing a game repeatedly in a standard game-theoretic fashion.” In the mind-reading game described there, the computer wins 87 percent of the games. Connections with, for example, logistic regression and convex optimization are made equally clear and compelling.
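The pivot of that chapter is von Neumann’s minimax theorem, reproduced below in its standard matrix-game form; in the boosting game it underlies the equivalence between the weak learning assumption and linear separability of the training data with positive margin (my gloss on the chapter, not a quotation).

```latex
% Von Neumann's minimax theorem for a finite matrix game M, where the
% row player chooses a mixed strategy P and the column player chooses Q:
\max_{P} \min_{Q} \; P^{\top} M\, Q \;=\; \min_{Q} \max_{P} \; P^{\top} M\, Q.
```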
This excellent book is a mind-stretcher that should be read and reread, even by nonspecialists.