Computing Reviews
Boosting: foundations and algorithms
Schapire R., Freund Y., The MIT Press, Cambridge, MA, 2012. 496 pp. Type: Book (978-0-262-01718-3)
Date Reviewed: Dec 26 2012

The term “boosting” in the title of this definitive book refers to the authors’ approach to machine learning. Boosting means “creating a highly accurate predictor by combining many weak and inaccurate ‘rules of thumb’” to support such tasks as assigning unfamiliar entities to appropriate categories. Though the book is scholarly and authoritative in the (positive) extreme, there is no hint of turgid scholasticism. The authors eschew esoteric statements made for their own sake, offering an abundance of ingenious, dazzling mathematics (including proofs) instead.

The book introduces the general topic of machine learning “from the ground up,” which provides enough foundation for understanding the content that explains and elaborates on boosting. Readers need only have experience with undergraduate probability, calculus, and linear algebra, so the target audience can include students and non-machine-learning specialists. The exercises at the end of each chapter are, I am sure, meant for students and researchers alike. The writing is efficiently expressive and clear, with the subject matter very well sequenced. I found no typographical errors. The authors season the text with well-timed and quite pithy humor: “How is it that a committee of blockheads can somehow arrive at highly reasoned decisions, despite the weak judgment of the individual members?” They answer their own question by claiming that “boosting solves hard machine-learning problems by forming a very smart committee [a collection of weak and inaccurate rules of thumb] of grossly incompetent but carefully selected members.” The generic set of algorithms, which the authors also refer to as a “meta-algorithm,” is called AdaBoost, with Ada standing for “adaptive” and not for the computer language, a classification conundrum for search engines that I am quite certain AdaBoost can handle! (The authors also present a natural language parsing challenge to AdaBoost in Groucho Marx’s famous joke that “fruit flies like a banana,” which I, as a card-carrying spoiler, would a priori disambiguate via a hyphen: “fruit-flies.” This, of course, is not the point, and besides, hyphenation went out of style long ago.)

I chose to read the appendix of background mathematical information first, and I recommend it, especially to students. It is quintessentially helpful, precise, and accurate, and complete with respect to the content of the book. I most appreciated its rapid but effective buildup to such notions as compactness, convexity, and the central limit theorem. This appendix by itself could serve as an excellent review for the final exam in a class on the calculus of several variables.

Early in the book (p. 10), the authors provide an example of a strong classifier that results from the combination of three weak classifiers. This example is impressive and motivating, with training error dropping exponentially with the number of weak classifiers (a fact that is proved in the book). Since AdaBoost is, loosely speaking, a species of fitting theory to data, it is subject to the universal hazard of overfitting (via a plethora of parameters), with which many of us have had to deal in various scientific activities. The authors’ quantification of Occam’s razor in showing AdaBoost to be resistant to overfitting is, in my opinion, a tour de force. “Simplicity is not at all simple to define,” the authors note, but the tradeoff is between the simplicity of the combined classifier and its fit to the training data. Another tour de force is their demonstration (in chapter 5) that even after the training error has flattened out as boosting rounds accumulate, the “margin” (that is, the confidence level) continues to increase with further rounds. Larger margins on the training set yield better guarantees on the generalization error.
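For readers who would like to see the mechanics before opening the book, the following is a minimal sketch of the standard binary AdaBoost loop with single-feature threshold “stumps” as the weak rules of thumb. The toy data, round count, and function names are my own illustrative choices, not the authors’; the book develops the algorithm, its exponential training-error bound, and the margin analysis with full rigor.

# A rough, self-contained sketch of binary AdaBoost (labels in {-1, +1}) with
# one-feature threshold "stumps" as the weak learners; the toy data is hypothetical.
import numpy as np

def train_stump(X, y, w):
    """Return the (error, feature, threshold, sign) stump with lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] >= thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, rounds=30):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)  # this stump's voting weight
        pred = np.where(X[:, j] >= thr, sign, -sign)
        w *= np.exp(-alpha * y * pred)         # up-weight the examples it got wrong
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def scores(ensemble, X):
    s = np.zeros(len(X))
    for alpha, j, thr, sign in ensemble:
        s += alpha * np.where(X[:, j] >= thr, sign, -sign)
    return s                                   # weighted-majority score before sign()

# Hypothetical toy data: 200 points labeled by a slightly noisy linear rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200) > 0, 1, -1)

H = adaboost(X, y)
s = scores(H, X)
total_alpha = sum(a for a, *_ in H)
print("training error:", np.mean(np.sign(s) != y))
print("minimum normalized margin:", np.min(y * s / total_alpha))

The last line reports the minimum normalized margin, the quantity whose continued growth chapter 5 ties to good generalization; running the sketch with more rounds tends to push it upward even after the training error stops improving.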

The essence of game theory is covered very well (chapter 6), and the AdaBoost connection is made with the observation that the boosting algorithm and the weak learning algorithm are “playing a game repeatedly in a standard game-theoretic fashion.” In the book’s mind-reading game, the computer wins 87 percent of the games. Connections with, for example, logistic regression and convex optimization are made equally clear and compelling.

This excellent book is a mind-stretcher that should be read and reread, even by nonspecialists.

Reviewer: George Hacken
Review #: CR140778 (1303-0200)
Learning (I.2.6)
Numerical Algorithms And Problems (F.2.1)
Other reviews under “Learning”:
Learning in parallel networks: simulating learning in a probabilistic system
Hinton G. (ed) BYTE 10(4): 265-273, 1985. Type: Article
Nov 1 1985
Macro-operators: a weak method for learning
Korf R. Artificial Intelligence 26(1): 35-77, 1985. Type: Article
Feb 1 1986
Inferring (mal) rules from pupils’ protocols
Sleeman D. Progress in artificial intelligence (Orsay, France), 1985. Type: Proceedings
Dec 1 1985
