Many machine learning algorithms, such as neural nets, support vector machines, decision trees, Bayesian networks, clustering, hidden Markov models, and temporal difference learning, have been devised and used to discover patterns in data. While often effective, these algorithms usually capture only part of the patterns in the data. A natural way to build more powerful systems is to combine several instances of these algorithms into a single, more effective system. This ensemble approach to building more effective classifiers is the subject of this book.
The most common approaches to combining classifiers into an ensemble sum or average their outputs when those are numerical, or apply a voting strategy when they are categorical; many other combination schemes exist and are better suited to particular applications. The proper choice of the classifiers to be combined can be a major factor in building an ensemble that is significantly better than any of its components. This book presents these alternative ways of selecting classifiers, training them, and combining them into more effective systems.
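To make the two basic combination rules concrete, here is a minimal sketch (not from the book) of score averaging and majority voting, written in Python with NumPy; the array shapes and the toy predictions are assumptions made purely for illustration.

```python
import numpy as np

def average_scores(scores):
    """Combine numerical ratings by averaging them.

    scores: (n_classifiers, n_samples, n_classes) array of per-class
    scores; returns the class with the highest mean score per sample.
    """
    return np.asarray(scores).mean(axis=0).argmax(axis=1)

def majority_vote(labels):
    """Combine categorical predictions by plurality vote.

    labels: (n_classifiers, n_samples) array of integer class labels.
    """
    labels = np.asarray(labels)
    counts = np.apply_along_axis(
        np.bincount, 0, labels, minlength=labels.max() + 1)
    return counts.argmax(axis=0)  # most-voted label per sample

# Three invented classifiers scoring two samples over two classes.
scores = [[[0.9, 0.1], [0.4, 0.6]],
          [[0.7, 0.3], [0.2, 0.8]],
          [[0.4, 0.6], [0.3, 0.7]]]
print(average_scores(scores))                    # -> [0 1]
print(majority_vote([[0, 1], [0, 1], [1, 1]]))   # -> [0 1]
```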
The book itself is written by an ensemble of experts. Each of the 11 chapters is written by one or more authors, and each approaches the subject from a different direction. The first seven chapters present the theory of various ensemble approaches and the results of experiments testing how well they work. The remaining four chapters present applications where ensemble learning has proven especially effective. While the math in this book is not difficult, it does assume that the reader already knows about various machine learning algorithms, such as those mentioned above.
The theory section begins in chapter 1, which briefly introduces many ways to build ensemble learning systems. It emphasizes variations in how they choose training data, how they train the classifiers to be combined, and how the outputs of those classifiers are combined to produce the final classification. The chapter also outlines how ensemble approaches can be used to deal with some difficult learning situations, such as incremental learning, data fusion, feature selection, learning from incomplete sets of data, and learning concepts that change over time. Chapter 2 covers a variety of boosting methods and relates them to bootstrapping and bagging. Chapter 3 applies boosting to kernel estimators. Like all boosting methods, these improve the accuracy of a single base learner (such as a regression estimator) by adding learners that focus on correctly classifying the data that the previous ones misclassified. This notion of classifier refinement to build better estimators for targeted parameters is developed further in chapter 4. Random forests are covered in chapter 5. The classifiers in these forests resemble decision trees, but the candidate trees are constructed in a more random fashion than decision trees usually are. Chapter 6 presents several approaches that simultaneously train the component classifiers so that their errors are only weakly correlated. When component classifiers require unreasonably large matrices, the methods in chapter 7 approximate them with low-rank matrices and produce ensembles that are more practical to work with.
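The reweighting idea behind boosting is easy to state in code. Below is a minimal AdaBoost-style sketch, not taken from the book: `fit_stump` is a hypothetical weak learner and the toy data are invented for illustration; the boosting variants in chapters 2 through 4 differ in their loss functions and weight updates.

```python
import numpy as np

def fit_stump(X, y, w):
    """Hypothetical weak learner: the best single-feature threshold
    stump under sample weights w (labels in {-1, +1})."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.sign(X[:, j] - t + 1e-12)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda Z: s * np.sign(Z[:, j] - t + 1e-12)

def boost(X, y, fit_weak, n_rounds=10):
    """AdaBoost-style loop: each round fits a weak learner on weighted
    data, then upweights the examples it misclassified so the next
    learner concentrates on them."""
    w = np.full(len(y), 1.0 / len(y))          # start uniform
    learners, alphas = [], []
    for _ in range(n_rounds):
        h = fit_weak(X, y, w)
        pred = h(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # this learner's vote weight
        w = w * np.exp(-alpha * y * pred)      # raise weight of mistakes
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)
    return lambda Z: np.sign(
        sum(a * h(Z) for a, h in zip(alphas, learners)))

# Toy 1-D data: positive class iff x > 0.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
clf = boost(X, y, fit_stump, n_rounds=5)
print(clf(X))  # -> [-1. -1. -1.  1.  1.  1.]
```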
The application section begins in chapter 8 by showing how ensemble machine learning methods can greatly speed up the training of computer vision programs that detect faces and pedestrians. Chapter 9 applies boosting to the recognition of human activities in video, reporting successful recognition of activities such as boxing, hand waving, walking, yawning, checking voicemail on a phone, and kicking a ball. Learning to detect and segment anatomical structures in medical images is covered in chapter 10, including the interpretation of the 3D volume images that computed tomography produces. Finally, chapter 11 discusses the use of random forests in bioinformatics applications, such as the analysis of data from microarrays, mass spectrometers, and databases of protein-protein interactions or genome sequences.
This is an excellent book for someone who has already learned the basic machine learning tools. It would work well as a textbook or resource for a second course on machine learning. The algorithms are clearly presented in pseudocode form, and each chapter has its own references (about 50 on average).