Although machine learning is one of the newer major scientific domains, a tremendous number of papers have already been published, reporting progress in both theoretical research and practical developments. We have also seen a series of outstanding books that bring together the accumulated knowledge and offer unified views of the relationships among the different topics.

This new book presents a comprehensive and mathematically sound account of some of the most significant sub-fields of machine learning.

The book has 14 chapters and four appendices. Following an introductory chapter outlining the content of the book and the basic definitions and terminology, the second chapter presents the fundamentals of the probably approximately correct (PAC) learning framework. The results given in Sections 2.2 and 2.3 supply general learning guarantees for both consistent and inconsistent cases when the hypothesis sets are finite. The third chapter focuses on complexity analysis of the family of hypotheses expressed in terms of Rademacher complexity and Vapnik-Chervonenkis dimension (VC-dimension), and establishes several upper and lower bounds on the generalization error in Sections 3.3 and 3.4.

Support vector machines (SVMs) are used to develop pattern classification and function approximation systems. Chapter 4 provides a theoretical foundation for SVMs based on the notion of margin, and discusses variants of classification algorithms for both separable and nonseparable samples. The “kernel trick,” one of the most widely used methods for hard classification tasks in which the data are not linearly separable, relies on kernels that implicitly map the data into a higher-dimensional feature space, in the hope that the sample becomes linearly separable there, without a corresponding increase in computational complexity. Chapter 5 presents the fundamentals of Mercer kernels and the extension of SVMs in terms of kernels. The concept of a rational kernel is introduced in Section 5.5, together with an efficient algorithm for the computation of these kernels.
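As a small illustration of the idea (my own sketch, not an example taken from the book), the snippet below checks that a degree-2 polynomial kernel on two-dimensional inputs returns the same value as an inner product in an explicitly constructed three-dimensional feature space. The function names and the sample points are hypothetical and chosen only for this illustration.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: k(x, y) = (x . y)^2."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** 2


def explicit_features(x):
    """Explicit feature map for 2-D inputs that corresponds to poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def feature_space_dot(x, y):
    """Inner product computed after mapping both points explicitly."""
    fx, fy = explicit_features(x), explicit_features(y)
    return sum(a * b for a, b in zip(fx, fy))


# Hypothetical sample points, used only to compare the two computations.
x, y = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly2_kernel(x, y)       # works in the original 2-D space
k_explicit = feature_space_dot(x, y)  # works in the 3-D feature space
print(k_implicit, k_explicit)
```

The kernel never builds the feature vectors, which is why the cost does not grow with the dimension of the feature space.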

Ensemble methods, of which boosting is a prominent example, weight several individual pattern classifiers and combine their decisions to obtain a classifier that outperforms each of them. In chapter 6, the authors discuss the boosting method and provide a theoretical analysis of its generalization capacities in terms of the VC-dimension of the hypothesis set. Because they process one sample at a time and are relatively easy to implement, online learning algorithms are very attractive for solving classification tasks. The most widely used online learning algorithms are the subject of chapter 7, including weighted majority and its randomized variant, exponential weighted average, the perceptron and winnow algorithms, and their connection with game theory. The multi-class classification problem is examined in the next chapter, where the main uncombined and aggregated classes of algorithms are presented and analyzed from the point of view of their efficiency.
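To make the one-sample-at-a-time idea concrete, here is a minimal sketch of a deterministic weighted majority scheme. It is my own illustration and makes several assumptions: labels are binary (0 or 1), ties are broken in favor of 1, and every expert that errs is penalized in every round by a hypothetical factor `beta`. The variant analyzed in the book may update the weights differently.

```python
def weighted_majority(expert_predictions, labels, beta=0.5):
    """Run a deterministic weighted majority vote over a sequence of rounds.

    expert_predictions: list of rounds, each a list of 0/1 predictions,
        one per expert.
    labels: list of true 0/1 labels, one per round.
    Returns (number of mistakes made by the combined vote, final weights).
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for preds, y in zip(expert_predictions, labels):
        # Total weight behind each of the two possible predictions.
        weight_for_1 = sum(w for w, p in zip(weights, preds) if p == 1)
        weight_for_0 = sum(w for w, p in zip(weights, preds) if p == 0)
        y_hat = 1 if weight_for_1 >= weight_for_0 else 0  # ties go to 1
        if y_hat != y:
            mistakes += 1
        # Shrink the weight of every expert that was wrong this round.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes, weights


# Hypothetical data: expert 0 is always right, experts 1 and 2 often err.
rounds = [[1, 0, 0], [0, 1, 1], [1, 0, 1], [0, 0, 1]]
truth = [1, 0, 1, 0]
mistakes, weights = weighted_majority(rounds, truth)
print(mistakes, weights)
```

The weights concentrate on the reliable expert after a few rounds, so the combined vote stops repeating the errors of the unreliable ones.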

One of the problems that arises, especially in the case of online algorithms, concerns the order in which the individual samples from the datasets are tested. Chapter 9 supplies a detailed analysis of the learning problem of ranking. In chapter 10, the authors focus on regression methods, with detailed descriptions of different types of regression techniques, such as linear regression, kernel ridge regression, and support vector regression.

The next chapter discusses a series of theoretical results concerning the derivation of bounds for the generalization error of learning algorithms in terms of their stability properties. Following the presentation and theoretical analysis of the most frequently used dimensionality reduction methods, such as principal component analysis (PCA) and kernel principal component analysis (KPCA), the authors cover the problem of learning languages in some detail in chapter 13.

Reinforcement learning is viewed as filling the gap between supervised and unsupervised learning, by trying out different strategies and identifying the ones that work best. The final chapter of the book is devoted to the theoretical foundation of reinforcement learning.

In my opinion, the content of the book is outstanding in terms of clarity of discourse and the variety of well-selected examples and exercises. The enlightening comments provided by the authors at the end of each chapter and the suggestions for further reading are also important features of the book. The concepts and methods are presented in a very clear and accessible way, and the illustrative examples contribute substantially to the understanding of the overall work.

The book is suitable for advanced courses on machine learning, computer science, data mining, statistical pattern recognition, and bioinformatics, as well as for researchers working in domains related to machine learning.