Computing Reviews
Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies
Kelleher J., Mac Namee B., D’Arcy A., The MIT Press, Cambridge, MA, 2015. 624 pp. Type: Book (978-0-262-02944-5)
Date Reviewed: Jan 27 2016

Machine learning, often said to be among the most exciting fields in technology, is a trending topic right now. It has seen massive innovation and growth in the past decade and has become so pervasive that most of us use it daily without noticing, for tasks ranging from searching the web to using speech recognition on a smartphone. It is already making its way into a broad range of industries, from marketing to healthcare to infrastructure to fraud management.

For that reason, there is no shortage of books and courses on machine learning. One conspicuous trend is to introduce the subject informally so that it can reach a broader audience. The authors follow their own principle, announced early: introduce the most powerful and popular algorithms and techniques, albeit informally, and provide complete, working examples. This principle immediately defines the audience (undergraduates in computer science, graduate students in adjacent disciplines, and practitioners), and the approach can work well in certain settings, especially when a clear link between the informal exposition and the associated real-life problems is fully established. The authors pay attention to this and thoroughly explain how the book is structured and how a course based on the book might be organized. Yet readers should be aware of some conspicuous drawbacks along the way.

For instance, the first chapter, “Machine Learning for Predictive Analytics,” introduces a few notions of utmost importance: underfitting, overfitting, and the optimal model. Most glaringly, however, the authors never discuss the fundamental tension between bias and variance, that is, the bias-variance trade-off. The notion of an optimal model is thus left hanging in the air; stated only informally, it is difficult for the reader to grasp fully. As more smoothing is applied, the squared-bias curve goes up while the variance curve goes down; the optimal model lies between less and more smoothing, at their intersection, the point that minimizes the risk = bias² + variance. It is all about balancing bias and variance. (This leaves out more elaborate versions of the bias-complexity trade-off and the absence of a universal learner; the no free lunch theorem is briefly mentioned in chapter 11, but only in the concluding sections of the book.) The next logical step would then be to discuss loss and its representation, since the risk is also affected by noise. These two notions are fundamental to machine learning; it is hard to imagine even an introductory book on the fundamentals of machine learning that does not cover them to some degree.
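To make the trade-off concrete, here is a minimal sketch of my own (not an example from the book): polynomial regression on noisy samples of a sine curve, where a low degree corresponds to more smoothing (high bias) and a high degree to less smoothing (high variance).

```python
import numpy as np

# Illustrative only: noisy samples of sin(x) on [0, 2*pi].
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 2 * np.pi, 40))
y_train = np.sin(x_train) + 0.2 * rng.standard_normal(40)
x_test = np.linspace(0.0, 2 * np.pi, 200)
y_test = np.sin(x_test)  # noiseless target, for measuring test error

def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

underfit = fit_and_score(1)    # high bias: too much smoothing
balanced = fit_and_score(5)    # near the optimum for this problem
overfit = fit_and_score(15)    # high variance: too little smoothing
```

The underfit model has the worst error on both training and test data, while the overfit model drives training error down without a corresponding gain on test data; the optimal model sits between the two extremes.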

What’s more, it is probably a missed opportunity not to cover these ideas within the context of the cross-industry standard process for data mining (CRISP-DM) life cycle; that is the very context in which the book discusses overfitting and underfitting.

One of the worked examples in the book (the promised complete example) concerns motor insurance fraud. It is a nice fit for the book’s purpose: the reader can trace the case gradually, starting from descriptive statistics and moving to analyses of increasing sophistication. The exercises that accompany the main text are useful and relevant. Another device the authors employ, the “big idea,” is an informal explanation of an algorithm’s fundamentals. It works well from chapter 4, “Information-based Learning,” onward, starting with ID3.
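The “big idea” behind ID3 — pick the split that yields the largest reduction in entropy — fits in a few lines. The following sketch uses a tiny fraud-flavored dataset invented here for illustration (it is not the book’s example):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from partitioning rows on feature (ID3's split criterion)."""
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: "claim" perfectly separates the classes,
# while "history" carries no information about them.
rows = [
    {"claim": "high", "history": "none"},
    {"claim": "high", "history": "prior"},
    {"claim": "low",  "history": "none"},
    {"claim": "low",  "history": "prior"},
]
labels = ["fraud", "fraud", "legit", "legit"]
```

Here ID3 would split on "claim" (gain of 1 bit) rather than "history" (gain of 0 bits), and it recurses on each partition in the same way.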

In chapter 5, “Similarity-based Learning,” in the context of feature selection, the authors introduce the “curse of dimensionality,” an important phenomenon in machine learning and an object of recent study because it presents major technical difficulties in scaling. I was surprised that the most obvious manifestations of this phenomenon, such as two randomly selected vectors tending to be orthogonal, or two randomly selected points tending to be far apart (close to the average distance), are not mentioned; instead, the illustrations focus on hypercube density. Naive Bayesian models are introduced in chapter 6. A more abstract context finally arrives in chapter 7, “Error-based Learning,” where regression, logistic regression, and multivariate/multinomial logistic regression are addressed. The more formal and condensed chapter 8 discusses the receiver operating characteristic (ROC) curve and useful associated metrics.
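Both manifestations are easy to demonstrate numerically; the following sketch (my own, not from the book) draws random Gaussian points in 2,000 dimensions and checks near-orthogonality and distance concentration.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2000   # dimensionality
n = 50     # number of random points
points = rng.standard_normal((n, d))

# Manifestation 1: two random vectors are nearly orthogonal
# (their cosine similarity is on the order of 1/sqrt(d)).
u, v = points[0], points[1]
cosine = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Manifestation 2: pairwise distances concentrate tightly around
# their mean, so every point is roughly "equally far" from every other.
dists = np.array([np.linalg.norm(points[i] - points[j])
                  for i in range(n) for j in range(i + 1, n)])
relative_spread = dists.std() / dists.mean()
```

With d = 2000, the cosine similarity is a few hundredths and the relative spread of distances is around one or two percent, which is exactly why nearest-neighbor distinctions degrade in high dimensions.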

Chapter 8, “Evaluation,” is followed by case studies that adhere to CRISP-DM; however, the transition from descriptive statistics to modeling is not smooth, and no clear explanation is given (section 9.4, “Modeling”).

Chapter 11, “The Art of Machine Learning for Predictive Data Analytics,” offers direct recommendations on how to proceed further in the vast area of machine learning. I felt that a lot more could have been said here, but that is more a matter of taste than of substance.

If students are interested in a practical and holistic approach to solving a complex problem, adopting a common (mostly qualitative) understanding without worrying too much about consistency and potential quantitative limitations, then this is a useful course of study and, ultimately, a good approach for the purpose. The book certainly delivers on its promise and principles.

Nonetheless, it is up to readers to decide whether to adopt the approach in the first place. In my opinion, the book is best suited to a course that complements more rigorous expositions of machine learning fundamentals, especially those gravitating toward general data science.


Reviewer: Serge Berger | Review #: CR144133 (1605-0298)
Categories: Learning (I.2.6); Data Mining (H.2.8); Content Analysis and Indexing (H.3.1); Pattern Recognition (I.5)
Other reviews under "Learning":
Learning in parallel networks: simulating learning in a probabilistic system. Hinton G. (ed), BYTE 10(4): 265-273, 1985. Type: Article (reviewed Nov 1 1985)
Macro-operators: a weak method for learning. Korf R., Artificial Intelligence 26(1): 35-77, 1985. Type: Article (reviewed Feb 1 1986)
Inferring (mal) rules from pupils’ protocols. Sleeman D., Progress in artificial intelligence, Orsay, France, 1985. Type: Proceedings (reviewed Dec 1 1985)

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®