Computing Reviews
Applied machine learning
Forsyth D., Springer International Publishing, New York, NY, 2019. 494 pp. Type: Book (ISBN 978-3-030-18113-0)
Date Reviewed: Jul 17 2020

Machine learning methods find application in almost any domain that makes use of some form of computation. These algorithms build models that help in making predictions, which makes them of interest in industrial as well as business settings.

This book is compiled from the author’s course notes and will be very useful as an undergraduate textbook. The focus is on the techniques needed to use machine learning tools, rather than on an in-depth treatment of the underlying theory. The content is split into six sections that cover classification, high-dimensional data, clustering, regression, graphical models, and deep networks.

The basic classification task is to identify the label of an item from a set of features. The first section provides a good account of the Bayes classifier and the support vector machine. Training error “is the error rate on examples used to train the classifier,” and the test error is the error on other examples. “A small gap between training error and test error is [desirable],” so bounds on the probability that the gap becomes large are studied.
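
A minimal sketch of this train/test comparison in scikit-learn, with a naive Bayes classifier and a support vector machine (the synthetic dataset and all parameters are illustrative assumptions, not the book’s examples):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    # synthetic labeled data, split into training and test sets
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for clf in (GaussianNB(), SVC(kernel="rbf")):
        clf.fit(X_tr, y_tr)
        train_err = 1 - clf.score(X_tr, y_tr)  # error on training examples
        test_err = 1 - clf.score(X_te, y_te)   # error on held-out examples
        print(type(clf).__name__, f"train {train_err:.3f}, test {test_err:.3f}")

A small gap between the two printed errors is exactly what the bounds discussed in this section aim to guarantee.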

When items in the data contain multiple features, the dataset is modeled as a set of d-dimensional vectors. The behavior of such high-dimensional data is nonintuitive, and the second section handles this phenomenon, known as “the curse of dimension” (exhibited by points lying close to the decision boundary while also being far apart from one another). Techniques that treat the dataset as a collection of blobs, or clusters, where points that are closer together belong to the same blob, are demonstrated. In most high-dimensional datasets, “diagonal entries of the covariance matrix are very small,” so techniques that build a reasonably accurate lower-dimensional model using principal component analysis (PCA) are shown.
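
A hedged sketch of such a lower-dimensional model with PCA (the low-rank synthetic data and the choice of two components are assumptions made for illustration):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    # 50-dimensional data of low intrinsic rank: after rotation, most
    # diagonal entries of the covariance matrix are near zero
    X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 50))

    pca = PCA(n_components=2)
    X_low = pca.fit_transform(X)  # project onto two principal components
    print("fraction of variance kept:", pca.explained_variance_ratio_.sum())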

Clustering is covered in Part 3. Many algorithms are discussed here. In the agglomerative method, each data item is considered as a cluster, and then clusters are merged recursively to arrive at a good clustering. In the divisive method, the entire dataset is considered as a cluster, and then clusters are split recursively to find a good clustering. The iterative k-means method is described as the go-to clustering algorithm, along with a more general expectation maximization (EM) algorithm that can handle cases where some data is missing.
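
A brief sketch of three of these methods in scikit-learn: k-means, agglomerative clustering, and an EM-fitted Gaussian mixture (the blob dataset and the choice of three clusters are illustrative assumptions):

    from sklearn.cluster import AgglomerativeClustering, KMeans
    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    agglo = AgglomerativeClustering(n_clusters=3).fit(X)  # merges clusters bottom-up
    gmm = GaussianMixture(n_components=3, random_state=0).fit(X)  # fitted with EM

    print(kmeans.labels_[:10], agglo.labels_[:10], gmm.predict(X[:10]))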

Whereas classification “predict[s] a class from a data item,” regression models predict a value and are useful in comparing trends in the data. This is the topic of Part 4, which describes regression using linear models, along with transformations that improve performance. To find sets of independent variables that predict effectively, greedy search methods are described: forward stagewise regression adds variables one at a time, whereas the backward method removes them, in each case stopping when the change makes the regression worse. Boosting is another greedy technique, in which an optimal predictor is built incrementally from less ambitious predictors.
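
A sketch of these two ideas, a transformed linear model and a boosted ensemble of weak predictors (the exponential toy data is an assumption, and scikit-learn’s gradient boosting stands in for the book’s general boosting discussion):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 3.0, size=(200, 1))
    y = np.exp(x).ravel() + rng.normal(scale=0.1, size=200)

    # transforming the response linearizes the trend for the linear model
    lin = LinearRegression().fit(x, np.log(y))

    # boosting greedily combines many weak regressors into a strong one
    boost = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(x, y)
    print(lin.coef_[0], boost.predict(x[:3]))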

Part 5 describes models that can be represented as graphs, for example, hidden Markov models (HMMs) and techniques to learn them. The conditional random field (CRF), which models the hidden states conditioned on the observations, is another model covered here. A learning algorithm for CRFs “repeatedly computes the current best inferred set of hidden states” and “adjusts the cost functions so that the desired sequence scores better than the current best.”
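
The “best inferred set of hidden states” is found by dynamic programming; here is a small, self-contained Viterbi decoder for an HMM (the two-state toy model is an invented example, not taken from the book):

    import numpy as np

    def viterbi(obs, pi, A, B):
        """Most likely state sequence for observation indices obs.
        pi: initial probs (S,), A: transitions (S, S), B: emissions (S, O)."""
        T, S = len(obs), len(pi)
        logp = np.log(pi) + np.log(B[:, obs[0]])  # best log prob per end state
        back = np.zeros((T, S), dtype=int)        # backpointers
        for t in range(1, T):
            scores = logp[:, None] + np.log(A) + np.log(B[:, obs[t]])[None, :]
            back[t] = scores.argmax(axis=0)
            logp = scores.max(axis=0)
        state = int(logp.argmax())
        path = [state]
        for t in range(T - 1, 0, -1):             # trace backpointers
            state = int(back[t, state])
            path.append(state)
        return path[::-1]

    # toy two-state model with two observable symbols
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3], [0.4, 0.6]])
    B = np.array([[0.9, 0.1], [0.2, 0.8]])
    print(viterbi([0, 0, 1, 1], pi, A, B))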

A neural network consists of a stack of layers of units, where each unit “accepts a set of inputs and a set of parameters, and produces a number [that] is a nonlinear function of the inputs and the parameters.” When the number of layers is large, the network is called a “deep network.” Such networks make excellent classifiers and are the topic of the last section. “Networks are trained by descent on a loss [function],” and backpropagation is used to evaluate the gradient; a gradient scaling method for improving training is also described. Image classification is the interesting centerpiece here. Another application shown is an encoder that produces a low-dimensional code from higher-dimensional data, trained along with a decoder that recovers the data from the code.
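
A hedged sketch of that encoder/decoder pair, written in PyTorch (the book does not tie its discussion to any particular framework, and the layer sizes and random batch here are placeholders):

    import torch
    import torch.nn as nn

    autoencoder = nn.Sequential(
        nn.Linear(784, 32), nn.ReLU(),     # encoder: 784-d input -> 32-d code
        nn.Linear(32, 784), nn.Sigmoid(),  # decoder: code -> reconstruction
    )
    opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    x = torch.rand(64, 784)                # stand-in for a batch of flattened images
    for _ in range(100):
        opt.zero_grad()
        loss = loss_fn(autoencoder(x), x)  # descent on a reconstruction loss
        loss.backward()                    # backpropagation evaluates the gradient
        opt.step()
    print(loss.item())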

Overall, this is a great textbook that will be very useful for students and practitioners. The material is designed to be read from beginning to end. Basic skills in linear algebra, probability, statistics, and programming are expected of readers. Each chapter is organized very well, with clear descriptions and examples. Key ideas are highlighted in the form of text boxes that offer an excellent summary of the main body.

Reviewer: Paparao Kavalipati | Review #: CR147018 (2012-0282)
Learning (I.2.6)
Paradigm (C.1.2 ...)
Other reviews under "Learning":

Learning in parallel networks: simulating learning in a probabilistic system
Hinton G. (ed), BYTE 10(4): 265-273, 1985. Type: Article (reviewed Nov 1 1985)

Macro-operators: a weak method for learning
Korf R., Artificial Intelligence 26(1): 35-77, 1985. Type: Article (reviewed Feb 1 1986)

Inferring (mal) rules from pupils’ protocols
Sleeman D., Progress in artificial intelligence, Orsay, France, 1985. Type: Proceedings (reviewed Dec 1 1985)
