Computing Reviews
Optimization for machine learning
Sra S., Nowozin S., Wright S. (eds.), The MIT Press, Cambridge, MA, 2011. 512 pp. Type: Book (978-0-262-01646-9)
Date Reviewed: Apr 9 2012

Machine learning is a natural outgrowth of the intersection of computer science, statistics, mathematics, and engineering; its multidisciplinary nature is underscored by applications in areas ranging from finance to biology, physics, and chemistry. Developments in machine learning have become increasingly complex, requiring sophisticated mathematical tools to ensure both the accuracy and efficiency of the resulting solutions. This book provides a sound, rigorous, and comprehensive presentation of the fundamental optimization techniques for machine learning tasks.

The book is structured into 18 chapters, each written by an outstanding scientist. Chapter 1 outlines the fundamentals of optimization and machine learning and gives a brief overview of the book’s content. The next two chapters focus on typical optimization problems related to sparse methods, such as proximal methods, coordinate descent, working-set methods, and interior-point algorithms for conic optimization. Chapter 4 surveys incremental methods currently used for convex optimization, together with an analysis of their convergence properties. The next two chapters provide a detailed presentation of the fundamentals of large-scale optimization for nonsmooth convex problems. The following two chapters present cutting-plane methods for machine learning and the dual decomposition method for inference, in the context of typical problems arising in Markov random fields and structured prediction.
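To give a concrete flavor of the proximal methods treated in these chapters, the following minimal sketch (mine, not taken from the book) applies the iterative soft-thresholding algorithm (ISTA), a basic proximal gradient method, to the l1-regularized least-squares problem; the function names and parameter choices are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1: shrinks each coordinate toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, iters=1000):
    # Proximal gradient (ISTA) for: min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)        # gradient of the least-squares term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

Accelerated variants and the coordinate descent and working-set strategies surveyed in the book improve on this basic scheme.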

Chapter 9 is devoted to the potential of augmented Lagrangian methods for solving machine learning tasks. The authors present the dual augmented Lagrangian algorithm for sparse estimation problems and analyze its relationship to operator-splitting algorithms. The following two chapters cover convex optimization techniques for regret minimization and projected Newton-type methods for large-scale optimization. Chapter 12 describes the main features of interior-point methods for linear and convex quadratic programming in machine learning. The next chapter presents an asymptotic analysis of gradient algorithms, showing that the generalization performance of large-scale learning systems depends on both the statistical properties of the objective function and the computational features of the optimization algorithm.
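As an illustration of regret minimization by convex optimization, here is a minimal sketch (again mine, not the chapter's algorithm) of projected online gradient descent over an l2 ball, which attains O(sqrt(T)) regret for convex losses with bounded gradients; grad_fn, dim, and radius are assumed names.

```python
import numpy as np

def project_l2_ball(x, radius):
    # Euclidean projection onto the ball {x : ||x|| <= radius}.
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def online_gradient_descent(grad_fn, dim, radius, T):
    # Projected online gradient descent: with step sizes ~ 1/sqrt(t), the
    # cumulative regret against the best fixed point grows as O(sqrt(T)).
    x = np.zeros(dim)
    plays = []
    for t in range(1, T + 1):
        plays.append(x.copy())
        g = grad_fn(t, x)               # gradient of the loss revealed at round t
        x = project_l2_ball(x - (radius / np.sqrt(t)) * g, radius)
    return plays
```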

Robust optimization is a topic of increasing importance for machine learning. Chapter 14 explores the potential of this paradigm both to make optimization-based learning algorithms robust to noise and to construct algorithms with special properties (for instance, sparsity, consistency, and generalization guarantees).

Chapter 15 discusses improving first- and second-order methods by modeling uncertainty, while chapter 16 describes algorithms for optimizing functions over finite sets when the function value is only stochastically observable. Chapter 17 concerns learning the structure of a Markov random field over Gaussian random variables. The authors present coordinate descent methods for sparse inverse covariance selection and a variable-splitting technique referred to as alternating linearization. The final chapter presents an efficient algorithm for covariance selection.
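The chapter's coordinate descent and alternating linearization methods are too involved to reproduce here; as a rough stand-in, this sketch applies a plain proximal gradient step to the sparse inverse covariance objective -log det X + tr(S X) + lam * ||X||_1, with a backtracking check to keep the iterate positive definite. All names and step-size choices are illustrative assumptions, not the book's algorithms.

```python
import numpy as np

def soft_threshold(A, tau):
    # Entrywise proximal operator of tau * ||.||_1.
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def sparse_inv_cov(S, lam, step=0.1, iters=200):
    # Proximal gradient for: min_{X > 0} -log det X + tr(S X) + lam * ||X||_1
    # (for simplicity the penalty is applied to all entries, diagonal included).
    p = S.shape[0]
    X = np.eye(p)
    for _ in range(iters):
        grad = S - np.linalg.inv(X)     # gradient of the smooth part
        t = step
        while True:
            X_new = soft_threshold(X - t * grad, t * lam)
            X_new = (X_new + X_new.T) / 2          # enforce symmetry
            try:
                np.linalg.cholesky(X_new)          # positive definiteness check
                break
            except np.linalg.LinAlgError:
                t /= 2                  # backtrack to stay in the PD cone
        X = X_new
    return X
```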

Each chapter provides suggestions for further reading and a list of bibliographic references citing the most representative published works on its topic. This book is of great value for scientists involved in theoretical research and in practical development across the various domains of machine learning and related areas.

Reviewer: L. State
Review #: CR140046 (1209-0903)
Categories: Learning (I.2.6); Numerical Algorithms (G.1.0 ...); General (G.1.0); Optimization (G.1.6)
Other reviews under "Learning":
Learning in parallel networks: simulating learning in a probabilistic system
Hinton G. (ed) BYTE 10(4): 265-273, 1985. Type: Article
Nov 1 1985
Macro-operators: a weak method for learning
Korf R. Artificial Intelligence 26(1): 35-77, 1985. Type: Article
Feb 1 1986
Inferring (mal) rules from pupils’ protocols
Sleeman D. Progress in artificial intelligence, Orsay, France, 1985. Type: Proceedings
Dec 1 1985
