Computing Reviews
Variational Bayesian learning theory
Nakajima S., Watanabe K., Sugiyama M., Cambridge University Press, New York, NY, 2019. 558 pp. Type: Book (978-1-107-07615-0)
Date Reviewed: Sep 10 2020

According to the book's back cover, variational Bayesian (VB) learning is one of the most popular methods in machine learning. The VB framework casts Bayesian learning as an optimization problem: the quantity being optimized is an approximating distribution over the unknown variables. The book first covers the basics of Bayesian and VB learning, then gives algorithms for various classes of problems, with each optimization algorithm referring back to equations derived in the text. Later parts develop nonasymptotic and asymptotic theories.
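
For orientation, the optimization at the core of VB learning is standardly posed as follows (generic notation, stated here for context rather than quoted from the book): given data $D$ and unknown variables $w$, VB minimizes the free energy

F(q) = \mathbb{E}_{q(w)}[\log q(w) - \log p(D, w)] = \mathrm{KL}(q(w) \,\|\, p(w \mid D)) - \log p(D)

over a tractable family of distributions $q$, so the minimizer is the closest tractable approximation to the posterior $p(w \mid D)$.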

Motivation for VB learning theory can be found in the preface. The first part of the book covers the basics of VB: posterior learning and posterior distributions, mixture models with latent variables, free energy, and so on. The basics include VB analysis of some popular distributions within this framework. Finally, empirical Bayesian methods are introduced. Part 1 is a good introduction to the Bayesian and VB learning frameworks.
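
As a concrete instance of the machinery that part 1 develops, the usual mean-field treatment of a latent variable model factorizes the approximate posterior and updates each factor in turn (the standard coordinate-ascent scheme; the notation is generic, not necessarily the book's):

q(w, z) = q(w)\,q(z), \qquad \log q^{*}(z) = \mathbb{E}_{q(w)}[\log p(D, z, w)] + \text{const}, \qquad \log q^{*}(w) = \mathbb{E}_{q(z)}[\log p(D, z, w)] + \text{const}.

Iterating these updates decreases the free energy monotonically, and each update is available in closed form for conjugate mixture models such as those analyzed in part 1.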

Part 2 comprises chapters 3 through 5. Chapter 3 gives empirical VB (EVB) algorithms for matrix factorization, matrix factorization with missing entries, tensor Tucker factorization, low-rank subspace clustering, and sparse additive matrix factorization. Chapter 4 gives EVB algorithms for latent variable models, including finite mixture models such as the Gaussian mixture model and exponential family mixture models, as well as Bayesian networks, hidden Markov models, probabilistic context-free grammars, and latent Dirichlet allocation. Chapter 5 discusses EVB algorithms for models without conjugacy.
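
To make the matrix factorization case concrete: the EVB solution for fully observed matrix factorization acts on the singular values of the observed matrix, truncating small ones and shrinking the rest. The Python sketch below illustrates only that shape of the algorithm; the cutoff and shrinkage formulas are simplified placeholders, not the book's exact analytic expressions.

import numpy as np

def evb_mf_sketch(V, noise_var, threshold=2.0):
    # Illustrative sketch only: EVB matrix factorization truncates small
    # singular values and shrinks the rest. The cutoff and shrinkage here
    # are simplified placeholders, not the book's analytic formulas.
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    L, M = V.shape
    cutoff = threshold * np.sqrt(noise_var * max(L, M))  # placeholder noise floor
    s_hat = np.where(s > cutoff, s - noise_var * max(L, M) / s, 0.0)  # placeholder shrinkage
    return (U * s_hat) @ Vt  # low-rank denoised estimate of V

# Usage: recover a rank-3 signal from a noisy 30 x 50 matrix.
rng = np.random.default_rng(0)
signal = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 50))
V = signal + 0.5 * rng.normal(size=(30, 50))
V_hat = evb_mf_sketch(V, noise_var=0.25)
print(np.linalg.matrix_rank(V_hat))  # typically 3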

Part 3 discusses factorization of fully observed matrices. It gives VB and EVB algorithms for such factorization, develops a VB learning theory for the problem, and provides a solution, including the necessary theorems and their proofs. A full chapter is devoted to performance analysis of the solution. Specialized versions for matrix factorization with missing entries, low-rank subspace clustering, and sparse additive matrix factorization are described. The last chapter in this part derives solutions from Bayesian variants of other learning methods and compares them with the VB solution.

Part 4 of the book is on the asymptotic theory of VB learning. The first chapter introduces asymptotic theory for statistical machine learning; the major concerns are behavior in the limit of the number of training samples and generalization error. The remaining chapters focus on asymptotic analysis of VB learning in latent variable models; the particular models discussed are reduced rank regression and the models from chapter 4. This part concludes with a chapter on a unified theory for latent variable models, which proves an inequality relating the generalization error to the VB free energy.
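
For context on the asymptotic results: in this literature the free energy for $n$ training samples typically expands as

F_n = n S_n + \lambda \log n + o(\log n),

where $S_n$ is the empirical entropy and $\lambda$ is a learning coefficient, and the average generalization error decays as $\lambda/n$. (This is the standard form from singular learning theory, stated for orientation; the book derives VB-specific counterparts of these quantities.)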

Due to its theoretical nature, the book is most appropriate for an advanced machine learning course at the graduate level. Readers are expected to have a background in statistics, probability theory, optimization, and machine learning. Machine learning professionals will find ready-made algorithms for a large number of models.

Reviewer: Maulik A. Dave
Review #: CR147058 (2102-0028)
Categories: Learning (I.2.6); Graphical Environments (D.2.6); Artificial Intelligence (I.2); General (F.0); General (I.0)
Other reviews under "Learning":
Learning in parallel networks: simulating learning in a probabilistic system. Hinton G. (ed), BYTE 10(4): 265-273, 1985. Type: Article. Reviewed: Nov 1 1985
Macro-operators: a weak method for learning. Korf R., Artificial Intelligence 26(1): 35-77, 1985. Type: Article. Reviewed: Feb 1 1986
Inferring (mal) rules from pupils’ protocols. Sleeman D., Progress in artificial intelligence (Orsay, France, 1985). Type: Proceedings. Reviewed: Dec 1 1985
