Computing Reviews
Bandit algorithms
Lattimore T., Szepesvári C., Cambridge University Press, Cambridge, UK, 2020. 518 pp. Type: Book (978-1-108-48682-8)
Date Reviewed: May 3 2021

This book is about bandit algorithms. The word “bandit” is commonly understood to mean an armed thief, usually a member of a group or band. In the context of this book, however, a bandit is a type of slot machine with a large metal pole or handle on the side that must be pulled to make it work. This kind of machine is commonly known as a one-armed bandit and is used for gambling in casinos. The focus of this book is on algorithms for bandits with one or more arms. The multi-armed bandit model is often used to address the problem of decision-making in the face of uncertainty in machine learning. Familiarity with the basics of analysis, calculus, and linear algebra is expected of readers.
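To give a concrete feel for the model, here is a minimal Python sketch of the stochastic bandit protocol. It is my own illustration rather than anything from the book, and the arm means are hypothetical values chosen for the example.

```python
import random

# Hypothetical Bernoulli arm means; in practice these are unknown to the learner.
means = [0.3, 0.5, 0.7]
horizon = 1000

total_reward = 0.0
for t in range(horizon):
    arm = random.randrange(len(means))   # naive uniform play, purely for illustration
    reward = 1.0 if random.random() < means[arm] else 0.0
    total_reward += reward

# Regret measures the shortfall against always playing the best arm.
regret = horizon * max(means) - total_reward
print(f"total reward: {total_reward:.0f}, regret of uniform play: {regret:.0f}")
```

A bandit algorithm's job is to pull arms adaptively so that this regret grows as slowly as possible.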

This book consists of eight parts and 38 chapters. The first part, “Bandits, Probability, and Concentration,” has five chapters. After the introductory chapter, the reader is familiarized with some notions from probability, stochastic processes, and Markov chains. Stochastic bandits are then discussed, followed by concentration of measure.

The second part, “Stochastic Bandits with Finitely Many Arms,” has five chapters. Algorithms such as explore-then-commit and the upper confidence bound (UCB) algorithm are described, and the asymptotic optimality and minimax optimality of the UCB algorithm are studied. Bernoulli noise in relation to the UCB algorithm is also explored.
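To convey the flavor of these algorithms, the following is a minimal sketch of a UCB1-style index policy for Bernoulli arms; it is a common textbook variant, not necessarily the exact formulation analyzed in the book.

```python
import math
import random

def ucb1(means, horizon):
    """UCB1-style sketch: play the arm whose empirical mean plus confidence
    bonus is largest; the bonus shrinks as an arm is pulled more often."""
    k = len(means)
    counts = [0] * k       # number of pulls per arm
    sums = [0.0] * k       # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # pull each arm once to initialize
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if random.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

print(ucb1([0.3, 0.5, 0.7], 10_000))   # pulls should concentrate on the best arm
```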

The third part is “Adversarial Bandits with Finitely Many Arms.” This part has just two chapters, which deal with the Exp3 algorithm and the Exp3-IX algorithm.
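For readers unfamiliar with Exp3, a minimal sketch of one common variant follows; the reward function and step size here are hypothetical choices for illustration, not the book's tuned parameters.

```python
import math
import random

def exp3(reward_fn, k, horizon, eta):
    """Exp3 sketch: exponential weights over arms, updated with
    importance-weighted reward estimates for the arm actually played."""
    weights = [1.0] * k
    for t in range(horizon):
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = random.choices(range(k), weights=probs)[0]
        reward = reward_fn(t, arm)          # adversarial reward in [0, 1]
        estimate = reward / probs[arm]      # unbiased importance-weighted estimate
        weights[arm] *= math.exp(eta * estimate)
        m = max(weights)
        weights = [w / m for w in weights]  # normalize to avoid overflow
    return weights

# Hypothetical environment in which arm 1 pays slightly more on average.
print(exp3(lambda t, a: random.random() * (0.5 + 0.3 * a), k=2, horizon=5000, eta=0.01))
```

Exp3-IX modifies the importance-weighted estimates (adding an implicit-exploration term to the denominator) to obtain high-probability regret guarantees.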

The fourth part, “Lower Bounds for Bandits with Finitely Many Arms,” has five chapters. Basic ideas related to lower bounds are introduced, followed by foundations from information theory. Various types of lower bounds are studied, including minimax lower bounds, instance-dependent lower bounds, and high-probability lower bounds.

The fifth part is “Contextual and Linear Bandits.” The eight chapters in this part discuss contextual bandits, stochastic linear bandits, confidence bounds for least-squares estimators, optimal design for least-squares estimators, stochastic linear bandits with finitely many arms, stochastic linear bandits with sparsity, minimax lower bounds for stochastic linear bandits, and asymptotic lower bounds for stochastic linear bandits.
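As a taste of this material, here is a minimal LinUCB-style sketch for stochastic linear bandits with a finite arm set; the confidence-width parameter and noise level are hypothetical, and the book's algorithms and confidence sets are more carefully specified.

```python
import numpy as np

def lin_ucb(arms, theta_star, horizon, alpha=1.0, lam=1.0):
    """LinUCB-style sketch: rewards are linear in an unknown theta_star;
    keep a ridge-regression estimate and play the arm maximizing the
    estimated reward plus an ellipsoidal confidence bonus."""
    d = arms.shape[1]
    V = lam * np.eye(d)                    # regularized Gram matrix
    b = np.zeros(d)
    for _ in range(horizon):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))
        a = arms[np.argmax(arms @ theta_hat + alpha * bonus)]
        reward = a @ theta_star + 0.1 * np.random.randn()   # noisy linear reward
        V += np.outer(a, a)
        b += reward * a
    return theta_hat

arms = np.eye(3)   # three orthogonal arms, reducing to a classical bandit
print(lin_ucb(arms, theta_star=np.array([0.2, 0.5, 0.8]), horizon=2000))
```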

The sixth part, “Adversarial Linear Bandits,” has four chapters. Foundations from convex analysis are presented, followed by the Exp3 algorithm for adversarial linear bandits. Follow-the-regularized-leader and mirror descent are described, and the relation between adversarial and stochastic linear bandits is discussed.

Part 7’s seven chapters focus on miscellaneous topics: combinatorial bandits, non-stationary bandits, ranking, pure exploration, some foundations from Bayesian learning, Bayesian bandits, and Thompson sampling.
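Thompson sampling in particular is easy to convey in a few lines; the following Bernoulli-arm sketch is my own illustration of the idea, assuming uniform Beta(1, 1) priors.

```python
import random

def thompson_bernoulli(means, horizon):
    """Thompson sampling sketch: keep a Beta posterior per arm, sample a
    plausible mean from each posterior, and play the arm with the largest sample."""
    k = len(means)
    wins = [0] * k
    losses = [0] * k
    for _ in range(horizon):
        samples = [random.betavariate(wins[a] + 1, losses[a] + 1) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        if random.random() < means[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses

print(thompson_bernoulli([0.3, 0.5, 0.7], 10_000))   # pulls concentrate on the best arm
```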

Part 8, “Beyond Bandits,” has just two chapters: one on partial monitoring and another on Markov decision processes.

Although many books on topics related to bandit algorithms exist, this one is among the most comprehensive, covering many related topics. Chapters or topics that may be skipped are marked with a kangaroo symbol. Notes, bibliographic remarks, and exercises at the end of chapters significantly enhance the book’s utility for pedagogy as well as research. The authors claim that this is not a traditional mathematics book; however, it does include many lemmas, theorems, and proofs that bring mathematical rigor. Though mathematically heavy, the book uses helpful marginal notation to alert the reader to experiments, important facts, and notes. A freely downloadable version of the book further increases its utility. In the introductory chapter, the authors mention that they cannot point to an example where bandits have actually been used in clinical trials; however, there is substantial literature on the use of bandits for clinical trials, as well as on barriers to their use. This well-written book already has scores of citations. The bibliography and index are helpful. I highly recommend this book to all those interested in bandit algorithms, including academics, practitioners, students, and researchers.

Reviewer: S. V. Nagaraj | Review #: CR147256 (2108-0200)
Categories: Learning (I.2.6); General (F.2.0)
Other reviews under "Learning":
- Learning in parallel networks: simulating learning in a probabilistic system. Hinton G. (ed), BYTE 10(4): 265-273, 1985. Type: Article (Nov 1 1985)
- Macro-operators: a weak method for learning. Korf R., Artificial Intelligence 26(1): 35-77, 1985. Type: Article (Feb 1 1986)
- Inferring (mal) rules from pupils’ protocols. Sleeman D., Progress in artificial intelligence, Orsay, France, 1985. Type: Proceedings (Dec 1 1985)
