Rarely does a book introduce a new field of study at the intersection of computer science, deep reinforcement learning (RL), probability theory, statistics, and applied math. In their work at Google DeepMind, the authors applied technologies like probability distributional RL to create, for example, AlphaGo and to discover new algorithms for matrix multiplication, among other advances that have expanded the boundaries of human knowledge and skills. They are members of the team that developed both the theory behind probability distributional RL and the methods for applying it to real problems.

However, this book is more than just a collection of excerpts from their research papers. The authors systematically build the theoretical background for probability distributional RL, with a sharp focus on practical implementations of these algorithms on available hardware. The book is equally useful for practitioners, researchers, and graduate students looking for thesis topics. For these reasons, *Distributional reinforcement learning*, along with other classic books in the RL field [1], will remain relevant for many decades to come.

Classical RL is a collection of algorithms that use rewards to train software agents to perform tasks in environments that are not differentiable, where backpropagation alone would be ineffective. This approach has been used in various domains, from autonomous driving and game play to aerial drone navigation and securities trading. Large language models (LLMs) like GPT-4 use reinforcement learning from human feedback (RLHF) to improve performance. In most cases, the rewards are scalar, and the agent learns only the expected value of the return. This book shows how learning the full probability distribution of the return can improve performance. Figure 10.4 shows how the new implicit quantile networks (IQN) algorithm performs better than the deep Q-learning (DQN) algorithm when it comes to playing Atari 2600 games.

Classical RL depends on fundamental concepts such as the Bellman equation, Markov decision processes, and Monte Carlo methods. These and other related algorithms rely on collections of scalar rewards. In contrast, probability distributional RL has to learn and operate on a collection of probability (return) distributions, which involves scaling, shifting, and mixing them and measuring the distances between them. To do this, the authors introduce the random-variable Bellman equation, the distributional Bellman equation, and different techniques for learning return distributions. The book includes extensive information about finding distances between probability distributions, the concept of contractivity, and the distributional Bellman operator. Figures 2.5 and 2.6 elegantly depict a few related concepts.
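To make these operations concrete, here is a minimal sketch (not taken from the book) that represents a return distribution as a finite set of weighted atoms. The function names `bellman_target`, `mix`, and `wasserstein1` are hypothetical and chosen only for illustration, and the reward, discount, and atom values are made-up numbers. The sketch shifts and scales a distribution as in a one-step distributional Bellman backup, mixes two distributions, and measures the 1-Wasserstein distance between them.

```python
import numpy as np

def bellman_target(reward, gamma, atoms):
    """Shift by the reward and scale by the discount: r + gamma * G'."""
    return reward + gamma * np.asarray(atoms, dtype=float)

def mix(dists, probs):
    """Mixture of weighted-atom distributions, e.g. over possible next states."""
    atoms = np.concatenate([np.asarray(a, dtype=float) for a, _ in dists])
    weights = np.concatenate(
        [p * np.asarray(w, dtype=float) for (_, w), p in zip(dists, probs)]
    )
    return atoms, weights

def wasserstein1(atoms_a, w_a, atoms_b, w_b):
    """1-Wasserstein distance: area between the two step-function CDFs."""
    atoms_a, w_a = np.asarray(atoms_a, float), np.asarray(w_a, float)
    atoms_b, w_b = np.asarray(atoms_b, float), np.asarray(w_b, float)
    xs = np.sort(np.concatenate([atoms_a, atoms_b]))
    cdf_a = np.array([w_a[atoms_a <= x].sum() for x in xs])
    cdf_b = np.array([w_b[atoms_b <= x].sum() for x in xs])
    # Both CDFs are constant between consecutive support points.
    return float(np.sum(np.abs(cdf_a - cdf_b)[:-1] * np.diff(xs)))

# Two illustrative return distributions (assumed values).
eta1 = (np.array([0.0, 1.0]), np.array([0.5, 0.5]))
eta2 = (np.array([2.0]), np.array([1.0]))

reward, gamma = 1.0, 0.9
before = wasserstein1(*eta1, *eta2)
after = wasserstein1(
    bellman_target(reward, gamma, eta1[0]), eta1[1],
    bellman_target(reward, gamma, eta2[0]), eta2[1],
)
mixed_atoms, mixed_weights = mix([eta1, eta2], probs=[0.5, 0.5])
```

In this toy case the distance after the backup equals `gamma` times the distance before it, a small instance of the contractivity that the book develops for the distributional Bellman operator. The shift by the reward leaves the distance unchanged; only the scaling by the discount shrinks it.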

The authors have adapted classical RL algorithms for the realm of probability distributional RL, providing step-by-step mathematical rigor throughout their book. This process gives a solid theoretical foundation for classical RL as well. The book also describes applications of probability distributional RL in improving multiagent RL and computational neuroscience.

Page 333 provides a list of notation that is handy for following the math. However, some symbols are missing, and readers may find it beneficial to make their own list as they encounter new concepts and notation. An index of the figures would also help readers follow the book’s outline. Additionally, although Figure 10.7 makes its point, a larger color image would make it more meaningful.

Most practitioners can make use of probability distributional RL by reading the first three chapters and, importantly, chapter 10, where probability distributional RL’s connection to applied deep RL is established. Each chapter includes extensive technical and bibliographical remarks, as well as exercises. These sections also include in-depth surveys of recent work, making this book valuable for researchers and graduate students.