Computing Reviews
Handbook of learning and approximate dynamic programming (IEEE Press Series on Computational Intelligence)
Si J., Barto A., Powell W., Wunsch D., Wiley-IEEE Press, 2004. Type: Book (9780471660545)
Date Reviewed: Mar 18 2005

Dynamic programming (DP) refers to a collection of algorithms for solving sequential, multistage decision problems, or for determining optimal control strategies for nonlinear and stochastic dynamic systems. Until recently, DP was practical only for toy problems. Driven by demand from complex application domains, there has been a spurt of research activity in several related disciplines, such as intelligent control and computational intelligence. Approximate dynamic programming (ADP) is a newly coined umbrella term for the research community whose main focus is finding high-quality approximate solutions to problems for which exact solutions via classical dynamic programming are not attainable in practice, mainly because of computational complexity and a lack of domain knowledge about the problem.
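
To make the distinction concrete, classical DP computes an exact value function by sweeping over every state, and it is precisely this exhaustive sweep that becomes infeasible for large state spaces and that ADP methods replace with simulation and function approximation. The following minimal Python sketch of value iteration on a tiny, made-up Markov decision process (the transition table, rewards, and discount factor are illustrative assumptions, not material from the book) shows the kind of exact computation being approximated.

# Minimal value-iteration sketch for a tiny Markov decision process (MDP).
# The MDP below (states, actions, transitions, rewards, gamma) is a made-up
# illustration; classical DP sweeps every state, which is what becomes
# infeasible at scale and what ADP methods approximate.

# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(0.9, 2, 5.0), (0.1, 1, 0.0)]},
    2: {"stay": [(1.0, 2, 0.0)]},  # absorbing state
}
gamma = 0.9  # discount factor

V = {s: 0.0 for s in transitions}           # value function, initialized to zero
for _ in range(100):                        # repeated Bellman backups
    delta = 0.0
    for s, actions in transitions.items():
        best = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-6:                        # stop once a full sweep barely changes V
        break

print(V)  # approximate optimal value of each state

The chapters in this handbook are, in essence, about what to do when the table V and the sweep over all states are too large to ever write down.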

This book is an edited collection of 23 chapters, based on the 2002 National Science Foundation (NSF) Workshop on Approximate Dynamic Programming. The book is organized into three parts. In the introductory chapter, Werbos provides an excellent roadmap of the field, clearly identifying the relevant mathematical principles and theoretical background, and pointing to challenging future opportunities.

Part 1 (chapters 2 through 8) presents an overview of the various ADP paradigms and discusses how some of these techniques can be used for practical problem solving. Barto and Dietterich, in chapter 2, describe the relationship between reinforcement learning and supervised learning from an ADP perspective. In chapter 3, Ferrari and Stengel provide an overview of model-based adaptive critic designs, emphasizing the mathematical background and the various ADP designs. In the next chapter, Lendaris and Neidhoffer offer ample guidance to readers interested in adaptive critics for control. Si et al., in chapter 5, introduce the direct neural dynamic programming approach, an online learning control paradigm. In chapter 6, De Farias addresses the curse of dimensionality by casting ADP as a linear programming problem. In chapter 7, Grudic and Ungar tackle the curse of dimensionality using the policy gradient reinforcement learning framework. Ryan, in chapter 8, addresses the use of semi-Markov decision process models and the development of hierarchical reinforcement learning.

Part 2 (chapters 9 through 15) covers recent research results and points toward important future discoveries. Bertsekas et al., in chapter 9, present the first iterative temporal difference method that converges without a diminishing step size. In chapter 10, Powell presents an ADP model for high-dimensional resource allocation problems. Mahadevan et al., in chapter 11, present a hierarchical probabilistic model for decision making involving concurrent actions, multiagent coordination, and hidden state estimation in stochastic environments. In chapter 12, Cao discusses the learning and optimization of stochastic systems from a system-theoretic perspective. Anderson et al., in chapter 13, present a hybrid combination of robust control and reinforcement learning. Supervised actor-critic reinforcement learning is presented by Rosenstein et al. in chapter 14. In chapter 15, Prokhorov presents backpropagation through time and derivative adaptive critics for computing the derivatives needed to train the parameters of recurrent neural networks.
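
Several of these chapters build on temporal difference (TD) learning, so a brief illustration may help readers who have not seen it. The sketch below shows the standard TD(0) update with a linear value approximation; it is a generic textbook recursion, not the specific algorithm of chapter 9, and the feature map, step size, and sample trajectory are invented for illustration.

import numpy as np

# Generic TD(0) update with a linear value approximation V(s) ~ w . phi(s).
# The feature map, step size, and trajectory below are illustrative only.

def phi(state, dim=4):
    """Hypothetical feature map: one-hot encoding of a small discrete state."""
    v = np.zeros(dim)
    v[state] = 1.0
    return v

gamma, alpha = 0.9, 0.1          # discount factor and (fixed) step size
w = np.zeros(4)                  # weights of the linear approximator

# A made-up sampled trajectory of (state, reward, next_state) transitions.
trajectory = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 3), (3, 0.0, 0)]

for s, r, s_next in trajectory * 50:     # replay the samples a few times
    td_error = r + gamma * w @ phi(s_next) - w @ phi(s)
    w += alpha * td_error * phi(s)       # move the estimate toward the TD target

print(w)  # learned approximate state values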

Part 3 (chapters 16 through 23) focuses on applying ADP to several large and complex real-world problems, in an attempt to provide some insight into selecting suitable paradigms for future problems. Esogbue and Hearnes, in chapter 16, present a scheme that uses reinforcement learning to act in a near-optimal manner on problems that either have no model or have a highly complex one. In chapter 17, Kang and Bien present a hierarchical reinforcement learning scheme for decision-making problems in which more than one goal must be fulfilled. Balakrishnan and Han, in chapter 18, introduce an adaptive critic-based neural network to steer an agile missile with bounds on the angle of attack. In chapter 19, Venayagamoorthy et al. propose a straightforward application of adaptive critic design to power system control. Anderson et al., in chapter 20, present a case study that applies a robust reinforcement learning method to the heating, ventilation, and air conditioning control of buildings. In chapter 21, Enns and Si describe helicopter flight control using direct neural dynamic programming, while Momoh addresses optimal power flow tools in chapter 22. In the final chapter, Momoh and Zivi present several challenging benchmark problems relevant to power systems.

The chapters are well organized, and the text includes a discussion of current developments and future challenges, with most of the content very well explained. Readers will not need to consult many additional references to understand the material. I was particularly impressed by the ADP success stories featured in chapters 16 through 23. This is highly encouraging, since one of the key motives for coining the ADP paradigm was to address the large-scale, complex problems that matter to our society. This edited volume provides an overview of the different facets of ADP, its technical foundations, and its successful applications. I am sure that anyone interested in ADP will enjoy this book. I have only one comment on the organization of the volume: from a reader's point of view, the book would have been more appealing if related chapters had been grouped together, instead of being spread across three different parts.

I recommend this book to engineers, scientists, and practitioners who would like a state-of-the-art overview of research in the ADP area. Finally, I would like to congratulate the editors for putting together this wonderful collection of research contributions.

Reviewer: Ajith Abraham (Review #: CR131003)
Editor Recommended

Dynamic Programming (I.2.8 ...)
Control Theory (I.2.8 ...)
Optimization (G.1.6)

Other reviews under "Dynamic Programming":

Optimum decision trees--an optimal variable theorem and its related applications. Miyakawa M. Acta Informatica 22(5): 475-498, 1985. Type: Article. Reviewed: Mar 1 1987
An efficient algorithm for optimal pruning of decision trees. Almuallim H. Artificial Intelligence 83(2): 347-362, 1996. Type: Article. Reviewed: Apr 1 1997
Visual unrolling of network evolution and the analysis of dynamic discourse. Brandes U., Corman S. Information Visualization 2(1): 40-50, 2003. Type: Article. Reviewed: Dec 30 2003
