Reinforcement learning for multiagent systems aims to find optimal policies that agents can learn through their interactions in cooperative or competitive games. In game theory, the target is the Nash equilibrium, in which each agent makes the best possible decision given the decisions of the other agents.
In this paper, the multiagent team plays a competitive game: each agent has its own goals, assumptions, and algorithms. How can multiagent learning algorithms allow agents to learn Nash equilibrium strategies? The authors answer with two policy-learning algorithms. Both of them “use the exponential moving average approach and the Q-learning algorithm ... to update the policy for the learning agent.”
The difference lies in how the policy is updated. The first algorithm, constant-learning-rate exponential-moving-average Q-learning (CLR-EMAQL), uses, as its name says, a constant rate. The second, exponential-moving-average Q-learning (EMAQL), employs two decay rates, guided by the competing learn-fast and learn-slow mechanisms, which let the agent learn differently depending on whether its policy is winning or losing.
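The win/lose-dependent EMA update can be sketched as follows. This is a minimal illustrative version for a tabular two-action agent: the rate names `eta_win`/`eta_lose` and the simple "winning" test are my assumptions for the sketch, not the paper's exact equations.

```python
import numpy as np

def emaql_policy_update(policy, q_values, eta_win, eta_lose):
    """One EMA policy-update step (illustrative sketch, not the
    paper's exact equations).

    policy   : current mixed strategy over the actions (sums to 1)
    q_values : current Q-values for the same actions
    eta_win  : small decay rate used when the policy is 'winning'
    eta_lose : larger decay rate used when it is 'losing'
    """
    greedy = np.zeros_like(policy)
    greedy[np.argmax(q_values)] = 1.0  # indicator vector of the greedy action

    # 'Winning' is approximated here as: the policy already puts most of
    # its probability on the greedy action (an assumption for illustration).
    winning = policy[np.argmax(q_values)] >= 0.5
    eta = eta_win if winning else eta_lose

    # Exponential moving average of the policy toward the greedy indicator.
    return (1.0 - eta) * policy + eta * greedy
```

Because the update is a convex combination of two probability vectors, the result remains a valid mixed strategy, and the two rates control how quickly it drifts toward the current greedy action.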
The paper extends algorithms previously proposed by the authors. Here they demonstrate that CLR-EMAQL “converges to Nash equilibrium ... in games that have pure Nash equilibrium,” while EMAQL also works in games that have only a mixed Nash equilibrium. The mathematical analysis, including proofs, is carried out on a simplified two-player, two-action game.
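The pure-versus-mixed distinction behind the two convergence claims can be made concrete with a small helper that enumerates the pure Nash equilibria of a two-player, two-action matrix game (an illustrative check of mine, not code from the paper): a coordination game has pure equilibria, while matching pennies has only a mixed one.

```python
import numpy as np

def pure_nash_equilibria(payoff_row, payoff_col):
    """List the pure Nash equilibria of a 2x2 matrix game.

    payoff_row : 2x2 payoffs of the row player
    payoff_col : 2x2 payoffs of the column player
    An action pair is a pure equilibrium when neither player can
    gain by unilaterally switching to the other action.
    """
    equilibria = []
    for i in range(2):
        for j in range(2):
            row_best = payoff_row[i, j] >= payoff_row[1 - i, j]
            col_best = payoff_col[i, j] >= payoff_col[i, 1 - j]
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

# A coordination game: two pure equilibria (the CLR-EMAQL setting).
coord = np.array([[2, 0], [0, 1]])

# Matching pennies: no pure equilibrium, only a mixed one
# at (0.5, 0.5) for both players (the setting where EMAQL is needed).
pennies = np.array([[1, -1], [-1, 1]])
```

Running the helper on `coord` (used for both players) returns both diagonal action pairs, while on the zero-sum pair `(pennies, -pennies)` it returns an empty list, matching the distinction the authors draw between the two algorithms.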
A large part of the paper presents matrix and stochastic games that illustrate the two algorithms. An entire section reports detailed simulations comparing the proposed algorithms with other methods from the literature.
The paper is both a good introduction to multiagent policy learning and an effective presentation of the new algorithms. It will be useful to graduate students and researchers interested in this learning approach, which has applications ranging from financial strategies to robotics teams. The only drawback is that the paper does not discuss how to set the method's parameters.