Computing Reviews
Exponential moving average based multiagent reinforcement learning algorithms
Awheda M., Schwartz H. Artificial Intelligence Review 45(3): 299-332, 2016. Type: Article
Date Reviewed: Jun 15 2016

Reinforcement learning for multiagent systems aims to find optimal policies that can be learned by agents during their interaction in cooperative or competitive games. In game theory, the target is reaching the Nash equilibrium, where each agent is making the best possible decision, taking into account the decisions of the other agents.
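The Nash condition described above, that no player can improve by deviating unilaterally, can be checked mechanically in a small matrix game. The sketch below is illustrative only (the payoff matrices are a generic prisoner's-dilemma example, not from the paper under review):

```python
import numpy as np

# Hypothetical 2x2 bimatrix game (a prisoner's dilemma):
# rows = player 1's actions, columns = player 2's actions.
A = np.array([[3, 0],   # player 1's payoffs
              [5, 1]])
B = np.array([[3, 5],   # player 2's payoffs
              [0, 1]])

def pure_nash(A, B):
    """Return all pure-strategy Nash equilibria of a bimatrix game."""
    eqs = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            # (i, j) is an equilibrium if neither player gains by
            # deviating while the other holds its action fixed.
            if A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max():
                eqs.append((i, j))
    return eqs

print(pure_nash(A, B))  # the mutual-defection cell (1, 1)
```

Mixed equilibria, which the EMAQL algorithm also targets, require randomized strategies and cannot be found by this cell-by-cell scan.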

In this paper, the agents play a competitive game; each agent has its own goals, assumptions, and algorithms. How can multiagent learning algorithms allow agents to learn Nash equilibrium strategies? The authors answer with two algorithms for policy learning. Both of them “use the exponential moving average approach and the Q-learning algorithm ... to update the policy for the learning agent.”

The two algorithms differ in how they update the policy. The first, constant-learning-rate exponential-moving-average Q-learning (CLR-EMAQL), uses a constant learning rate. The second, exponential-moving-average Q-learning (EMAQL), employs two decay rates, guided by competing learn-fast and learn-slow mechanisms that let the agent adapt at different speeds depending on whether its policy is winning or losing.
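The EMA policy step described above can be sketched as follows. This is a minimal illustration, not the authors' exact update rules: the winning test used here (expected value under the current policy versus the uniform-policy value) is a hedged stand-in for the paper's criterion, and the function name and parameters are invented for the example.

```python
import numpy as np

def ema_policy_update(pi, q, eta_win, eta_lose):
    """One EMA step of the mixed policy toward the greedy action.

    pi       : current mixed policy over actions (sums to 1)
    q        : current Q-value estimates for the same actions
    eta_win  : small decay rate used when the policy is "winning"
    eta_lose : larger decay rate used when it is "losing"
    """
    # Unit vector on the greedy action under the current Q estimates.
    target = np.zeros_like(pi)
    target[np.argmax(q)] = 1.0
    # Stand-in winning test: expected value under pi beats the mean
    # (uniform-policy) value; the paper's criterion differs in detail.
    eta = eta_win if pi @ q >= q.mean() else eta_lose
    # Exponential moving average of the policy toward the greedy action.
    new_pi = (1 - eta) * pi + eta * target
    return new_pi / new_pi.sum()  # renormalize against rounding drift
```

With `pi = [0.5, 0.5]`, `q = [1.0, 2.0]`, `eta_win = 0.05`, and `eta_lose = 0.2`, the update nudges probability mass toward the second (greedy) action while keeping the policy a valid distribution.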

The paper extends algorithms previously proposed by the authors. Here they demonstrate that CLR-EMAQL “converges to Nash equilibrium ... in games that have pure Nash equilibrium,” while EMAQL also works in games that have mixed Nash equilibrium. The mathematical analysis, carried out on a simplified two-player, two-action game, includes proofs.

A large part of the paper presents matrix and stochastic games that illustrate the two algorithms. An entire section reports detailed simulations comparing the proposed algorithms with other methods from the literature.

The paper is both a good introduction to multiagent policy learning and an effective presentation of new algorithms. It is useful for graduate students and researchers in this specific learning method, which is interesting for applications ranging from financial strategies to robot teams. The only negative is that it does not discuss how to set the method's parameters.

Reviewer:  G. Gini Review #: CR144502 (1609-0690)
Learning (I.2.6)
Markov Processes (G.3 ...)
Multiagent Systems (I.2.11 ...)
Other reviews under "Learning":
Learning in parallel networks: simulating learning in a probabilistic system
Hinton G. (ed) BYTE 10(4): 265-273, 1985. Type: Article
Nov 1 1985
Macro-operators: a weak method for learning
Korf R. Artificial Intelligence 26(1): 35-77, 1985. Type: Article
Feb 1 1986
Inferring (mal) rules from pupils’ protocols
Sleeman D. Progress in artificial intelligence, Orsay, France, 1985. Type: Proceedings
Dec 1 1985

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®