Computing Reviews
Probabilistic policy reuse for safe reinforcement learning
García J., Fernández F. ACM Transactions on Autonomous and Adaptive Systems 13(3): 1-24, 2019. Type: Article
Date Reviewed: Dec 17 2020

Human and robot planners seek action plans that are both safe and optimal. Learning from good examples, such as those provided by a teacher, is an effective way to speed up action planning and can be used to bootstrap reinforcement learning. When learning continues during real execution in a stochastic domain, however, the same action can lead the agent to different situations. Simply repeating a plan therefore does not work and exploration becomes necessary, with the attendant risk of taking the agent into unseen and dangerous situations.

Learning during execution and optimizing the plan are indeed conflicting tasks. This paper proposes an algorithm that controls the interplay between the two processes. A monotonically increasing risk function, integrated into the policy reuse strategy, is parameterized to favor either safety, by avoiding unknown situations, or optimization, by allowing more exploration. The results, illustrated on a helicopter problem and on a competition among agents, show how parameter tuning can effectively modify the learning.
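To make the idea concrete, here is a minimal sketch of how a monotonically increasing risk function might gate policy reuse. It is not the authors' algorithm: the logistic form of the risk, the distance-to-known-states measure, and every name (`risk`, `nearest_distance`, `choose_action`, `steepness`, `threshold`) are assumptions made purely for illustration.

```python
import math
import random


def risk(distance, steepness=5.0, threshold=1.0):
    """Monotonically increasing risk in [0, 1].

    Assumption: a logistic curve over the distance to known states;
    the paper's actual risk function may differ.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (distance - threshold)))


def nearest_distance(state, known_states):
    """Euclidean distance from `state` to the closest known state."""
    return min(math.dist(state, s) for s in known_states)


def choose_action(state, known_states, safe_policy, learned_policy, rng,
                  steepness=5.0, threshold=1.0):
    """Reuse the safe (teacher) policy with probability equal to the risk.

    The further the agent is from known states, the more likely it falls
    back on the safe policy; otherwise it follows the policy being learned.
    """
    p_reuse = risk(nearest_distance(state, known_states), steepness, threshold)
    if rng.random() < p_reuse:
        return safe_policy(state)
    return learned_policy(state)
```

In this sketch, lowering the hypothetical `threshold` makes the agent fall back on the safe policy sooner, while raising it permits more exploration, which mirrors the safety versus optimization trade-off described above.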

The paper is well written and easy to follow, the motivations and the resulting algorithm are analyzed, and the proposed solution is appealing. I would like to see more examples, especially from real-world settings, where sensor readings add uncertainty and more tuning may be necessary.

Reviewer: G. Gini
Review #: CR147139 (2104-0088)
Learning (I.2.6)
General (F.0)
General (I.0)

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®