Computing Reviews

Probabilistic policy reuse for safe reinforcement learning
García J., Fernández F. ACM Transactions on Autonomous and Adaptive Systems 13(3): 1-24, 2019. Type: Article
Date Reviewed: 12/17/20

Human and robot planners seek safe and optimal action plans. Learning to adapt good examples, such as those provided by a teacher, is an effective way to speed up action planning and can be used to bootstrap reinforcement learning. When learning continues during real actions in a stochastic domain, the same action can lead the agent to different situations; thus, simply repeating a plan does not work and exploration becomes necessary, with the attendant risk of taking the agent into unseen and dangerous situations.

Learning during execution and optimizing the plan are indeed conflicting tasks. This paper proposes an algorithm to control the interplay between the two processes. A monotonically increasing risk function, integrated into the policy reuse strategy, is parameterized to favor either safety, by avoiding unknown situations, or optimization, by allowing more exploration. The results, discussed on a helicopter problem and on a competition among agents, show how parameter tuning can effectively modify the learning.
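To make the mechanism concrete, the following is a minimal sketch of the idea as I read it, not the authors' exact algorithm: the names select_action, safe_policy, risk, and k are illustrative assumptions, with risk assumed to return a value in [0, 1] that grows monotonically with the novelty of the current state.

```python
import random

def select_action(state, q_values, safe_policy, risk, k=5.0):
    """Policy-reuse-style action selection (sketch).

    Falls back on a known safe (teacher) policy with a probability
    that grows with the risk estimated for the current state, and
    exploits the learned values otherwise.

    risk(state) in [0, 1] is assumed to increase monotonically with,
    for example, the distance from previously visited states; k is a
    hypothetical tuning parameter trading safety for exploration
    (larger k means an earlier fallback to the safe policy).
    """
    p_reuse = min(1.0, k * risk(state))  # reuse probability rises with risk
    if random.random() < p_reuse:
        return safe_policy(state)        # safe: imitate the teacher
    # optimize: greedy action on the learned Q-values for this state
    return max(q_values[state], key=q_values[state].get)
```

In this reading, tuning k reproduces the trade-off the review describes: a large value keeps the agent close to the teacher's behavior in unfamiliar states, while a small value permits more exploration and hence more optimization.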

The paper is well written and easy to follow, the motivations and the resulting algorithm are carefully analyzed, and the proposed solution is appealing. I would like to see more examples, especially real-life ones, where sensor readings add uncertainty and perhaps more tuning is necessary.

Reviewer: G. Gini | Review #: CR147139 (2104-0088)
