Computing Reviews
Modeling incubation and restructuring for creative problem solving in robots
Kralik J., Mao T., Cheng Z., Ray L. Robotics and Autonomous Systems 86: 162-173, 2016. Type: Article
Date Reviewed: Jun 28 2017

Consider this experiment: a monkey is faced with a choice between four pieces of food and a single piece, and it receives the four pieces as a reward only when it picks the single piece. The task runs counter to instinctive behavior (that is, animals tend to reach for the larger quantity of food by default); hence it is called the reverse-reward problem. The authors of this paper simulate this problem using mobile robots.

Through experience, monkeys solve the reverse-reward problem in three stages. The first is “impasse,” where they learn to inhibit the selection of the larger quantity. The second is “incubation,” where they hover between the two options and end up choosing one by chance. The third is “insight,” where they learn to pick the smaller quantity so that they receive the larger one. After many repetitions, the monkeys build a spontaneous generalization of the situation; in other words, they learn an abstract representation of the objects (that is, the quantities of food). For instance, if they are faced with five pieces versus ten, they pick the five to receive the big reward of ten. These stages are the fundamental building blocks of the process of creative problem solving.

In the computational world, the monkey is modeled as an agent acting in a Markov decision process (MDP). This means that the agent has a current state (that is, the configuration of food quantities it faces), takes an action toward a food slot (thereby setting its future state), transitions from the current state to a future state with some probability, and receives a reward that depends on the current state and the action taken. To achieve learning within the agent’s MDP, the authors implement Q-learning, a reinforcement learning technique based on temporal difference (TD) learning. TD learning aims to minimize the error (that is, the difference) between the current action value and the subsequent reward; in Q-learning, the subsequent reward is “estimated as the sum of the actual reward received immediately after taking the action and an estimate of future rewards received by subsequent actions taken from the new state.” Evidently, the mathematical formula of Q-learning makes things clearer; it is available in the paper, and I encourage you to take a look at it in order to reflect on the beauty of mathematics. Furthermore, the authors extend Q-learning within the MDP by organizing the agent’s possible states into a hierarchical tree. This helps eliminate state configurations that are irrelevant to the problem, thus reducing the complexity of the input (that is, mitigating the curse of dimensionality). They call this extension the Q-tree learning algorithm.
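For readers who want the gist before opening the paper: the standard tabular Q-learning update, in textbook notation that may differ in detail from the authors’ formula, is Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)], where α is the learning rate and γ is the discount factor. The short Python sketch below applies this update to a toy stand-in for the reverse-reward choice; the one-state environment, payoff values, and exploration schedule here are my own illustrative assumptions, not the authors’ robot setup or their Q-tree extension.

import random
from collections import defaultdict

# Toy stand-in for the reverse-reward choice: one decision state and two
# actions. Picking the small quantity yields the large payoff (4) and
# picking the large quantity yields the small payoff (1); these numbers
# are illustrative assumptions, not the paper's experimental values.
ACTIONS = ["pick_small", "pick_large"]
PAYOFF = {"pick_small": 4.0, "pick_large": 1.0}

Q = defaultdict(float)  # tabular action values, keyed by (state, action)

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

for episode in range(500):
    state = "choice"
    # Epsilon-greedy selection: mostly exploit, occasionally explore.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    q_update(state, action, PAYOFF[action], "done")

print(dict(Q))  # ("choice", "pick_small") should carry the higher value

After a few hundred episodes, the value of picking the small quantity dominates, mirroring the “insight” stage; the authors’ Q-tree goes further by pruning irrelevant state configurations through a hierarchical tree, which this flat sketch does not attempt.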

In this paper, the authors present a striking demonstration of their Q-tree learning algorithm by implementing it on mobile robots (Pioneer P3-AT) equipped with camera vision systems. Their robotic experiments simulate the empirical data to a highly satisfactory degree. I was lucky to pick up this paper; its title caught my attention at first, and the text proved straightforward and easy to read, with much knowledge to be gained from exploring it. The work is unique in its theoretical foundation, which makes it remarkable when compared to other studies in the literature.

Reviewer: Mario Antoine Aoun | Review #: CR145391 (1710-0685)
Robotics (I.2.9)
Other reviews under "Robotics": Date
Movement problems for 2-dimensional linkages
Hopcroft J. (ed), Joseph D., Whitesides S. SIAM Journal on Computing 13(3): 610-629, 1984. Type: Article
Feb 1 1985
Robot motion planning with uncertainty in control and sensing
Latombe J. (ed), Lazanas A., Shekhar S. Artificial Intelligence 52(1): 1-47, 1991. Type: Article
Oct 1 1992
Dictionary of robot technology in four languages: English, German, French, Russian
Bürger E., Korzak G., Elsevier North-Holland, Inc., New York, NY, 1986. Type: Book (9789780444995193)
Mar 1 1988
