This is an interesting contribution to social intelligence design. It demonstrates the extraordinary levels of complexity at which automated systems can successfully mimic, interpret, and “act upon” human behavior. This work showcases how communicatively relevant nonverbal elements ensure plausible and accurate interaction between robots and humans.
The main objective of the reported research is to obtain a means of communicating intentions between humans and artifacts (robots), so that communication between them becomes fluent and places no burden on the human. The paper has four sections: the first presents the theoretical premises of the work; the second applies those premises to a system architecture design; the third describes the implementation of a listener robot following the described approach; and the fourth presents the performance results of the working artifact in a real situation of communication with humans.
The concepts of involvement, engagement, joint attention, focus, and shared cognitive spaces connected by mutual activities as reference points underlie the communicative model of the designed artifact. Four modalities of attention behavior (hand movement, eye gaze, posture, and speech) support social skills such as not interrupting, prompting, explaining, responding, and modulating voice pitch, voice speed, and emotion, which together help establish a plausible communicative exchange.
A simple two-layered architecture, split into a coordinator of the robot's behaviors and a detector of events in the environment, comprises four modules: attention recognition, communication-mode estimation, immediate response generation, and robot behavior decision. Built on dynamic Bayesian networks, these modules give the robot the ability to infer a human user's intention and the environmental situation, and then to determine its next operation.
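The two-layer idea can be summarized as a detector layer that emits discrete attention events and a coordinator layer that maintains a belief over interaction modes and selects a behavior. The following minimal sketch is not the authors' implementation: the mode names follow the paper, but the observation names, likelihoods, and behavior mapping are illustrative assumptions, and the full system uses dynamic Bayesian networks rather than this simplified naive-Bayes filtering.

```python
# Illustrative sketch of mode estimation and behavior decision.
# Mode names come from the reviewed paper; everything else is assumed.
MODES = ["talking", "talking-to", "explaining", "working"]

# P(observation | mode) -- illustrative values, not measured ones.
LIKELIHOOD = {
    "speech":         {"talking": 0.8, "talking-to": 0.7, "explaining": 0.9, "working": 0.2},
    "mutual_gaze":    {"talking": 0.7, "talking-to": 0.8, "explaining": 0.4, "working": 0.1},
    "gaze_at_object": {"talking": 0.2, "talking-to": 0.3, "explaining": 0.8, "working": 0.9},
    "hand_motion":    {"talking": 0.3, "talking-to": 0.3, "explaining": 0.6, "working": 0.9},
}

def update_belief(belief, observation):
    """One Bayesian filtering step: reweight each mode by the observation likelihood."""
    posterior = {m: belief[m] * LIKELIHOOD[observation][m] for m in MODES}
    total = sum(posterior.values())
    return {m: p / total for m, p in posterior.items()}

def decide_behavior(belief):
    """Coordinator layer: map the most likely mode to a (hypothetical) robot behavior."""
    mode = max(belief, key=belief.get)
    return {"talking": "nod", "talking-to": "face_speaker",
            "explaining": "look_at_object", "working": "record_scene"}[mode]

belief = {m: 1.0 / len(MODES) for m in MODES}  # uniform prior over modes
for obs in ["speech", "gaze_at_object", "hand_motion"]:  # detector-layer events
    belief = update_belief(belief, obs)
print(decide_behavior(belief))  # prints "look_at_object"
```

Speech combined with object-directed gaze and hand motion drives the belief toward the "explaining" mode, so the coordinator directs the robot's gaze to the object, mirroring how the estimated communication mode governs the robot's next operation.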
The described method is illustrated in the implementation of a listener robot that can communicate in four interaction modes (talking, talking-to, explaining, and working), and in a situation with four participants (two humans, a robot, and a bicycle). The robot can follow and interpret the communicative situations of two humans involved in assembling and disassembling a bicycle.
The experimental results show that the robot correctly recognizes the communicative situations with an overall accuracy of about 82 percent: it identifies conversation situations with 92 percent accuracy and explanation situations with 75 percent accuracy. In addition, the robot correctly captures the focus of attention at a rate of 62 percent. Finally, it correctly records (photographs) the situation at a rate of 81 percent.
The experiment and the off-the-shelf technologies used (Robovie and the motion capture device) are very well presented and described. The paper captures the reader’s attention, not only because of its interesting topic, but also because of its clear-cut narrative line and well-structured argumentation. It ends with a report of the very promising results.
Supplied with beautiful illustrative materials (figures, pictures, and tables), and with very well-organized references, this paper is exemplary for its scholarly precision, captivating argumentation, and orderly presentation.