Within the last few years, spoken interaction between humans and computing devices such as smartphones has become fairly common. Spoken interaction with robots has not seen wide use, even though it can often be restricted to a narrower scope of interactions and served by a simpler language model. The authors propose a framework for the development of speech-based interaction between humans and robots by creating limited grammars that reflect constraints from the robot’s capabilities, the domain under consideration, and the environment in which the robot is located. This significantly reduces one of the main challenges in speech understanding: disambiguation, that is, identifying the most likely interpretation of a word, phrase, or sentence. The framework uses commercial speech-based systems to generate compact grammars for specific stages and contexts during human-robot interaction. A simple dialog manager determines which of the grammars is in effect at a given time, and is responsible for recognizing and clarifying unresolved utterances.
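The idea of a dialog manager that switches between compact, context-specific grammars can be illustrated with a minimal sketch. This is not the authors' implementation: the grammar contents, the context names ("idle", "fetching"), the intent labels, and the `DialogManager` class are all hypothetical, and plain prefix matching on text stands in for a real speech recognition engine.

```python
from dataclasses import dataclass

# Hypothetical compact grammars, one per interaction context.
# Each entry maps a phrase prefix to (intent, context to switch to).
GRAMMARS = {
    "idle": {
        "bring me": ("FETCH", "fetching"),
        "go to": ("NAVIGATE", "idle"),
    },
    "fetching": {
        "stop": ("ABORT", "idle"),
        "put it on": ("PLACE", "idle"),
    },
}


@dataclass
class DialogManager:
    """Tracks which grammar is in effect and interprets utterances against it."""

    context: str = "idle"

    def interpret(self, utterance: str):
        text = utterance.lower().strip()
        # Only the grammar of the active context is consulted, so the
        # set of candidate interpretations stays small.
        for prefix, (intent, next_context) in GRAMMARS[self.context].items():
            if text.startswith(prefix):
                argument = text[len(prefix):].strip()
                self.context = next_context
                return intent, argument
        # Unresolved utterance: ask for clarification, listing what is allowed.
        options = ", ".join(sorted(GRAMMARS[self.context]))
        return "CLARIFY", f"Please rephrase. I understand: {options}"


dm = DialogManager()
print(dm.interpret("Bring me a beer from the fridge"))
print(dm.interpret("Bring me another one"))  # not in the "fetching" grammar
print(dm.interpret("Stop"))
```

Because each context admits only a handful of phrases, an utterance outside the active grammar is rejected outright instead of being forced onto a wrong interpretation, which is where the clarification step comes in.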
Given a specification of the domain, a description of a particular robot’s capabilities, and a toolkit for generating automatic speech recognition engines from compact grammars, interaction designers can assemble modules for speech-based interaction between robots and humans in a limited context. The paper gives an excellent overview of related work, proposes an interesting framework to reduce the complexity of speech recognition, and reports on experiments that validate the underlying concept. While a conversation about the meaning of life may be beyond the scope of this approach, requests like “Bring me a beer from the fridge” should be within grasp.