One of the major challenges for the cognitive sciences is the development of a precise, computationally tractable theory of language that enables a system to generate actions from the referential meaning the theory provides. The thesis of this paper is that this goal should be achieved through a grounding process: much as a child first perceives the world through the results of its actions in it, the system acquires references by associating sensor inputs with system beliefs.
The core of the paper is an overview of how this can be done. The system rests on a theory of signs, projections, and schemas. Briefly, a sign represents some limited aspect of an object, such as a sensor input. A projection maps signs into beliefs; a particularly important kind of projection is a categorizer, which maps beliefs to identifications of objects. Action projections associate actions with their effects. Schemas assemble beliefs and projections into representations of situations. Action projections are important here precisely because they serve to ground schemas. The approach differs from that of the blocks world in that no preset model of the world is provided to the system. The paper makes a strong case for this approach to grounding language, and reinforces it with references to experiments with a manipulator robot called Ripley [1].
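To make the vocabulary concrete, the relationships among signs, projections, and schemas might be sketched roughly as follows. This is an illustration only, not the paper's formalism or the Ripley implementation: every name here (Sign, Belief, ActionProjection, Schema, weight_projection, categorize, simulated_act), the confidence value, and the weight threshold are assumptions made for the example, and the robot is replaced by a stub that returns a fixed sensor reading.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Sign:
    """A limited aspect of an object, here one sensor reading (hypothetical fields)."""
    channel: str
    value: float


@dataclass(frozen=True)
class Belief:
    """Something the system holds to be true, with an illustrative confidence."""
    label: str
    confidence: float


# A projection maps signs into beliefs.
Projection = Callable[[List[Sign]], List[Belief]]


def weight_projection(signs: List[Sign]) -> List[Belief]:
    """Toy projection: turn weight readings into 'light' or 'heavy' beliefs."""
    beliefs = []
    for s in signs:
        if s.channel == "weight":
            label = "light" if s.value < 0.5 else "heavy"  # assumed threshold
            beliefs.append(Belief(label, 0.9))
    return beliefs


def categorize(beliefs: List[Belief]) -> str:
    """Toy categorizer: map beliefs to an object identification."""
    labels = {b.label for b in beliefs}
    return "cup" if "light" in labels else "unknown"


@dataclass(frozen=True)
class ActionProjection:
    """Associates an action with the belief expected to hold afterwards."""
    action: str
    expected_label: str


@dataclass
class Schema:
    """Assembles beliefs and action projections into a representation of a situation."""
    name: str
    beliefs: List[Belief] = field(default_factory=list)
    actions: List[ActionProjection] = field(default_factory=list)

    def is_grounded(self, act: Callable[[str], List[Sign]], project: Projection) -> bool:
        """Perform each action, sense the result, and check the expected effect.

        A schema with no action projections is treated as ungrounded in this sketch.
        """
        if not self.actions:
            return False
        for ap in self.actions:
            observed = {b.label for b in project(act(ap.action))}
            if ap.expected_label not in observed:
                return False
        return True


def simulated_act(action: str) -> List[Sign]:
    """Stand-in for a robot: lifting yields a fixed weight reading."""
    if action == "lift":
        return [Sign("weight", 0.2)]
    return []


if __name__ == "__main__":
    cup = Schema("cup", actions=[ActionProjection("lift", "light")])
    print(cup.is_grounded(simulated_act, weight_projection))
```

The point of the sketch is the last method: the schema counts as grounded only when the effects predicted by its action projections are confirmed by what the sensors report after acting, with no preset world model consulted.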
The paper should be read by anyone with an interest in natural language processing. The actual implementation of such a system is a separate matter, for which the paper provides references to other sources, such as the work with Ripley.