Computing Reviews
A computational and evolutionary perspective on the role of representation in vision
Tarr M., Black M. CVGIP: Image Understanding 60(1): 65-73, 1994. Type: Article
Date Reviewed: Dec 1 1995
Comparative Review

In a well-known “Saturday Night Live” television skit, a phone-in talk show host faced with an inert audience tries to stir up his listeners by claiming that he hates puppy dogs. Similarly, the editors of Image Understanding have tried to stimulate a debate by soliciting a provocative paper on the virtues of reconstructing the geometric and physical properties of a scene, as compared to the purposive view that integrates visual modeling with task-specific behavior.

Tarr and Black

Tarr and Black, the authors of the opening salvo, suggest that there are well-grounded computational and evolutionary reasons for the current and future success of the reconstructive approach. Two kinds of evidence are presented in this paper: a general examination of the goals of vision in both artificial and biological systems; and a case study of current trends in the recovery of optic flow that illustrates the continuing viability of the reconstructive approach. The authors sum up their viewpoint: “we believe [purposive vision] is better suited for understanding and mimicking the overall visual behavior of frogs rather than humans.…if the purposive approach does have a role in understanding general purpose vision, it seems likely to be at the level of well-defined and narrowly-constrained tasks, but without obviating the need for recovery and reconstruction…. We believe… [the reconstructive] approach holds out the best hope for ultimately understanding and duplicating the adaptive nature of human vision.”

Fortunately for the reader, this “puppy dog” ploy has produced an interesting set of responses that deal with core issues in computer vision.

Aloimonos

The first respondent, Aloimonos, writes that “complete visual recovery may be as relevant to building visual systems as Gödel’s and Tarski’s theories are relevant to the construction of an airline reservation system.” Aloimonos feels that the important difference between active and passive vision is the following: “When you work on passive vision, you are given a set of images which you will process with the algorithms you are going to develop; on the other hand, when you work on active vision you do not want prerecorded data. You are given an active observer which has control over the image acquisition process and which acquires images that are relevant to what it intends to do. …purposive vision offers a new methodology for building intelligent flexible vision systems, and at the same time it offers new insights for understanding the big picture, i.e., the brain.”

Jain

Jain follows with the observation that “every vision system ever developed was for a purpose and will be for a purpose…I do not understand why we should call one approach purposive and insult others by implying that they have no purpose…I believe that a good computer visionary will neither be a religious nor fanatic person in terms of his theories and techniques. Like any other good scientist or engineer, a computer visionary will be an opportunist who will use relevant models, representations, and techniques to effectively solve precisely defined problems.”

Brown

Brown restricts himself to the issue of general vision and the prospects for achieving it under reconstructive and purposive paradigms. “Perhaps culturally the reconstructivists are closer to espousing general vision as a goal. However, there are serious technical lacks on the part of both reconstructive and purposive approaches. The reconstructive paradigm is well-understood and the purposive one is nascent, but neither has a convincing story about how to achieve general vision.” Brown feels that until more details of a theory of general vision appear, it is difficult to gauge the potential contributions and costs of task-independent and task-dependent visual mechanisms.

Edelman

Edelman feels that despite the promise of its title, the paper by Tarr and Black is not about representation: it is about reconstruction. He notes that they effectively substitute the concept of reconstruction for the admittedly related, but rather more general, concept of representation. “This metonymic slip of the tongue characterizes well the approach taken by the mainstream of computer vision research in the past 15 years. On the one hand, the importance of choosing the right representation for a given computational problem is widely acknowledged…. On the other hand, a generation of researchers abandoned the freedom gained through the realization that representations may be tailored to the task by essentially ignoring the possibility that the best representation of the visual world may not be the same as the visual world reconstructed in full three-dimensional detail.” Edelman sees an emerging synthesis in which representation (and not reconstruction) plays a central role. This new synthesis will constitute a reasonable compromise between vision without representation and the reigning vision of reconstruction without purpose.

Tsotsos

Tsotsos disagrees with the idea that the reconstructionist paradigm can be a framework for understanding human vision. “There is no reason to believe that the performance, limitations, and capabilities of each of the three approaches (exploratory, task-directed, reconstruction) are identical and that they coincide exactly with the performance, limitations, and capabilities of human vision.… What we have is a puzzle of which only a few pieces have started to take shape.” Tsotsos also feels that the biological evidence the various contributors cite in support of their favorite paradigms is, at best, out of date.

Fischler

Fischler asks a series of pointed questions: “Can we really define an adequate model of the environment without some intended purpose or set of purposes? How can we judge adequacy or utility (of the model) without specifying how, or for what purpose, the model is to be used--what information should be made explicit, and with what degree of accuracy and completeness? …Is there really a single world ‘out there’ with an accessible abstraction that is adequate for all purposes?” Fischler stresses that any model is an abstraction that must discard some knowledge and make other knowledge explicit, and that these choices can be made in a principled way only if we know how the representation is to be employed. He faults Tarr and Black for having little to say about the kind of representation they are arguing for.

Aggarwal and Martin

Aggarwal and Martin feel that the seductive promise of reconstruction is that if we can just get more information from the external world and build it into our internal models, we then will be able to easily pick and choose the specific information we need when we are given a specific task. “We claim that this is seductive because the argument is irrefutable without the word ‘easily,’ but it is not true with the word. At a practical level we are reiterating the common-sense notion that only certain aspects of a specific volume can be accounted for in a model, and that purposivism provides the only proper justification for including specific aspects.”

Christensen and Madsen

Christensen and Madsen argue that the two schools, reconstructionist and purposive, should not be viewed as competing, but as complementary. “The reconstruction approach is used for research in vision functionalities, which may be combined into operational systems through a purposive analysis from a global point of view. Such a combined approach to vision is necessary for addressing critical issues such as continuous operation and achievement of specific visual tasks, while maintaining the generality needed to obtain insight into visual cognition.”

Sandini and Grosso

Sandini and Grosso believe that, if one wants to build intelligent machines, the current level of technology only allows us to go from simple to complex. Trying a top-down approach (that is, aiming directly at a general-purpose system that, by definition, will solve all problems) may be interesting, but it requires a much greater act of faith.

Tarr and Black: Response

In their response to the replies, Tarr and Black are all sweetness and light. No more do they hate the puppy dog of purposive vision. They are willing to allow the nose of a goal-driven purposive camel into their tent: “The fact is that we have a very large tent and the purposive camels have been with us all along. Under this ‘big top’ we want both purposivist and reconstructionist camels to feel at home; they can both put on a pretty good show.”

Conclusion

So, as the purposivists and reconstructionists go hand-in-hand into the sunset, we have to thank the editors for soliciting this interesting set of papers that probe basic issues in vision paradigms. Computer vision is a young discipline, and it is useful to ask broad philosophical questions such as “Are we doing the right thing?” and “Is this approach viable?” As a result of this dialogue, we can better understand the interplay of scene reconstruction and the role of purpose.

The discussions make us realize that the question “What is vision?” does not have a simple answer, and that some (like Aloimonos) reject the concept of a general vision system entirely. As far as representation is concerned, both Fischler and Jain make strong arguments that representations can be selected only if the goal of the system is known. (Jain asks rhetorically whether a computer scientist can design a data structure without being given the task specification.) The topic of learning appears only briefly in the discussions--Aloimonos feels that learning can be done more successfully in the purposive paradigm because behaviors, rather than general-purpose representations, are learned. Brown feels that both reconstructive and purposive theories largely ignore learning.

The collection is of great interest to all computer vision researchers, and would make a good starting point for a graduate computer vision seminar. Each of the papers has an excellent set of references.

Reviewer:  O. Firschein Review #: CR118742 (9512-1003)
This review compares the following items:
  • A computational and evolutionary perspective on the role of representation in vision:
  • What I have learned:
  • Expansive vision:
  • Toward general vision:
  • Representation without reconstruction:
  • There is no one way to look at vision:
  • The modeling and representation of visual information:
  • The role of R & R in vision:
  • Purposive reconstruction:
  • Why purposive vision?:
  • Reconstruction and purpose:
    Computer Vision (I.5.4 ... )
    Philosophical Foundations (I.2.0 ... )
    Representations, Data Structures, And Transforms (I.2.10 ... )
     
     