Computing Reviews, the leading online review service for computing literature.

The perception of multiple objects
Mozer M., MIT Press, Cambridge, MA, 1991. Type: Book (9780262132701)

Date Reviewed: Mar 1 1993

I have had a number of years of background in image processing, but minimal exposure to neural networks. Actually, I have always been rather skeptical of the inexplicable black magic that appeared to lie beneath the surface of connectionist models. After reading this book, however, I have developed a respect for the power of neural networks and an appreciation of how they can be used in real systems. The book describes a system, MORSEL (Multiple Object Recognition and attentional SELection), that has been developed by the author to perform translation-invariant recognition of multiple words in a visual field in a manner similar to that of the human visual system. While not a complete system, MORSEL does emulate a reasonable portion of vision and yields results that correspond nicely with experimentally observed human behavior.
Chapter 1, “Introduction,” presents an overview of the MORSEL system. The system implements a computational model that attempts to predict what information the visual system is capable of processing in parallel. The actual implementation of MORSEL described in this book only recognizes letters and words, rather than attempting to implement the entire visual system. A more complete system would contain modules to recognize other independent attribute dimensions such as color and motion.
Chapter 2, “Multiple Word Recognition,” describes the model of the visual field and the structure of the letter and word recognition process. The module that performs this recognition is called BLIRNET (Builds Location Invariant Representations) and is a multilayered hierarchical neural network. The bottom layer of this network receives its input from a 36×6 array that serves as the retina of the visual system. The output layer of BLIRNET contains units that have been trained to recognize particular position-invariant short clusters of letters. A total of 56,966 possible letter clusters of the form recognizable by the network exist. In order to reduce the implementation to a feasible size, the current implementation recognizes only the 540 letter clusters most commonly used in English. Even so, BLIRNET contains 606,800 neural connections. BLIRNET recognizes individual words reasonably well. When multiple words are presented to the retina simultaneously, however, significant crosstalk and noise occur. This issue is dealt with in the next chapter.
Chapter 3, “The Pull-out Network,” discusses how MORSEL reduces noise and separates words that are presented simultaneously. The pull-out network consists of a neural network that contains syntactic and pseudo-semantic rules that cause the various letter-cluster units to enhance or reinforce each other. The connection weights were hand-crafted, but Mozer maintains that it would be possible to build a network that could have its connections trained through experience and perform at least as well as the hand-crafted network.
Chapter 4, “The Attentional Mechanism,” discusses a mechanism that directs attention to particular portions of the visual field and helps coordinate the processing of independent attribute recognition modules. The mechanism is implemented as a simple filter that controls the flow of information from the first layer of BLIRNET to the next.
Chapter 5, “The Visual Short-Term Memory,” discusses the mechanism used to combine the results of the various low-level processing modules (such as BLIRNET) and integrate the observed attributes (color, motion, words, geometric forms, and so on) into a descriptor of a single coherent object. This integration is achieved with the cooperation of the attentional mechanism. As each feature is detected, the feature and the focus of attention are stored in the system’s short-term memory. As features are serially extracted from the various modules, descriptions of objects are built up.
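To make the chapter 4 idea concrete, the attentional filter can be pictured as a multiplicative gate sitting between the first layer and the next. The following is only a minimal sketch under my own assumptions, not Mozer's implementation: the 36×6 grid is the retina size given in the book, while the function names, the rectangular "spotlight," and the gain values (1.0 inside, 0.1 outside) are hypothetical choices made for illustration.

```python
# Hypothetical sketch: attention as a multiplicative gate between two layers.
# Only the 36x6 grid size comes from the book; the names, the rectangular
# spotlight shape, and the gain values are assumptions for illustration.

RETINA_WIDTH, RETINA_HEIGHT = 36, 6


def make_spotlight(x_start, x_end, inside=1.0, outside=0.1):
    """Build a HEIGHT x WIDTH gain map.

    Columns in [x_start, x_end) get the `inside` gain; all others get `outside`.
    """
    return [
        [inside if x_start <= x < x_end else outside for x in range(RETINA_WIDTH)]
        for _ in range(RETINA_HEIGHT)
    ]


def apply_attention(activations, gains):
    """Scale each first-layer activation by the attentional gain at its location."""
    return [
        [a * g for a, g in zip(act_row, gain_row)]
        for act_row, gain_row in zip(activations, gains)
    ]


# Uniform activity across the retina, with attention on the left third.
layer1 = [[1.0] * RETINA_WIDTH for _ in range(RETINA_HEIGHT)]
gated = apply_attention(layer1, make_spotlight(0, 12))

# Compare an attended location (column 0) with an unattended one (column 20).
print(gated[0][0], gated[0][20])
```

In this sketch, activity outside the attended region is attenuated rather than blocked outright; that is a design assumption on my part, chosen so that unattended input still reaches later layers in weakened form.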
In chapter 6, “Psychological Phenomena Explained by MORSEL,” Mozer compares the behavior of MORSEL with the behavior of human subjects in various psychological experiments, both recent and historical. Some human perceptual errors match those of MORSEL. The relative time that it takes to search for letters on various backgrounds is consistent with MORSEL’s behavior. Certain forms of dyslexia cause behavior that can be modeled by disabling portions of MORSEL’s machinery. This chapter is the longest in the book, and I found it fascinating. While we are still a long way from fully understanding how the human visual system works, it would appear that models such as this are getting close.
Chapter 7, “Evaluation of MORSEL,” discusses deficiencies in the current implementation of MORSEL and future directions of the work. Generally, Mozer believes that many of the details are wrong and that the implementation is not sufficiently general. On the positive side, MORSEL successfully achieves translation-invariant multiple-object recognition, performs better than other connectionist models, and explains many psychological phenomena.
Appendix A, “A Comparison of Hardware Requirements,” compares BLIRNET with two earlier connectionist models. Unsurprisingly, BLIRNET is more efficient. Appendix B, “Letter Cluster Frequency and Discriminability within BLIRNET’s Training Set,” is a four-page listing of the relative frequencies of the various letter clusters in BLIRNET’s training set.
The bibliography contains approximately 250 items, dating from 1943 through 1990. Ten percent of the items are from 1990. The index is adequate but weak. Index references to previous research appear to be haphazard. For example, David Marr is mentioned on page 23, but only the reference to him on page 92 is indexed. The figure on page 20 may have a typo. If not, then an unexplained asymmetry exists in the architecture of the first layer of BLIRNET.
I found this book to be enjoyable and readable, like a nicely written doctoral thesis. While it did not offer me enough information to replicate the system, it did leave me with a good understanding of the successes and limitations of the author’s work.

Reviewer: David Goldfarb	Review #: CR116062

Computer Vision (I.5.4 ... )

Connectionism And Neural Nets (I.2.6 ... )

Image Processing Software (I.4.0 ... )

Neural Nets (I.5.1 ... )

Vision And Scene Understanding (I.2.10 )
