Successful navigation in a three-dimensional (3D) world requires that two-dimensional (2D) retinal images be interpreted to represent a 3D space. This paper introduces a formalism for modeling the occlusion mechanism by which such 2D images are interpreted. The basic idea is that the 2D image on the retina is constructed as a superposition of more primitive 2D “preimages.” Any preimage, or indeed any image, can occlude some other image. The combination of two preimages, taking into account any possible occlusion, defines a binary operation on images that the author calls “occlu.” The paper explores the formal properties of this operation, showing that it is associative. One thus finds a noncommutative monoid of preimages. There are notions of prime and atomic preimages, and minimal sets of preimages generating a given monoid.
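To make the algebra concrete, here is a minimal sketch of a discrete toy model, not the paper's own construction. It assumes a preimage is a finite map from pixel coordinates to intensities, and that combining two preimages lets the front one hide the back one wherever their supports overlap. The names `Preimage`, `EMPTY`, and `occlu` (as a Python function) are illustrative; the paper itself works with point set topology rather than pixel dictionaries.

```python
from typing import Dict, Tuple

Pixel = Tuple[int, int]
# Hypothetical toy representation: a preimage maps each pixel of its
# support to an intensity; pixels outside the support are simply absent.
Preimage = Dict[Pixel, int]

# Identity element of the toy monoid: the preimage with empty support.
EMPTY: Preimage = {}


def occlu(front: Preimage, back: Preimage) -> Preimage:
    """Combine two preimages; `front` hides `back` wherever both are defined."""
    combined = dict(back)      # start from the occluded (back) layer
    combined.update(front)     # front values overwrite back values on the overlap
    return combined


# Three small overlapping preimages.
a: Preimage = {(0, 0): 1, (0, 1): 1}
b: Preimage = {(0, 1): 2, (1, 1): 2}
c: Preimage = {(1, 1): 3, (1, 0): 3}

# Associative: the bracketing does not matter.
assert occlu(occlu(a, b), c) == occlu(a, occlu(b, c))
# Noncommutative: which preimage is in front matters on the overlap.
assert occlu(a, b) != occlu(b, a)
# EMPTY acts as a two-sided identity.
assert occlu(a, EMPTY) == a == occlu(EMPTY, a)
```

In this toy model associativity can be read off directly: the value at any pixel comes from the frontmost preimage whose support contains that pixel, whatever the bracketing.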
Shen explicitly addresses the applicability of this work by citing three motivating applications that led to the theory: first, the need for a forward model for image segmentation as an inverse problem; second, the need to facilitate image/scene morphing; and, third, the need to provide a theoretical basis for a quantifiable theory of occlusion. The arguments in the paper use elementary abstract algebra and point set topology. The proof of associativity of “occlu” would perhaps have been more elegant had it been based on the interpretation of the operation, so that associativity could appear as an intrinsic property. The paper will be of interest to those working on theoretical models of vision.