This well-written paper describes a way to produce compact video summaries that give fast and direct access to video data, and presents an impressive browser for video libraries. The paper should be read not only by browser enthusiasts, but also by anyone interested in the analysis or synthesis of raw video or film data.
Traditionally, a video is a sequence of scenes, a scene is a sequence of shots, and a shot is a sequence of frames taken by the same set of cameras. This paper uses mosaics to represent shots, and then scenes, before clustering the scenes into a few physical settings. For suitable video libraries, browsing, indexing, and comparing videos by physical settings and mosaics is very fast, because one can exploit the assumptions of controlled lighting and the rules governing the use of background and horizontal camera movements. These assumptions allow one to quickly assemble chosen frames into a mosaic, to distinguish between panning, zooming, establishing, and other shots, and to use this classification to pick appropriate frames to combine into mosaics. The assumptions are satisfied in the authors’ two very different video library examples. In the basketball video library, one is primarily interested in distinguishing the interesting events; in the sitcom video library, one is primarily interested in distinguishing the plots and themes in the half-hour episodes.
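The hierarchy described above (frames within shots, shots within scenes, scenes grouped into physical settings) can be illustrated with a minimal Python sketch. This is not the authors' method: the `descriptor` field is a hypothetical stand-in for a scene mosaic, and the L1 distance and greedy threshold clustering are assumptions chosen only to keep the example short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    frames: List[int]  # indices of the frames that make up this shot


@dataclass
class Scene:
    shots: List[Shot]
    # Hypothetical numeric summary standing in for a scene mosaic
    # (an assumption for illustration, not the paper's representation).
    descriptor: List[float]


def l1_distance(a: List[float], b: List[float]) -> float:
    """Sum of absolute differences between two equal-length descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_into_settings(scenes: List[Scene], threshold: float) -> List[List[Scene]]:
    """Greedy grouping of scenes into settings.

    A scene joins the first setting whose first scene lies within
    `threshold` of it; otherwise it starts a new setting.
    """
    settings: List[List[Scene]] = []
    for scene in scenes:
        for setting in settings:
            if l1_distance(scene.descriptor, setting[0].descriptor) <= threshold:
                setting.append(scene)
                break
        else:
            # No existing setting was close enough.
            settings.append([scene])
    return settings
```

In this sketch, two scenes with descriptors `[0.0, 1.0]` and `[0.1, 0.9]` would fall into the same setting at a threshold of `0.5`, while a scene with descriptor `[1.0, 0.0]` would start a new one. A real system would compare the mosaics themselves rather than a short vector.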