The objective of this book is “to describe the methodology for using multimodal audio, image, and text technology to characterize video content.” Video consists of about 30 frames, or pictures, per second, so a three-minute video contains roughly 5,400 pictures. The idea is to use automated or semi-automated procedures to describe video content, without requiring people to describe it manually. Even if people did describe the video, rules and procedures would still be needed to keep their characterizations consistent, which would be very time consuming, if not impossible. As more video becomes available, with the advent of always-with-you digital camcorders, characterization becomes more and more of a challenge and an opportunity.
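The frame count above is simple arithmetic. As a minimal sketch, assuming a fixed nominal rate of 30 frames per second (actual rates vary by video standard) and a hypothetical helper named `frame_count` that does not come from the book:

```python
def frame_count(duration_seconds: float, frames_per_second: float = 30.0) -> int:
    """Approximate number of frames in a clip.

    Hypothetical helper for illustration; the 30 fps default is an
    assumption matching the "about 30 frames" figure in the text.
    """
    return round(duration_seconds * frames_per_second)


# A three-minute clip: 180 seconds at 30 frames per second.
three_minute_clip = frame_count(3 * 60)
print(three_minute_clip)  # 5400
```

At this rate, even a short clip yields thousands of frames, which is why manual, frame-level description does not scale.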
This excellent book achieves this goal, with some added bonuses, such as a very good explanation of Moving Picture Experts Group (MPEG)-7, a standard for describing multimedia content. The work it describes crosses disciplines to achieve the desired results; restricting efforts to a single technique in one fixed discipline limits possible successes. This work is also iterative, with multiple steps. As the series editor, Mubarak Shah, writes in the series foreword, it can use “analysis, then synthesis, then model revision, then coding.” The book makes good use of figures and diagrams. As an example, Figure 1.2 illustrates characterization technologies simply and clearly.
Chapter 1 is an introduction to the book and the topic. It describes research efforts, including the importance of using audio for clues to what is happening in the video. Chapter 2 is a course on video itself. It covers current video terminology, from technical terms, such as discrete cosine transform (DCT) compression, to more popular terms, such as feature film, documentary, or continuous action sports video. Alongside these descriptors, the authors note the challenges each poses for automated characterization. For instance, they write that soccer is difficult to characterize because of the continuous movement and the relatively small number of scores per game.
Chapter 3 gets down to the objective of the book: providing an overview of automated and manual ways to characterize video. It covers various features and descriptive information that you might find in video, which later chapters draw on to produce results. The list is fairly extensive, and even includes the possibility of detecting and using global positioning system (GPS) data embedded in a video stream for characterization.
Chapter 4 goes on to discuss ways of preparing summaries of information about video, using various methods (including rules-based methods). It refers to the video features discussed in chapter 3, and includes experimental results for these methods. An important concept in this chapter is the video skim, along with methods for producing one.
Chapter 5 covers visualization techniques. This chapter relies heavily on work done at Carnegie Mellon University’s Informedia Digital Video Library, one of the premier digital libraries in the world. That work uses user interfaces to determine which methods of video summarization work well. For instance, the authors report that thumbnails work better than text, particularly if the thumbnails are assembled based on usage context, rather than key frames. However, they assert that leveraging multiple techniques works best, and, indeed, there are many techniques that one can use. Each section briefly describes evaluation techniques, particularly heuristic ones, and even some carried out with high school students.
Chapter 6 describes more formal, quantitative methods of evaluation. Several user studies on video characterization systems are presented. Although each chapter has its own conclusions section, there is a final chapter, chapter 7, that summarizes the prior chapters. It again emphasizes that the best approaches to characterization combine multiple methods, including audio and language, as well as image techniques.
This book is intended for professors, professional researchers, and developers in the video industry. It could be an excellent book for advanced undergraduate or graduate students, but, unfortunately, it is quite expensive, so it could not reasonably be used alongside other books to support an entire course on digital media.