Because advanced communication systems rely on combining information modalities such as speech, audio, sound, image, and video, multimedia signal processing and content analysis have become critically important. This importance has been amplified by the emergence of major application areas in which multimedia signal and content analysis must take place. Consider, for instance, an intelligent system capable of recognizing intruders in an area monitored by closed-circuit television (CCTV) and automatically alerting security staff. Consider also the automatic detection and analysis of events in sports, or car-to-car communication systems equipped with intelligent sensors. Moreover, content-based search of multimedia files in databases and streams has gained momentum over the past several years.
In this context, the book takes a mostly signal-processing point of view on the analysis of multimedia content. This perspective is key to understanding the content of multimedia streams and files and, therefore, to enabling a smarter, content-oriented way of interacting with digital media. The importance of the topic is also underpinned by the numerous scholarly and research activities in this field, including the Resource Description Framework (RDF) for multimedia content (W3C), the metadata dictionary of the Society of Motion Picture and Television Engineers (SMPTE), the Dublin Core Metadata Initiative (DCMI), and MPEG-7. The last of these has, in effect, become the audiovisual description language that “directly includes methods to describe low-level features of audiovisual signals” and “covers the representation of the content description, while generation ... and consumption of descriptions are considered as application-specific aspects.”
The book moves across the whole spectrum of multimedia signal analysis, starting with preprocessing (for example, nonlinear filters, amplitude-value transformation, and interpolation) and rounding off the spectrum with feature extraction, transformation, and classification. Feature extraction, however, is restricted to low-level rather than high-level features. More specifically, the book discusses features such as color, edges, and texture, which can lead on to contour and shape analysis as well as motion analysis. A further shortcoming is the treatment of audio signal features, which is covered in only 20 pages.
Given the plethora of books in the digital signal processing (DSP) field, many books invite comparison with this one. In particular, the first 84 pages, on preprocessing, may struggle to withstand the competition from classical DSP books. Section 4 onward, where feature extraction is the main topic, could be recommended unreservedly to students as a textbook. However, the sheer volume of advanced mathematical formulas and algorithmic discussion, which assumes that the reader is already familiar with advanced topics in linear algebra such as eigenvalues and eigenvectors, makes the book hard to recommend as a textbook for undergraduates. It is also doubtful whether the 20-page Annex A, which covers the fundamentals of vector and matrix algebra, stochastic analysis and description, and signal processing and analysis, is enough to resolve this problem.
Finally, in an era when multimedia content analysis extends to deep learning, digital TV, and social media as advanced forms of computing, aspiring learners would do well to seek out complementary books on the extraction and recognition of high-level features.