Giannakopoulos and Pikrakis define the scope of their book at the outset:
Before we proceed, it is important to note that, although in this book the term ‘audio’ does not exclude the speech signal, we are not focusing on traditional speech-related problems that have been studied by the research community for decades, e.g. speech recognition and coding. It is our intention to provide analysis methods that can be used to study various audio modalities and their relationship in mixed audio streams. ... In other words, we are not interested in providing solutions that are well tailored to specific audio types (e.g. the speech signal) but are not applicable to other modalities.
The book is divided into three parts. The first part is devoted to a selection of mathematical tools that are used to extract various features of audio streams. Chapter 2 introduces some elementary techniques and properties that will prove helpful in what follows: sampling, mono and stereo playback, block reading and writing, and short-term processing. Chapter 3 brings in the heavy guns: the discrete Fourier transform (in its complex exponential formulation), the discrete cosine transform, the discrete-time wavelet transform, and digital filtering. Several MATLAB programs that implement these tools are included.
Chapter 4 explains how some of the elementary properties of audio files are extracted. Such a file may consist of a single stationary waveform. In real life, however, an audio file is more likely to contain one or more stationary or nonstationary waveforms mixed with “noise,” and various techniques can eliminate or reduce this noise. Time-domain features, and frequency-domain features based on the spectral distribution, are defined here, and more MATLAB programs are presented.
Once these tools and techniques are mastered, we can apply them to higher-level tasks such as audio classification, segmentation, alignment, and temporal modeling, which the second part of the book covers in three chapters. Chapter 5 begins the study of classification techniques. The features extracted from the files form a pyramid. The lower layers of this pyramid use short-term processing to generate feature vectors; these are passed up to higher layers that compute various statistics, which in turn form the feature vectors passed further up. The end goal is to estimate the class label of the audio segment represented by the computed feature vector. For example, the class label of a certain audio stream might indicate that it is part of a speech made by a particular individual, the chirp of a black-capped chickadee, or a segment of electronic music.
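The two-layer feature pyramid described above can be sketched in a few lines. This is not the book's MATLAB code: it is a minimal Python illustration, and the function names, the choice of short-term features (energy and zero-crossing rate), and the window sizes are all assumptions made for the example. The lower layer computes one short-term feature vector per frame; the upper layer summarizes runs of those vectors by their mean and standard deviation.

```python
import math
from statistics import mean, pstdev


def frames(seq, win, step):
    """Split a sequence into (possibly overlapping) windows of length win."""
    return [seq[i:i + win] for i in range(0, len(seq) - win + 1, step)]


def energy(frame):
    """Short-term energy: mean of the squared samples."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (needs len >= 2)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_term_features(signal, win, step):
    """Lower layer: one [energy, zcr] vector per short-term frame."""
    return [[energy(f), zero_crossing_rate(f)] for f in frames(signal, win, step)]


def mid_term_features(st_feats, mt_win, mt_step):
    """Upper layer: [means..., standard deviations...] of each short-term
    feature over windows of mt_win consecutive frames."""
    out = []
    for seg in frames(st_feats, mt_win, mt_step):
        columns = list(zip(*seg))  # one tuple per short-term feature
        out.append([mean(c) for c in columns] + [pstdev(c) for c in columns])
    return out


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate in Hz
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]  # 1 s, 1 kHz
    st = short_term_features(tone, win=320, step=160)  # 20 ms frames, 10 ms hop
    mt = mid_term_features(st, mt_win=50, mt_step=25)  # about 0.5 s windows
    print(len(st), len(mt))
```

Each mid-term vector is what a classifier would receive. Real systems typically use many more short-term features, but the layering is the same.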
Some approaches use a priori class probabilities to estimate which class a sound most likely belongs to. In other cases, nothing at all is known about the sound’s origins. How, then, should such a sound be classified? These questions are explored in Part 2. The Bayesian classifier, the k-nearest-neighbor classifier, and others are introduced here, along with the training and testing of classifiers and the evaluation of the results. Chapter 5 concludes with several case studies.
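To make the k-nearest-neighbor idea concrete, here is a minimal Python sketch, not taken from the book. The function name `knn_classify`, the use of Euclidean distance, and the toy "speech"/"music" vectors are illustrative assumptions. A query vector receives the majority label among its k closest training vectors.

```python
import math
from collections import Counter


def knn_classify(train_vectors, train_labels, query, k=3):
    """Return the majority label among the k training vectors nearest
    to query (Euclidean distance)."""
    # Sort (distance, label) pairs so the nearest neighbors come first.
    dists = sorted(
        (math.dist(v, query), label)
        for v, label in zip(train_vectors, train_labels)
    )
    # Count votes in order of increasing distance; on a tied count,
    # most_common favors the label encountered first, i.e. the nearer one.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors, for illustration only.
    X = [[0.1, 0.1], [0.2, 0.1], [0.9, 0.8], [1.0, 0.9], [0.95, 1.0]]
    y = ["speech", "speech", "music", "music", "music"]
    print(knn_classify(X, y, [0.15, 0.12], k=3))
```

In practice the features are usually normalized before distances are computed, since quantities such as energy and zero-crossing rate live on very different scales.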
Chapter 6 tackles segmentation. Real-life audio streams usually consist of sequences of different audio types, such as speech followed by music followed by more speech. The goal here is to split the audio signal into homogeneous segments that can be analyzed separately. Various windowing schemes may be used, and the segmentation may or may not rely on an embedded classification step.
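One simple form of segmentation that uses an embedded classifier is to label fixed-size windows and then merge runs of identical labels into homogeneous segments. The Python sketch below illustrates only that merging step; the function name `merge_window_labels` and the assumption of non-overlapping windows of `step` seconds are hypothetical choices for the example, not the book's method.

```python
def merge_window_labels(labels, step):
    """Merge consecutive windows with the same class label into
    (start_time, end_time, label) segments.

    Assumes non-overlapping windows, so window i covers
    [i * step, (i + 1) * step) seconds.
    """
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][2] == label:
            # Same label as the previous window: extend the current segment.
            start, _, _ = segments[-1]
            segments[-1] = (start, (i + 1) * step, label)
        else:
            # Label changed (or first window): open a new segment.
            segments.append((i * step, (i + 1) * step, label))
    return segments


if __name__ == "__main__":
    window_labels = ["speech", "speech", "music", "music", "music", "speech"]
    for seg in merge_window_labels(window_labels, step=1.0):
        print(seg)
```

Segmentation without a classifier would instead look for points where the feature statistics change, which this sketch does not cover.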
In Chapter 7, “Audio Alignment and Temporal Modeling,” the reader will discover dynamic time warping, hidden Markov modeling, the Viterbi algorithm, the Baum-Welch algorithm, and various training methods.
Each chapter ends with a set of exercises. Some require mathematical analysis; others are best answered with a MATLAB program. This reliance on MATLAB illustrates both the strengths and the weaknesses of the book. MATLAB is a very powerful programming system, and its suite of primitives is eminently suitable for programs that solve problems in audio analysis. However, it is not as universally available as other systems such as Microsoft Visual Studio. Readers who have access to MATLAB should make full use of it; for someone with a strong interest in the field, the system is well worth the price.
The reader should also note that a certain level of applied mathematics is required to do any serious work here. In particular, a working knowledge of complex variables and probability theory is needed to really grasp the underlying concepts.
At fewer than 300 pages, the volume is relatively slender. It is written in a sparse but graceful style, skillfully edited, and well bound. It is best suited to readers with a serious interest in audio analysis who like a mathematical, programming-oriented approach to the subject.