While many semantic analysis methods exist for text, the vast majority of current data is multimedia in nature, which makes extracting meaningful knowledge a far more difficult task. The case of music is of particular importance, given humankind's widespread appetite for this art form. Among the various characteristics of a musical piece, tempo is fundamental; yet a reliable tempo extraction algorithm has remained an elusive goal for almost 20 years because of the subtle timing variations of this parameter.
To analyze time-varying signals, one often encodes them in a frequency-based representation, be it cosine functions for the fast Fourier transform (FFT) or wavelets for the various types of wavelet transforms. The approach advocated by the authors uses a variant of the empirical mode decomposition (EMD) method, which uses intrinsic (that is, signal-specific) mode functions (IMFs) as the basis for hierarchical decomposition. IMFs are computed from the signal itself and can be seen as time-scaled, mono-component signals, or modes, that yield the original signal when summed. To ensure this single-mode property, an enhanced EMD variant computes IMFs averaged over copies of the signal artificially mixed with white noise. From these components, the tempo is estimated by selecting peaks of the autocorrelation function of each IMF.
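The final step described above can be sketched in isolation. The following is a minimal illustration, not the authors' implementation: it applies autocorrelation peak picking to a single synthetic onset envelope, whereas the paper applies this step to each IMF produced by the noise-assisted decomposition. The function name, sampling rate, and BPM search range are all assumptions made for the sketch.

```python
import numpy as np

def tempo_from_autocorrelation(signal, sr, min_bpm=60, max_bpm=180):
    """Estimate tempo (BPM) from the dominant autocorrelation peak.

    Illustrative sketch: the lag of the strongest autocorrelation peak
    within a plausible beat-period range is converted to beats per minute.
    """
    x = signal - signal.mean()
    # keep only non-negative lags of the full autocorrelation
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    # restrict the search to lags corresponding to [min_bpm, max_bpm]
    min_lag = int(sr * 60 / max_bpm)
    max_lag = int(sr * 60 / min_bpm)
    lag = min_lag + np.argmax(ac[min_lag:max_lag + 1])
    return 60.0 * sr / lag

# Synthetic onset envelope: one pulse every 0.5 s (120 BPM) at sr = 100 Hz
sr = 100
env = np.zeros(10 * sr)
env[::sr // 2] = 1.0  # a pulse every 50 samples

print(tempo_from_autocorrelation(env, sr))  # → 120.0
```

On a real recording, the envelope would first have to be derived from the audio (for example, via the IMF decomposition the paper proposes); the peak-picking step itself is unchanged.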
Preliminary experiments show that this interesting approach is competitive with existing techniques, thus making this paper required reading for researchers in the music analysis field.