Computing Reviews
Robust audio identification for MP3 popular music
Li W., Liu Y., Xue X. SIGIR 2010 (Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, Switzerland, Jul 19-23, 2010) 627-634. 2010. Type: Proceedings
Date Reviewed: Aug 29 2011

Li, Liu, and Xue tackle the problem of audio identification in MP3-encoded popular music. They work directly in the compressed format, whereas previously published work on music identification has focused primarily on analyzing the decoded raw audio waveform, which involves decompression, classification, and recompression for the purposes of fingerprinting and other activities. The authors build their method on what they refer to as compressed domain spectral entropy. This allows them to fingerprint the reference music data and test the robustness and precision of the identification technique against various common alterations of that data, such as echo addition, pitch shift, equalization, noise addition, band-pass filtering, MP3-quality resampling, and amplitude (volume) change.
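
To make the idea of an entropy-based fingerprint concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the function names, the one-bit-per-frame-pair rule, and the use of plain lists of coefficients are all assumptions chosen for illustration. It computes the Shannon entropy of the normalized energy in each frame of spectral coefficients and derives a bit from whether the entropy rises between adjacent frames.

```python
import math
from typing import List, Sequence


def spectral_entropy(coeffs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of the normalized energy of one frame."""
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total == 0.0:
        return 0.0  # silent frame: entropy defined as zero here (assumption)
    entropy = 0.0
    for e in energies:
        if e > 0.0:
            p = e / total
            entropy -= p * math.log2(p)
    return entropy


def fingerprint_bits(frames: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical rule: emit 1 when entropy rises between adjacent frames."""
    h = [spectral_entropy(f) for f in frames]
    return [1 if later > earlier else 0 for earlier, later in zip(h, h[1:])]


def bit_error_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length fingerprints differ."""
    if len(a) != len(b) or not a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy frames standing in for per-frame spectral coefficients.
reference = [[1.0, 0.0, 0.0, 0.0],
             [1.0, 1.0, 0.0, 0.0],
             [1.0, 1.0, 1.0, 1.0],
             [1.0, 0.0, 0.0, 0.0]]
# A uniform volume change scales every coefficient by the same factor.
quieter = [[0.5 * c for c in frame] for frame in reference]

ref_bits = fingerprint_bits(reference)
query_bits = fingerprint_bits(quieter)
print(ref_bits, query_bits, bit_error_rate(ref_bits, query_bits))
```

Because the energy is normalized within each frame, a uniform amplitude change leaves the entropy, and hence the bits, unchanged in this toy setup. That gives some intuition for why a feature of this kind could tolerate volume changes; the other alterations listed above would need real audio to evaluate.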

The paper is a good, sound work with a strong theoretical foundation and convincingly reported practical results. Li et al. claim to have a comprehensive test dataset of more than 1,800 songs, with 90 percent identification precision in most cases, which is quite impressive in and of itself. The authors also provide a comprehensive list of cited references, which is very useful to anyone who is new to the field and doing a literature review.

On the other hand, the authors seem to emphasize only popular songs or music, perhaps to draw more attention to the subject. Their approach is quite general and should be able to cover any audio recordings for the purposes of fingerprinting and identification. This may include conference speeches, talk shows, radio, or unconventional and alternative music. The images used in the paper are not of the best quality, specifically the examples of the modified discrete cosine transform (MDCT) entropy in Figure 3. The runtime performance of the system on which the experiments are conducted is not discussed. The source of the database of more than 1,800 songs is unclear. Are the songs specific to the authors’ country of origin? Also, there is no discussion of whether the presence of vocals, beats per minute (BPM), and other acoustic features matters at all, independent of the genre.

It would have been interesting to see the authors consider the compressed spectrum of MP3s without any decoding, and to see how its precision and performance compare to their existing approach. The fact is that a compressed waveform is still a unique form. Unlike cryptographic hashing, compression is reversible, so all pseudo-spectral features are also preserved in the MP3 form. They can be looked at in a binary-like fashion, as in some experiments using the open-source modular audio recognition framework (MARF) [1] on any kind of binary files, without getting into the specifics of the encoded data, such as channels, MDCT coefficients, and partially decompressed distorted query excerpts. The latter approach could also serve as a baseline to show whether it is more or less precise and robust than what the authors discuss. In future work, the authors plan to look into the matter of cover song identification straight from the compressed domain. I am looking forward to seeing the work in that area.
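
As a rough illustration of the binary-like baseline suggested above, the following Python sketch treats the raw bytes of any file as signal samples, averages a naive DFT magnitude spectrum over fixed windows, and compares two spectra by cosine similarity. It does not use MARF or reproduce its pipeline; the function names, the window size of 64, and the similarity measure are assumptions made for this sketch only.

```python
import cmath
import math
from typing import List, Sequence


def mean_spectrum(data: bytes, window: int = 64) -> List[float]:
    """Average DFT magnitude spectrum over non-overlapping windows of raw bytes."""
    n_bins = window // 2
    acc = [0.0] * n_bins
    count = 0
    for start in range(0, len(data) - window + 1, window):
        # Center byte values around zero so they behave like signed samples.
        samples = [b - 128 for b in data[start:start + window]]
        for k in range(n_bins):
            s = sum(x * cmath.exp(-2j * math.pi * k * n / window)
                    for n, x in enumerate(samples))
            acc[k] += abs(s)
        count += 1
    if count == 0:
        raise ValueError("input shorter than one window")
    return [v / count for v in acc]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Synthetic byte strings stand in for file contents read with open(path, "rb").
file_a = bytes((i * 7) % 256 for i in range(256))
file_b = bytes((i * 31 + 5) % 256 for i in range(256))

spec_a = mean_spectrum(file_a)
spec_b = mean_spectrum(file_b)
print(cosine_similarity(spec_a, spec_a), cosine_similarity(spec_a, spec_b))
```

A baseline of this kind needs no knowledge of the MP3 bitstream layout, which is what makes it cheap to run and a useful point of comparison.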

The community would surely appreciate it if the authors’ implementation and song dataset were released as an open-source project, assuming that it would not violate licensing and copyright laws. One would then be able to independently compare the existing systems, such as the above-mentioned MARF, CMU Sphinx [2], or UIMA [3].

Reviewer:  Serguei A. Mokhov Review #: CR139409 (1203-0317)
1) Mokhov, S. A.; Debbabi, M. File type analysis using signal processing techniques and machine learning vs. file unix utility for forensic analysis. In IT Incident Management and IT Forensics (Mannheim, Germany, Sep, 2008), Goebel, O., Frings, S., Guenther, D., Nedon, J., Schadt, D., Eds. GI, Germany, 2008, 73–85. http://subs.emis.de/LNI/Proceedings/Proceedings140/gi-proc-140-007.pdf
2) The Sphinx Group at Carnegie Mellon. The CMU Sphinx group open source speech recognition engines, 2007-2011, http://cmusphinx.sourceforge.net.
3) The Apache Team. Unstructured Information Management Architecture (UIMA), 2006-2011, http://uima.apache.org/.
Signal Processing (I.5.4 ... )
 