Computing Reviews

A robust audio classification and segmentation method
Lu L., Jiang H., Zhang H. Multimedia (Proceedings of the ninth ACM international conference, Ottawa, Canada), 203-211, 2001. Type: Proceedings
Date Reviewed: 01/29/03

The authors present a technique for the classification of audio into speech, music, environment sounds, and silence classes. Such a classification is useful for audio indexing and retrieval, and for video structure extraction.

The technique is based on the work of Scheirer and Slaney [1]. Three audio features are used: high zero crossing rate ratio (HZCRR), low short-time energy ratio (LSTER), and spectrum flux (SF). In addition, the authors introduce three new features: linear spectral pairs (LSP) distance, band periodicity (BP), and noise frame ratio (NFR).
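To make the two ratio features concrete, a minimal sketch follows. The review itself does not define them; the sketch assumes that HZCRR is the share of frames in an analysis window whose zero crossing rate exceeds 1.5 times the window mean, and that LSTER is the share of frames whose short-time energy falls below 0.5 times the window mean. The factors, the non-overlapping framing, and all function names are illustrative assumptions, not the authors' implementation.

```python
def frames_of(signal, frame_len):
    """Split a list of samples into non-overlapping frames, dropping the tail."""
    n = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def hzcrr(zcrs, factor=1.5):
    """Share of frames whose ZCR exceeds factor * mean ZCR (factor is an assumption)."""
    mean = sum(zcrs) / len(zcrs)
    return sum(1 for z in zcrs if z > factor * mean) / len(zcrs)


def lster(energies, factor=0.5):
    """Share of frames whose energy is below factor * mean energy (factor is an assumption)."""
    mean = sum(energies) / len(energies)
    return sum(1 for e in energies if e < factor * mean) / len(energies)
```

Under these assumptions, both ratios for one window are obtained by applying `zero_crossing_rate` and `short_time_energy` to each frame from `frames_of` and passing the resulting lists to `hzcrr` and `lster`.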

A preclassification of the audio stream into speech/non-speech is first obtained, using HZCRR, LSTER, and SF features, with a k-nearest neighbor classifier. The classification results are further refined using LSP distance. Silence is then detected, using simple short-time energy and zero crossing rate features. A rule-based approach is used to discriminate music from environment sound, based on BP, SF, and NFR features. The reported overall classification accuracy is about 96 percent. This figure should be read with caution, however: for noisy speech, for example, the accuracy is 73 percent, which can be improved to 85 percent.
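The speech/non-speech preclassification step can be sketched as a plain k-nearest neighbor vote over three-dimensional feature vectors. This is only an illustration: the training vectors below are invented toy values, not data from the paper, and the choice of k = 3, the Euclidean distance, and the name `knn_predict` are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to the query.

    train is a list of (feature_vector, label) pairs; query is a feature vector.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented (HZCRR, LSTER, SF) vectors, for illustration only.
train = [
    ((0.30, 0.40, 0.80), "speech"),
    ((0.28, 0.35, 0.75), "speech"),
    ((0.25, 0.45, 0.70), "speech"),
    ((0.05, 0.05, 0.20), "non-speech"),
    ((0.08, 0.10, 0.25), "non-speech"),
    ((0.03, 0.08, 0.15), "non-speech"),
]

print(knn_predict(train, (0.27, 0.38, 0.78)))
```

In the pipeline described above, the output of such a step would then be refined using LSP distance, with silence detection and the rule-based music/environment sound decision applied afterwards.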

Although the experiments were carried out on a rich and large dataset, the authors do not provide sufficient arguments to convince the reader of the need for the proposed new features and the two-stage classification scheme. A direct comparison to the work of Scheirer and Slaney on the same dataset could be one such argument.


[1] Scheirer, E.; Slaney, M. Construction and evaluation of a robust multifeature speech/music discriminator. In ICASSP97 (Munich, Germany, April 21–24, 1997), IEEE, New York, 1997, 1331–1334.

Reviewer:  Hadi Harb Review #: CR126894 (0304-0378)
