Computing Reviews, the leading online review service for computing literature.


A robust audio classification and segmentation method
Lu L., Jiang H., Zhang H. Multimedia (Proceedings of the ninth ACM international conference, Ottawa, Canada), 203-211, 2001. Type: Proceedings

Date Reviewed: Jan 29 2003

The authors present a technique for classifying audio into four classes: speech, music, environment sound, and silence. Such a classification is useful for audio indexing and retrieval, and for video structure extraction. The technique is based on the work of Scheirer and Slaney [1]. Three audio features are used: high zero-crossing rate ratio (HZCRR), low short-time energy ratio (LSTER), and spectrum flux (SF). In addition, the authors introduce three new features: line spectral pairs (LSP) distance, band periodicity (BP), and noise frame ratio (NFR). The audio stream is first preclassified into speech and non-speech, using the HZCRR, LSTER, and SF features with a k-nearest neighbor classifier. These results are then refined using LSP distance. Silence is then detected using simple short-time energy and zero-crossing rate features. A rule-based approach is used to discriminate music from environment sound, based on the BP, SF, and NFR features. The reported classification accuracy is about 96 percent. This figure must be viewed with caution, however. For example, the classification accuracy for noisy speech is 73 percent, although it can be improved to 85 percent. Although the experiments were carried out on a large and varied dataset, the authors do not provide sufficient arguments to convince the reader that the proposed new features and the two-stage classification scheme are needed. A direct comparison with the work of Scheirer and Slaney on the same dataset could be one such argument.
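For readers unfamiliar with the two ratio features named above, the following is a minimal sketch of how HZCRR and LSTER might be computed over one analysis window. It is an illustration under stated assumptions, not the authors' implementation: the review does not give the thresholds, so the factors 1.5 and 0.5, the non-overlapping framing, and all function names here are hypothetical choices.

```python
def split_frames(signal, frame_len):
    """Split samples into non-overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def hzcrr(signal, frame_len, factor=1.5):
    """Share of frames whose ZCR exceeds factor * mean ZCR (factor is an assumption)."""
    rates = [zero_crossing_rate(f) for f in split_frames(signal, frame_len)]
    if not rates:
        raise ValueError("signal is shorter than one frame")
    mean = sum(rates) / len(rates)
    return sum(1 for r in rates if r > factor * mean) / len(rates)

def lster(signal, frame_len, factor=0.5):
    """Share of frames whose energy is below factor * mean energy (factor is an assumption)."""
    energies = [short_time_energy(f) for f in split_frames(signal, frame_len)]
    if not energies:
        raise ValueError("signal is shorter than one frame")
    mean = sum(energies) / len(energies)
    return sum(1 for e in energies if e < factor * mean) / len(energies)

# Toy window: three loud alternating frames followed by one silent frame.
window = [1, -1, 1, -1] * 3 + [0, 0, 0, 0]
low_energy_share = lster(window, frame_len=4)
high_zcr_share = hzcrr(window, frame_len=4)
```

In this toy window, one frame in four falls below the energy threshold, while no frame exceeds the zero-crossing threshold. A speech/non-speech preclassifier of the kind described would feed such per-window values, together with spectrum flux, to a k-nearest neighbor classifier.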

Reviewer: Hadi Harb	Review #: CR126894 (0304-0378)

1)	Scheirer, E.; Slaney, M. Construction and evaluation of a robust multifeature speech/music discriminator. In ICASSP97 (Munich, Germany, April 21–24, 1997), IEEE, New York, 1997, 1331–1334.

Evaluation/Methodology (H.5.1 ... )


Other reviews under "Evaluation/Methodology":	Date

Guidelines for multimedia usage Hartley R. Systems documentation (, Waterloo, Ont., Canada, Oct 5-8, 1993)1061993. Type: Proceedings	Dec 1 1994

The Amsterdam hypermedia model Hardman L., Bulterman D., van Rossum G. Communications of the ACM 37(2): 50-62, 1994. Type: Article	Feb 1 1995

Grammar-based articulation for multimedia document design Weitzman L., Wittenburg K. Multimedia Systems 4(3): 99-111, 1996. Type: Article	Sep 1 1997


Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®