Computing Reviews
A robust audio classification and segmentation method
Lu L., Jiang H., Zhang H. Multimedia (Proceedings of the Ninth ACM International Conference, Ottawa, Canada), 203-211, 2001. Type: Proceedings
Date Reviewed: Jan 29 2003

The authors present a technique for classifying audio into four classes: speech, music, environment sound, and silence. Such a classification is useful for audio indexing and retrieval, and for extracting video structure.

The technique builds on the work of Scheirer and Slaney [1]. Three audio features are used: high zero-crossing rate ratio (HZCRR), low short-time energy ratio (LSTER), and spectrum flux (SF). In addition, the authors introduce three new features: line spectral pair (LSP) distance, band periodicity (BP), and noise frame ratio (NFR).
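The review names the three baseline features without defining them. As a rough illustration only, the following Python sketch computes them for one analysis window of frames, assuming the definitions commonly used for these features: HZCRR as the share of frames whose zero-crossing rate exceeds 1.5 times the window mean, LSTER as the share of frames whose short-time energy falls below 0.5 times the window mean, and SF as the mean squared difference of log-magnitude spectra between adjacent frames. The 1.5 and 0.5 factors, the `delta` offset, the naive DFT, and all function names are assumptions made for this sketch, not details taken from the paper.

```python
import cmath
import math
from typing import List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one (non-empty) frame."""
    return sum(x * x for x in frame) / len(frame)


def hzcrr(frames: List[Sequence[float]], factor: float = 1.5) -> float:
    """Share of frames whose ZCR exceeds `factor` times the window's mean ZCR
    (factor of 1.5 is an assumption)."""
    zcrs = [zero_crossing_rate(f) for f in frames]
    mean = sum(zcrs) / len(zcrs)
    return sum(1 for z in zcrs if z > factor * mean) / len(zcrs)


def lster(frames: List[Sequence[float]], factor: float = 0.5) -> float:
    """Share of frames whose energy is below `factor` times the window's mean
    energy (factor of 0.5 is an assumption)."""
    energies = [short_time_energy(f) for f in frames]
    mean = sum(energies) / len(energies)
    return sum(1 for e in energies if e < factor * mean) / len(energies)


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..n//2 (O(n^2), adequate for a sketch)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrum_flux(spectra: List[List[float]], delta: float = 1e-6) -> float:
    """Mean squared difference of log magnitudes between adjacent frames;
    `delta` avoids log(0) on empty bins."""
    if len(spectra) < 2:
        return 0.0
    total, count = 0.0, 0
    for prev, cur in zip(spectra, spectra[1:]):
        for a_prev, a_cur in zip(prev, cur):
            total += (math.log(a_cur + delta) - math.log(a_prev + delta)) ** 2
            count += 1
    return total / count
```

Each function returns one scalar per window, so the three values together form the feature vector for that window. Speech, which alternates voiced and unvoiced sounds and contains short pauses, is generally expected to score higher on HZCRR and LSTER than music.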

A preclassification of the audio stream into speech and non-speech is first obtained from the HZCRR, LSTER, and SF features with a k-nearest neighbor classifier, and the results are then refined using the LSP distance. Silence is detected next, using simple short-time energy and zero-crossing rate features. Finally, a rule-based approach discriminates music from environment sound, based on the BP, SF, and NFR features. The reported classification accuracy is about 96 percent. However, this figure should be interpreted with caution. For example, the accuracy for noisy speech is only 73 percent; it can be improved, but only to 85 percent.
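The decision flow described in the paragraph above can be sketched in Python as follows, under stated assumptions: a k-nearest neighbor vote on (HZCRR, LSTER, SF) separates speech from non-speech, non-speech clips are then tested for silence, and the remainder is split into music and environment sound by simple threshold rules on BP, SF, and NFR. The LSP-distance refinement is omitted, testing silence only on non-speech clips is our assumption, and every threshold and name (`ClipFeatures`, `classify_clip`, `BP_MIN`, and so on) is hypothetical rather than the authors' values.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical thresholds, chosen only for illustration.
SILENCE_MAX_ENERGY = 1e-3
SILENCE_MAX_ZCR = 0.1
BP_MIN = 0.5   # assumption: music is more periodic across sub-bands
NFR_MAX = 0.3  # assumption: music contains fewer noise-like frames
SF_MAX = 1.0   # assumption: music has lower spectrum flux


@dataclass
class ClipFeatures:
    hzcrr: float
    lster: float
    sf: float
    energy: float  # mean short-time energy of the clip
    zcr: float     # mean zero-crossing rate of the clip
    bp: float
    nfr: float


TrainingSet = List[Tuple[Sequence[float], str]]


def knn_label(query: Sequence[float], training: TrainingSet, k: int = 3) -> str:
    """Majority label among the k training vectors closest in Euclidean distance."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def is_silent(energy: float, zcr: float) -> bool:
    """Silence test on short-time energy and zero-crossing rate."""
    return energy < SILENCE_MAX_ENERGY and zcr < SILENCE_MAX_ZCR


def classify_clip(c: ClipFeatures, training: TrainingSet, k: int = 3) -> str:
    # Stage 1: speech vs. non-speech by k-NN on (HZCRR, LSTER, SF).
    if knn_label((c.hzcrr, c.lster, c.sf), training, k) == "speech":
        return "speech"
    # Stage 2 (assumed order): silence among the non-speech clips.
    if is_silent(c.energy, c.zcr):
        return "silence"
    # Stage 3: rule-based split of the remaining non-speech clips.
    if c.bp >= BP_MIN and c.nfr <= NFR_MAX and c.sf <= SF_MAX:
        return "music"
    return "environment sound"
```

Given a labeled training set of (HZCRR, LSTER, SF) vectors, `classify_clip` returns one of the four class names. An odd `k` avoids ties in the two-class vote; in practice every threshold would have to be tuned on data.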

Although the experiments were carried out on a large and varied dataset, the authors do not give sufficient arguments to convince the reader that the proposed new features and the two-stage classification scheme are needed. A direct comparison with the work of Scheirer and Slaney on the same dataset would be one such argument.

Reviewer: Hadi Harb. Review #: CR126894 (0304-0378)
[1] Scheirer, E.; Slaney, M. Construction and evaluation of a robust multifeature speech/music discriminator. In ICASSP-97 (Munich, Germany, April 21–24, 1997), IEEE, New York, 1997, 1331–1334.
 
 
Evaluation/Methodology (H.5.1 ...)
 
Other reviews under "Evaluation/Methodology":
Guidelines for multimedia usage
Hartley R. Systems documentation (Waterloo, Ont., Canada, Oct 5-8, 1993), 106, 1993. Type: Proceedings
Dec 1 1994
The Amsterdam hypermedia model
Hardman L., Bulterman D., van Rossum G. Communications of the ACM 37(2): 50-62, 1994. Type: Article
Feb 1 1995
Grammar-based articulation for multimedia document design
Weitzman L., Wittenburg K. Multimedia Systems 4(3): 99-111, 1996. Type: Article
Sep 1 1997

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®