Computing Reviews
Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends
Schuller B. Communications of the ACM 61(5): 90-99, 2018. Type: Article
Date Reviewed: Mar 18 2020

The two decades referred to in the subtitle essentially span the time since the publication of Picard’s foundational Affective computing [1], which began the study of emotion recognition by computers. This paper can therefore be viewed as a comprehensive review of emotion recognition in speech.

The author begins by laying out an overall view of the process, which in broad terms has four components. First, one chooses the model for emotions: either discrete classes or a value-continuous dimensional view composed of axes for arousal and positivity. Then one acquires labeled data. Next, features are selected and fed into a learning system. Initially, labeling the data required extensive human intervention, with the ambiguities that this implies, but systems now exist in which the machine can learn to label the data with some human intervention.
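To make the first choice concrete, the two ways of modeling emotion can be pictured side by side. The following is only an illustrative sketch and is not taken from the paper: the class name `DimensionalLabel`, the function `to_discrete`, the value range of -1 to +1, and the four quadrant names are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionalLabel:
    """Value-continuous view: a point on two axes (ranges are assumed)."""
    arousal: float     # -1.0 (calm) .. +1.0 (excited)
    positivity: float  # -1.0 (negative) .. +1.0 (positive)


def to_discrete(label: DimensionalLabel) -> str:
    """Collapse a dimensional label into one of four hypothetical classes.

    The quadrant names are an illustrative convention, not the paper's.
    """
    if label.arousal >= 0:
        return "happy" if label.positivity >= 0 else "angry"
    return "calm" if label.positivity >= 0 else "sad"


# A highly aroused, negative utterance falls in the "angry" quadrant.
example = DimensionalLabel(arousal=0.8, positivity=-0.6)
print(to_discrete(example))
```

The dimensional view keeps gradations that the discrete classes discard, which is one reason the choice of model comes first in the process.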

This is an iterative process where human advice is used to learn labels. Features can be chunks of audio rather than just words. It is also important to take into account the speaker’s states and traits beyond the emotion of interest. The author summarizes the results of recent speech emotion recognition (SER) challenge events in a useful table. Finally, the author considers challenges that the SER community could undertake.
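The iterative labeling process mentioned above, in which the machine labels what it can and asks a human about the rest, might look roughly like the sketch below. It is a toy under stated assumptions, not the method described in the paper: each clip is reduced to a single made-up feature value, the classifier is a simple nearest-centroid rule, and the names `centroids`, `label_iteratively`, `ask_human`, and `margin` are all hypothetical.

```python
from typing import Callable


def centroids(features: list[float], labels: dict[int, str]) -> dict[str, float]:
    """Mean feature value per label among the clips labeled so far."""
    sums: dict[str, tuple[float, int]] = {}
    for i, lab in labels.items():
        total, count = sums.get(lab, (0.0, 0))
        sums[lab] = (total + features[i], count + 1)
    return {lab: total / count for lab, (total, count) in sums.items()}


def label_iteratively(
    features: list[float],
    seed_labels: dict[int, str],
    ask_human: Callable[[int], str],
    margin: float = 0.2,
) -> tuple[dict[int, str], int]:
    """Label clips one by one, querying the human only when unsure.

    A clip counts as "unsure" when its two nearest centroids are
    closer than `margin` apart in distance (an assumed criterion).
    """
    labels = dict(seed_labels)
    human_queries = 0
    for i, x in enumerate(features):
        if i in labels:
            continue
        cents = centroids(features, labels)
        ranked = sorted(cents, key=lambda lab: abs(x - cents[lab]))
        best = ranked[0]
        confident = len(ranked) == 1 or (
            abs(x - cents[ranked[1]]) - abs(x - cents[best]) >= margin
        )
        if confident:
            labels[i] = best          # machine-assigned label
        else:
            labels[i] = ask_human(i)  # fall back to human advice
            human_queries += 1
    return labels, human_queries


# Five clips, two of them labeled by hand up front (values are invented).
clip_features = [0.1, 0.9, 0.2, 0.8, 0.5]
result, asked = label_iteratively(
    clip_features, {0: "calm", 1: "excited"}, ask_human=lambda i: "excited"
)
print(result, asked)
```

In this toy run only the ambiguous middle clip is referred to the human; the other two unlabeled clips are labeled by the machine.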

Going beyond the recognition of irony or sarcasm, the author suggests what he calls a “moonshot challenge” to target the actual emotion of the speaker. The review illuminates a fascinating area and leaves the reader eager for more. There is a comprehensive bibliography.

Reviewer:  J. P. E. Hodgson Review #: CR146935 (2008-0198)
1) Picard, R. W. Affective computing. MIT Press, Cambridge, MA, 1997.
Speech Recognition And Synthesis (I.2.7 ... )
 
 
General (I.2.0 )
 
 
Natural Language Processing (I.2.7 )
 
 
Introductory And Survey (A.1 )
 