Computing Reviews
Hierarchical neural network structures for phoneme recognition
Vasquez D., Gruhn R., Minker W., Springer Publishing Company, Incorporated, Berlin, Germany, 2013. 151 pp. Type: Book (978-3-642344-24-4)
Date Reviewed: May 24 2013

This brief book is packed with useful information about some novel techniques for recognizing phonemes, the building blocks of speech. In fact, as the authors acknowledge, the techniques are not entirely new, but rather extensions of previously published methods. However, the authors' optimizations and restructuring produce recognizers with a lower computational load and practically the same recognition accuracy as the baseline techniques that lack these optimizations.

A concise and very well-written chapter follows the short introductory chapter. It covers speech parameters and their computation, as well as well-established recognition procedures in the field, such as hidden Markov models and Gaussian mixture models. The authors also present techniques for feature space transformations leading to parameter reduction, including multilayer perceptron networks. Chapter 3 introduces phonation acoustics and presents several known systems for phoneme recognition, with details about the databases used for these experiments.

The main contribution to the field is contained in chapters 4 and 5. In chapter 4, the authors consider the so-called hierarchical approach and downsampling schemes. In essence, this consists of building a hierarchy of neural networks trained with backpropagation, where the first level accepts speech parameters as inputs, and then applying the “secret sauce”: a form of smart downsampling in the output domain of the first level. This yields significant computational savings while maintaining the recognition accuracy obtained without downsampling. Chapter 5 goes into more detail about the hierarchical scheme and presents some variations and additional results.
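To make the idea concrete, the following is a minimal sketch of a two-level hierarchy with downsampling between the levels. It is not the authors' code: the class `TinyMLP`, the helpers `stack_context` and `hierarchical_posteriors`, the layer sizes, the context width, and the downsampling rate are all assumptions chosen for illustration, and the weights are random rather than trained with backpropagation. The sketch simply keeps every `rate`-th output vector of the first level, which may differ from the specific schemes studied in the book.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability vector."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyMLP:
    """Hypothetical one-hidden-layer perceptron with random (untrained) weights."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        return softmax([sum(w * h for w, h in zip(row, hidden)) for row in self.w2])


def stack_context(frames, center, half_width):
    """Concatenate the frames around `center`, repeating edge frames at the borders."""
    last = len(frames) - 1
    stacked = []
    for offset in range(-half_width, half_width + 1):
        stacked.extend(frames[min(max(center + offset, 0), last)])
    return stacked


def hierarchical_posteriors(features, level1, level2, half_width, rate):
    # Level 1: speech parameters in, phoneme posteriors out, for every frame.
    post1 = [level1.forward(stack_context(features, t, half_width))
             for t in range(len(features))]
    # Downsampling in the output domain of level 1: keep every `rate`-th vector.
    kept = post1[::rate]
    # Level 2: runs only on the reduced sequence, so it does about 1/rate of the work.
    return [level2.forward(stack_context(kept, t, half_width))
            for t in range(len(kept))]


rng = random.Random(0)
N_FEATS, N_PHONEMES, HALF_WIDTH, RATE = 13, 5, 2, 3  # illustrative values only
features = [[rng.gauss(0, 1) for _ in range(N_FEATS)] for _ in range(30)]
level1 = TinyMLP(N_FEATS * (2 * HALF_WIDTH + 1), 16, N_PHONEMES, rng)
level2 = TinyMLP(N_PHONEMES * (2 * HALF_WIDTH + 1), 16, N_PHONEMES, rng)
out = hierarchical_posteriors(features, level1, level2, HALF_WIDTH, RATE)
print(len(features), "input frames ->", len(out), "second-level outputs")
```

With 30 input frames and a rate of 3, the second level is evaluated 10 times instead of 30, which is where the computational savings in this sketch come from. Whether accuracy is preserved depends on training and on the data, which this toy example does not address.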

Chapter 6 introduces a phoneme communication scheme, which enables better analysis of recognition failures due to various imperfections and phoneme confusion. Chapter 7 concludes the book with a brief summary of the work and some suggestions for future investigations.

This is a short book; nevertheless, it is brimming with useful and well-presented information. I recommend it for graduate students in the field, as well as for practicing professionals.

Reviewer: Vladimir Botchev
Review #: CR141244 (1308-0685)
Speech Recognition And Synthesis (I.2.7 ... )
Neural Nets (C.1.3 ... )
Pattern Analysis (I.5.2 ... )
Pattern Matching (F.2.2 ... )
Self-Modifying Machines (F.1.1 ... )

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®