Computing Reviews, the leading online review service for computing literature.


Hierarchical neural network structures for phoneme recognition
Vasquez D., Gruhn R., Minker W., Springer Publishing Company, Incorporated, Berlin, Germany, 2013. 151 pp. Type: Book (978-3-642344-24-4)

Date Reviewed: May 24 2013

This brief book comes packed with useful information about some novel techniques for recognizing phonemes, the building blocks of speech. In fact, as the authors acknowledge, the techniques are not entirely new, but rather extend methods that have already been published. However, the optimizations and restructuring performed by the authors yield recognizers with a lower computational load and practically the same recognition accuracy as the baseline techniques that lack these optimizations.

The short introductory chapter is followed by a concise and very well-written chapter on speech parameters and their computation, as well as on well-established recognition procedures in the field, such as hidden Markov models and Gaussian mixture models. The authors also present techniques for feature space transformations leading to parameter reduction, including multilayer perceptron networks. Chapter 3 introduces phonation acoustics and presents several known systems for phoneme recognition, with details about the databases used in the experiments.

The main contribution to the field is contained in chapters 4 and 5. In chapter 4, the authors consider the so-called hierarchical approach and downsampling schemes. In essence, this consists of building a hierarchy of neural networks trained with backpropagation, where the first level accepts speech parameters as inputs, and then applying the “secret sauce”: a form of smart downsampling in the output domain of the first level. This yields significant computational savings while maintaining the recognition accuracy obtained without downsampling. Chapter 5 delves into more details about the hierarchical scheme and presents some variations and additional results.

Chapter 6 introduces a phoneme communication scheme, which enables better analysis of recognition failures due to various imperfections and phoneme confusion.
Chapter 7 concludes the book with a brief summary of the work and some suggestions for future investigations. This is a short book; nevertheless, it is brimming with useful and well-presented information. I recommend it for graduate students in the field, as well as for practicing professionals.
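
For readers who want a concrete picture of the chapter 4 idea, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It reads "downsampling in the output domain of the first level" as feeding the second-level network only every `step`-th frame of first-level phoneme posteriors within a context window. That reading, all function names, the toy sizes, and the random weights (standing in for networks trained with backpropagation) are illustrative assumptions.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def make_layer(n_in, n_out, rng):
    """Random weights; an assumed stand-in for a trained network."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def posteriors(weights, x):
    """One linear layer plus softmax (hypothetical, much simpler than a real MLP)."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def second_level_input_size(n_phonemes, context, step):
    """Number of second-level inputs per frame for a given downsampling step."""
    return n_phonemes * len(range(-context, context + 1, step))


def hierarchical_posteriors(frames, w1, w2, context, step):
    """Level 1 runs on every frame; level 2 sees only every `step`-th
    level-1 output within +/- `context` frames (the assumed downsampling)."""
    level1 = [posteriors(w1, f) for f in frames]
    out = []
    for t in range(len(level1)):
        window = []
        for d in range(-context, context + 1, step):
            i = min(max(t + d, 0), len(level1) - 1)  # clamp at utterance edges
            window.extend(level1[i])
        out.append(posteriors(w2, window))
    return out


# Toy setup: 20 frames of 13 speech parameters, 5 phoneme classes.
rng = random.Random(0)
N_FEATS, N_PHONES, CONTEXT, STEP = 13, 5, 6, 3
frames = [[rng.gauss(0, 1) for _ in range(N_FEATS)] for _ in range(20)]
w1 = make_layer(N_FEATS, N_PHONES, rng)
w2 = make_layer(second_level_input_size(N_PHONES, CONTEXT, STEP), N_PHONES, rng)
result = hierarchical_posteriors(frames, w1, w2, CONTEXT, STEP)
```

With these toy sizes, the second-level input shrinks from 65 values per frame (step 1, 13 offsets of 5 posteriors each) to 25 values per frame (step 3, 5 offsets), which is where the computational saving would come from under this reading.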

Reviewer: Vladimir Botchev	Review #: CR141244 (1308-0685)

Speech Recognition And Synthesis (I.2.7 ... )

Neural Nets (C.1.3 ... )

Pattern Analysis (I.5.2 ... )

Pattern Matching (F.2.2 ... )

Self-Modifying Machines (F.1.1 ... )


Other reviews under "Speech Recognition And Synthesis":	Date

On-line recognition of spoken words from a large vocabulary Kohonen T. (ed), Riittinen H., Reuhkala E., Haltsonen S. Information Sciences 33(1-2): 3-30, 1984. Type: Article	Oct 1 1985

Connected spoken word recognition algorithms by constant time delay DP, O (n) DP and augmented continuous DP matching Nakagawa S. Information Sciences 33(1-2): 63-85, 1984. Type: Article	Jun 1 1985

The phonetic basis for computer speech processing Ladefoged P., Prentice Hall International (UK) Ltd., Hertfordshire, UK, 1985. Type: Book (9789780131638419)	Dec 1 1987


Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®