Computing Reviews

Hierarchical neural network structures for phoneme recognition
Vasquez D., Gruhn R., Minker W., Springer Publishing Company, Incorporated, Berlin, Germany, 2013. 151 pp. Type: Book
Date Reviewed: 05/24/13

This brief book is packed with useful information about some novel techniques for recognizing phonemes, the basic building blocks of speech. In fact, as the authors acknowledge, the techniques are not entirely new but rather extensions of previously published methods. However, the authors' optimizations and restructuring yield recognizers with a lower computational load and practically the same recognition accuracy as the baseline techniques that lack these optimizations.

The short introductory chapter is followed by a concise and very well written chapter on speech parameters and their computation, as well as on well-established recognition procedures in the field, such as hidden Markov models and Gaussian mixture models. The authors also present feature space transformations that reduce the number of parameters, including multilayer perceptron networks. Chapter 3 introduces phonation acoustics and presents several known phoneme recognition systems, with details about the databases used in these experiments.

The main contribution to the field is contained in chapters 4 and 5. In chapter 4, the authors consider the so-called hierarchical approach and downsampling schemes. In essence, this approach builds a hierarchy of neural networks trained with backpropagation, in which the first level accepts speech parameters as inputs. The “secret sauce” is a form of smart downsampling in the output domain of the first level, which yields significant computational savings while maintaining the recognition accuracy obtained without downsampling. Chapter 5 delves further into the hierarchical scheme and presents some variations and additional results.
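
The review does not spell out the exact downsampling scheme, so the following is only a minimal forward-pass sketch under stated assumptions: a first-level multilayer perceptron maps each acoustic feature frame to phoneme posteriors, and a second-level perceptron sees a 21-frame window of those posteriors sampled at every 5th frame, so its input layer shrinks from 21 posterior vectors to 5. The layer sizes, window width, stride, and the names `MLP`, `window_size`, `context_window`, and `hierarchical_posteriors` are hypothetical; the weights are random, and backpropagation training is omitted.

```python
import math
import random
from typing import List

Vector = List[float]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax: turns scores into posteriors summing to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class MLP:
    """One-hidden-layer perceptron with random, untrained weights (hypothetical).

    In practice both levels would be trained with backpropagation;
    only the forward pass is shown here.
    """

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_in = n_in
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(self, x: Vector) -> Vector:
        if len(x) != self.n_in:
            raise ValueError(f"expected {self.n_in} inputs, got {len(x)}")
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return softmax([sum(w * hi for w, hi in zip(row, h)) for row in self.w2])


def window_size(half_width: int, stride: int) -> int:
    """Number of frames kept from a (2*half_width + 1)-frame span."""
    return len(range(-half_width, half_width + 1, stride))


def context_window(frames: List[Vector], t: int, half_width: int, stride: int) -> Vector:
    """Concatenate every stride-th frame around t; edge frames are repeated."""
    out: Vector = []
    for offset in range(-half_width, half_width + 1, stride):
        idx = min(max(t + offset, 0), len(frames) - 1)
        out.extend(frames[idx])
    return out


def hierarchical_posteriors(features, level1, level2, half_width, stride):
    # Level 1: per-frame phoneme posteriors from the acoustic features.
    post1 = [level1.forward(f) for f in features]
    # Level 2: each frame sees a temporally downsampled window of level-1 outputs.
    return [level2.forward(context_window(post1, t, half_width, stride))
            for t in range(len(post1))]


if __name__ == "__main__":
    n_feat, n_phon, hidden = 39, 40, 50   # hypothetical layer sizes
    half_width, stride = 10, 5            # 21-frame span, every 5th frame kept
    rng = random.Random(1)
    features = [[rng.gauss(0, 1) for _ in range(n_feat)] for _ in range(100)]
    n_ctx = window_size(half_width, stride)
    level1 = MLP(n_feat, hidden, n_phon, seed=2)
    level2 = MLP(n_phon * n_ctx, hidden, n_phon, seed=3)
    post = hierarchical_posteriors(features, level1, level2, half_width, stride)
    print(len(post), len(post[0]))  # 100 40
    print("level-2 inputs:", n_phon * n_ctx,
          "vs", n_phon * window_size(half_width, 1))  # 200 vs 840
```

In this sketch the saving comes from the second level's input layer: with 40 posteriors per frame, the downsampled window feeds 200 inputs instead of 840, which shrinks the second-level weight matrix by the same factor. Whether the book's scheme downsamples in exactly this way is an assumption made here for illustration.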

Chapter 6 introduces a phoneme communication scheme, which enables a better analysis of recognition failures caused by various imperfections and by phoneme confusion. Chapter 7 concludes the book with a brief summary of the work and some suggestions for future research.

This is a short book; nevertheless, it is brimming with useful and well-presented information. I recommend it for graduate students in the field, as well as for practicing professionals.

Reviewer: Vladimir Botchev
Review #: CR141244 (1308-0685)

Reproduction in whole or in part without permission is prohibited. Copyright 2024 ComputingReviews.com™