Computing Reviews
Design of speech-based devices
Pitt I., Edwards A., Springer-Verlag New York, Inc., Secaucus, NJ, 2002. 190 pp. Type: Book (9781852334369)
Date Reviewed: Jun 17 2003

The use of speech output in devices and services is addressed in this book. Speech input and spoken dialogues are not discussed, and readers interested in these topics should look elsewhere (for example, at Designing interactive speech systems [1]).

Chapter 1 contains a very short overview of synthetic speech technology, completely omitting unit selection, the current state of the art. In the second chapter, some basic phonological phenomena like rhythm and intonation are described, together with semantic, cognitive, and pragmatic aspects of speech. In each case, only a short sketch is provided, citing mostly papers written in the 1960s and 1970s. This is not a problem in itself, but some more recent developments should at least be mentioned, for example, the ToBI system for describing intonation patterns, which can be used to control the intonation in several speech synthesizers. Some of the issues are taken up in subsequent chapters (pragmatics in chapter 3, semantics in chapter 4, and intonation in chapter 5). Although important issues are raised, the role of this chapter remains unclear.

Chapter 3 deals with interaction design, defined as “what to be said, and when.” The fourth chapter covers dialogue design, “how to say it.” In the third chapter, the principal difference between the visual and auditory channels (parallel and persistent versus serial and transient) is discussed, along with its implications: focusing on relevant information and making use of listeners’ expectations. The fourth chapter contains sections on the advantages of non-speech sounds (“earcons” and “auditory icons”) and simple grammatical structures. The important aspect of personality, implied by the choice of voice and wording (and the necessary consistency), is touched on only very briefly. Both chapters deliver some rules of thumb that are important, and often overlooked by speech interface designers, but it often remains unclear how they could be implemented in practice. For example, for the given/new distinction, the practical advice is “... that a speech based system must maintain a history of its interactions.”
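
To make the given/new advice concrete, a minimal sketch of such an interaction history might look like the following. It is not taken from the book: the class name, the method name, and the simple rule that an item counts as given once it has been spoken are all assumptions made for illustration.

```python
class InteractionHistory:
    """Remembers which items have already been spoken to the listener.

    Hypothetical helper: an item is 'given' once it has been mentioned,
    and 'new' the first time it appears.
    """

    def __init__(self):
        self._mentioned = set()

    def classify(self, items):
        """Label each item 'given' or 'new', then record it as mentioned."""
        labelled = []
        for item in items:
            key = item.lower()  # ignore case when comparing items
            status = "given" if key in self._mentioned else "new"
            labelled.append((item, status))
            self._mentioned.add(key)
        return labelled


history = InteractionHistory()
first = history.classify(["flight", "Boston"])
second = history.classify(["flight", "Chicago"])
```

Here every item in `first` is labelled new, while in `second` only "Chicago" is; a system could then, for instance, reserve emphasis for the new items. A real design would also have to decide when given information expires, which this sketch ignores.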

The lack of practical solutions (algorithms and examples) is also present in the fifth chapter, where intonation is revisited. It remains unclear how a speech synthesizer can be controlled to produce the intonation contours described in the text. It would have been instructive to explain how, for example, the well-known and freely available Festival speech synthesizer can be instructed to produce speech with a certain intonation pattern. When using copy synthesis (namely, recording whole words and phrases, and concatenating them), however, readers may benefit from the information in this chapter.
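
As an illustration of the copy synthesis case only, the concatenation step could be sketched as below with Python's standard `wave` module. The function names are hypothetical, `make_silence` merely stands in for real recorded words and phrases, and the sketch assumes that all recordings share one audio format; it does nothing to smooth the intonation across the joins.

```python
import io
import wave


def make_silence(n_frames, rate=16000):
    """Stand-in for a recorded prompt: mono 16-bit silence as WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def concatenate_recordings(recordings):
    """Join WAV recordings (as bytes) that share the same audio format."""
    if not recordings:
        raise ValueError("need at least one recording")
    fmt = None
    chunks = []
    for data in recordings:
        with wave.open(io.BytesIO(data), "rb") as w:
            this_fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError("recordings must share one audio format")
            chunks.append(w.readframes(w.getnframes()))
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(fmt[0])
        w.setsampwidth(fmt[1])
        w.setframerate(fmt[2])
        w.writeframes(b"".join(chunks))
    return out.getvalue()


# Two placeholder "recordings" joined into one utterance.
utterance = concatenate_recordings([make_silence(100), make_silence(50)])
```

In practice, the intonation guidance in the chapter would bear on how the phrases are recorded and which recording is chosen for a given position in the utterance, which is outside the scope of this sketch.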

In chapter 6, the authors discuss the problems with reading and navigating lists, using the example of DOS file lists. In the next chapter, three case studies are described: one device (a traffic information system), one service (voice mail), and a video cassette recorder (VCR) programming application, which mainly demonstrates the advantage of speech input over a button-controlled interface. The final chapter contains a selection of future application fields for spoken output, for example in mobile devices and avatars.

Although the book appears in the Springer Practitioner Series, its most glaring shortcoming is its lack of practical information. How can a speech synthesizer be controlled in order to produce the desired output? Standard annotation languages like Sable, or the tags supported by Microsoft’s Speech API (SAPI), are not even mentioned. How can a system keep track of an interaction history to distinguish between given and new information? Some of these questions may be hard to answer, but pointers to the information should at least be included. The book’s most positive feature is its discussion of important issues every designer of a system employing speech output must keep in mind.

Reviewer: T. Portele
Review #: CR127795 (0309-0849)

1) Bernsen, N.O.; Dybkjær, H.; Dybkjær, L. Designing interactive speech systems. Springer-Verlag New York, Inc., Secaucus, NJ, 1998.
 
Voice I/O (H.5.2 ... )
Natural Language (H.5.2 ... )
Natural Language Interfaces (I.2.1 ... )
Speech Recognition And Synthesis (I.2.7 ... )
 