Ever wondered how much continuous speech recognition could be done on one AA battery? Is it 18,000, 180,000, or even 1.8 million words? Given the presumed demand for speech recognition on handheld devices, the dilemma is clear: massively parallel recognition algorithms must run on the severely limited energy of an AA battery.
The authors argue that a simplified multi-threaded architecture, which uses sublanguage information and decentralized controllers to reduce the combinatorics of speech processing, improves search efficiency, cuts the rate of data requests to the memory system, and consequently uses less power. The paper briefly introduces the state of the art in speech processing, succinctly describes the authors’ proposed system architecture for handheld devices, and offers a thorough, seven-page performance evaluation, before concluding with a cogent summary of related work and future research directions.
The findings are threefold. First, high-concurrency execution environments with latency tolerance improve speech recognition performance. Second, reducing static power dissipation lowers the energy consumed for a given task. Third, the key to improving performance lies in optimizing the memory system and reducing heat dissipation.
The authors extrapolate a rate of about 95 to 100 words per minute, or roughly 18,000 words, over three hours of AA battery life. Feasibility is almost certain; actual usability is a different story. Until speech recognizers go beyond is-this-what-you-mean confirmation prompts and handle rudimentary dialogue without repetitive user input, battery life takes a back seat to the creature comforts of real-life spoken interaction.
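As a quick check of the extrapolated figure, the arithmetic can be sketched as follows. The function name `words_on_battery` is hypothetical, and the sketch assumes a constant recognition rate over the whole battery life; neither comes from the paper.

```python
def words_on_battery(words_per_minute: float, battery_hours: float) -> int:
    """Total words recognized at a steady rate over the battery's life.

    Assumption (not from the paper): the rate stays constant until
    the battery is exhausted.
    """
    minutes = battery_hours * 60
    return round(words_per_minute * minutes)


# Bounds quoted in the review: 95 to 100 words per minute for three hours.
low = words_on_battery(95, 3)
high = words_on_battery(100, 3)
print(low, high)
```

The upper bound matches the 18,000 words quoted above; the lower bound of the range comes to 17,100 words.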