This paper describes research done at MIT to develop a computer system that can locate a speaker in a scene and determine to whom the speaker is talking. No intended application is described, but likely uses include security systems, automated TV studios, robotic systems, or clandestine surveillance.
The system includes dual stereo cameras and microphones in a fixed position. To calibrate the system, a person speaks directly to the cameras and microphones. The function of the audio is to determine which person is speaking. The functions of the video are to identify the speaker, by detecting moving lips, and to determine to whom the speaker is talking.
The direction of speaking is derived by sampling the one- to six-kilohertz (1-6 kHz) band of each microphone independently; the system then computes a time-difference correlation between the microphone signals to determine the speaker's direction. The video is used to locate a face, to detect the facial movements produced by speech, and to determine the face's orientation (speaking direction). The system can then correlate the speaking face with the sound. The computations are described using statistical formulas that correlate Gaussian density and distribution functions derived from the audio and video inputs. It would have been helpful if the authors had included a graphical representation of the distribution functions, to illustrate the microphone and video inputs and the formula output.
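The paper does not give its implementation, but the time-difference correlation it describes is the standard technique of locating the peak of the cross-correlation between two microphone signals. The following minimal sketch, with hypothetical function and variable names, illustrates the idea on a synthetic signal:

```python
import numpy as np

def estimate_delay(sig_a, sig_b, sample_rate):
    """Estimate the delay (in seconds) of sig_a relative to sig_b
    by locating the peak of their cross-correlation."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    # Index 0 of the full correlation corresponds to a lag of
    # -(len(sig_b) - 1), so shift the argmax to recover the true lag.
    lag = np.argmax(corr) - (len(sig_b) - 1)
    return lag / sample_rate

# Synthetic check: a pulse that reaches the second microphone
# five samples after it reaches the first.
rate = 16000
pulse = np.zeros(100)
pulse[40] = 1.0
delayed = np.roll(pulse, 5)

delay = estimate_delay(delayed, pulse, rate)  # positive: 'delayed' lags 'pulse'
```

A real system would band-limit both signals to the 1-6 kHz range first and convert the estimated delay to a bearing using the known microphone spacing and the speed of sound.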
Research results show that the system performs satisfactorily if the speaker stands out from any clutter and faces somewhat toward the camera. Future research will seek to enhance the system with a speech recognizer and a human facial model.