Computing Reviews
Learning string-edit distance
Ristad E., Yianilos P. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(5): 522-532, 1998. Type: Article
Date Reviewed: Nov 1 1998

A new stochastic model is presented for computing string-edit distances by learning from a corpus of examples. The model is applied to the difficult problem of learning the pronunciation of a spoken word. Compared with the untrained Levenshtein distance, the established classical reference point, the new stochastic model makes only one-fifth as many errors.
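For readers unfamiliar with the baseline, the untrained Levenshtein distance can be sketched as follows. This is the standard unit-cost dynamic program, not code from the paper; the function name `levenshtein` is chosen here for illustration only.

```python
def levenshtein(source: str, target: str) -> int:
    """Unit-cost edit distance: each insertion, deletion, or substitution costs 1."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i source characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

Every edit operation costs the same here regardless of which symbols are involved. That fixed cost is what "untrained" refers to: nothing in the distance is fitted to example data.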

After the introduction, section 2 explains an algorithm that automatically learns a string-edit distance from pairs of similar strings. That algorithm is not directly applicable to string classification problems, so section 3 presents a stochastic solution that learns string classification automatically from a corpus of labeled strings.
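To make the idea of a stochastic edit distance concrete, here is a minimal sketch under stated assumptions. It assumes a memoryless model in which every edit operation (substitution, deletion, insertion) and a final "end" event each carry a probability, and it takes the distance to be the negative log of the summed probability of all edit sequences that produce the string pair. The probabilities below are hand-picked toy values over the alphabet {a, b}, not learned ones; learning them from examples is the paper's contribution and is not shown. All names (`stochastic_edit_distance`, `toy_distance`, `p_sub`, `p_del`, `p_ins`, `p_end`) are hypothetical.

```python
import math


def stochastic_edit_distance(x, y, p_sub, p_del, p_ins, p_end):
    """Negative log of the total probability of all edit sequences yielding (x, y)."""
    rows, cols = len(x) + 1, len(y) + 1
    # alpha[i][j] = probability of generating the prefix pair (x[:i], y[:j])
    alpha = [[0.0] * cols for _ in range(rows)]
    alpha[0][0] = 1.0
    for i in range(rows):
        for j in range(cols):
            if i > 0:  # last operation deleted x[i-1]
                alpha[i][j] += p_del(x[i - 1]) * alpha[i - 1][j]
            if j > 0:  # last operation inserted y[j-1]
                alpha[i][j] += p_ins(y[j - 1]) * alpha[i][j - 1]
            if i > 0 and j > 0:  # last operation substituted x[i-1] -> y[j-1]
                alpha[i][j] += p_sub(x[i - 1], y[j - 1]) * alpha[i - 1][j - 1]
    return -math.log(alpha[-1][-1] * p_end)


def toy_distance(x, y):
    # Assumed toy parameters over {a, b}; they sum to 1:
    # 2 matches (0.3 each) + 2 mismatches (0.05 each)
    # + 2 deletions (0.05 each) + 2 insertions (0.05 each) + end (0.1).
    return stochastic_edit_distance(
        x, y,
        p_sub=lambda a, b: 0.3 if a == b else 0.05,
        p_del=lambda a: 0.05,
        p_ins=lambda b: 0.05,
        p_end=0.1,
    )
```

With these toy values, `toy_distance("ab", "ab")` is smaller than `toy_distance("ab", "ba")`, because matching symbols are far more probable than mismatches. A classifier in this spirit could label a string with the class of its closest labeled example, though that is only one possible reading of the approach described in section 3.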

Section 4 outlines the empirical results of five experiments. The first two use the Switchboard pronouncing lexicon, the next two use a lexicon derived from the corpus, and the fifth merges the two lexicons. The superior performance observed for the new stochastic technique makes the costly process of constructing a lexicon by hand obsolete. It also allows a new word to be recognized from a single example of that word’s pronunciation. Both are improvements over other methods.

Readers without prior knowledge of the subject will find sections 2 and 3 hard to read; the meaning and time complexity of some of the statistical formulas are difficult to grasp. However, the results show that the new method has great promise for improving the current state of the art. Moreover, the authors have considered alternatives, some with better attributes for the automatic acquisition of joint probabilities on string pairs, and they offer good reasons for the superiority of their stochastic model. The paper is a valuable contribution to learning string-edit distance, and is therefore a must for researchers and developers in spelling correction and pronunciation modeling. The new model may even be extensible to speech recognition, computerized voice I/O, and decryption.

Reviewer: Herbert G. Mayer
Review #: CR121929 (9811-0898)
Data Types And Structures (D.3.3 ... )
Algorithm Design And Analysis (G.4 ... )
Classifier Design And Evaluation (I.5.2 ... )
Models (I.5.1)
