This paper introduces a very interesting technique for the online recognition of isolated words from a large vocabulary. The technique is a two-stage recognition scheme: the speech signal is first converted into phoneme-based strings (“test strings”) by a modified spectral recognition method, and these strings are then quickly compared with reference transcriptions (“prototypes”) stored in a dictionary. Each word has several reference transcriptions, represented in a hash index table. The word recognition decision is based on the distances of the test strings from the prototypes of each word class.
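The distance-based decision described above can be sketched as follows. This is a minimal illustration, not the paper's actual procedure: it assumes phoneme strings compared by Levenshtein (edit) distance, and the function names, the toy dictionary, and its transcriptions are hypothetical.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance between phoneme strings.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def recognize(test_string, dictionary):
    # dictionary maps each word class to a list of prototype transcriptions;
    # the word whose nearest prototype is closest to the test string wins.
    return min(dictionary,
               key=lambda word: min(edit_distance(test_string, p)
                                    for p in dictionary[word]))

# Hypothetical toy dictionary with several prototypes per word.
dictionary = {
    "seven": ["sevn", "sevin"],
    "eleven": ["ilevn", "elevn"],
}
print(recognize("sevnn", dictionary))  # -> seven
```

Keeping several prototypes per word lets the decision absorb pronunciation variants: only the best-matching prototype of each word class enters the comparison.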
Statistical pattern-recognition methods, incorporating iterative “learning procedures,” are used to determine the test strings of the spoken words. Word recognition is then handled by a technique called “redundant hash addressing”: instead of comparing the test string symbol by symbol with every prototype string, the features of the test string are looked up in a table of all the features that occur in the prototype strings.
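The table-lookup idea can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: it assumes phoneme trigrams as the string features, and all names and the toy dictionary are hypothetical.

```python
from collections import defaultdict

def trigrams(s):
    # Phoneme trigrams serve as the string features (an assumption here).
    return [s[i:i + 3] for i in range(len(s) - 2)]

def build_feature_table(dictionary):
    # Hash index table: each feature maps to the words whose prototypes contain it.
    table = defaultdict(set)
    for word, prototypes in dictionary.items():
        for proto in prototypes:
            for f in trigrams(proto):
                table[f].add(word)
    return table

def hash_address(test_string, table):
    # One table lookup per test-string feature; votes accumulate per word,
    # so no prototype string is ever scanned symbol by symbol.
    votes = defaultdict(int)
    for f in trigrams(test_string):
        for word in table.get(f, ()):
            votes[word] += 1
    return max(votes, key=votes.get) if votes else None

dictionary = {"seven": ["sevn", "sevin"], "eleven": ["ilevn", "elevn"]}
table = build_feature_table(dictionary)
print(hash_address("sevin", table))  # -> seven
```

Because the work per test string is proportional to its number of features rather than to the dictionary size, the lookup stays fast even for large vocabularies; the voting also tolerates a few erroneous phonemes, which is the redundancy the method's name refers to.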
Since most of the computational effort is concentrated in the offline construction of the reference data, recognition itself proceeds at a fast rate.