Subsequence matching is a problem where, for given N data sequences of varying lengths, a query sequence Q, and a number representing the tolerance, one must find those sequences one or more subsequences of which are not greater than the tolerance in their distance to Q, as well the offsets of these subsequences. Solutions to this kind of problem are needed in many fields, such as information retrieval and data mining.
One of the solutions of this problem, proposed by Faloutsos et al. [1] to which the authors of this paper refer, is based on dividing data sequences into sliding windows, and the query sequence into disjoint windows, then retrieving similar subsequences by using those disjoint windows. The disadvantage of this method is its tendency to generate many false alarms--candidates that do not qualify--resulting in less effective performance. The method proposed by the authors is called dual match. It is based on dividing the data sequences into disjoint windows, and the query sequence into sliding windows. Because the data sequences are divided into disjoint windows, and the individual points are stored and searched directly in the index, the number of false alarms is reduced and performance is improved.
The paper contains some typographical errors, but the results are original and interesting, and should improve information retrieval and mining mechanisms for data in sequence form.