While similarity search has been studied thoroughly for archived time series data, the problem has not been investigated for the case where future time series values are not yet available. Supporting similarity search over future data is important for streaming time series. In such a setting, the arrival of data may be delayed, which means that techniques that solve the similarity search problem for archived data cannot be directly applied.
The authors study two main approaches to similarity search over predicted values: a polynomial approach, which predicts future values based on the recent past, and a probabilistic approach, which uses the whole history of the stream for prediction. In addition, indexing schemes are proposed for the second approach to process similarity queries efficiently.
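To make the polynomial idea concrete, the following is a minimal sketch and not the authors' actual method. It assumes that the "recent past" is a fixed-size window of equally spaced samples and that the prediction is obtained by extrapolating the unique polynomial passing through those samples (Lagrange form). The function name predict_next and the parameter steps_ahead are hypothetical.

```python
def predict_next(window, steps_ahead=1):
    """Extrapolate the polynomial through (0, w[0]), ..., (k-1, w[k-1]).

    Assumption: samples are equally spaced and the most recent k values
    are interpolated exactly (degree k-1); the paper may fit differently.
    """
    k = len(window)
    if k == 0:
        raise ValueError("window must be non-empty")
    x = k - 1 + steps_ahead  # time index of the value to predict
    total = 0.0
    for i, y_i in enumerate(window):
        # Lagrange basis polynomial for node i, evaluated at x
        weight = 1.0
        for j in range(k):
            if j != i:
                weight *= (x - j) / (i - j)
        total += y_i * weight
    return total


if __name__ == "__main__":
    # A window of two values gives a linear extrapolation.
    print(predict_next([2.0, 4.0]))
    # A window of three values gives a quadratic extrapolation.
    print(predict_next([1.0, 4.0, 9.0]))
```

A small window keeps the extrapolation stable. Exact interpolation through many points tends to oscillate, which is one reason a method that uses only the recent past may lose accuracy as the prediction horizon grows.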
The authors present a set of interesting experimental results. The factors studied are prediction accuracy, query accuracy, and query efficiency. Prediction accuracy measures the ability of the methods to predict future values. Query accuracy measures the ability of the methods to achieve high recall in similarity search. Finally, query efficiency measures the ability of the methods to provide results quickly; it is used to study the performance impact of the applied indexing schemes. The main conclusions are that the probabilistic approach is more accurate than the polynomial one, and that the indexing schemes significantly improve query execution compared with a sequential scan of the data.
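As an illustration of how query accuracy could be measured, the sketch below computes recall for a single query. It assumes a range query under Euclidean distance: the answer obtained on the predicted values is compared with the answer obtained once the actual values have arrived. The query type, the distance function, and all names are assumptions made for this example, not details taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(streams, query, radius):
    """Sequential scan: names of all streams within `radius` of `query`."""
    return {
        name
        for name, values in streams.items()
        if euclidean(values, query) <= radius
    }


def recall(predicted_streams, actual_streams, query, radius):
    """Fraction of the true answers that the predicted data also returns."""
    relevant = range_query(actual_streams, query, radius)
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    retrieved = range_query(predicted_streams, query, radius)
    return len(relevant & retrieved) / len(relevant)


if __name__ == "__main__":
    actual = {"a": [1.0, 2.0], "b": [1.0, 3.0], "c": [5.0, 5.0]}
    predicted = {"a": [1.0, 2.1], "b": [2.0, 5.0], "c": [5.0, 5.0]}
    print(recall(predicted, actual, query=[1.0, 2.0], radius=1.0))
```

The range_query function here is the sequential scan that serves as the baseline for query efficiency. An index would replace that scan without changing how recall is computed.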