With their proposed scheme, the authors depart from the common practice of using shape-based measures to assess the similarity of time series data. They adapt the bag-of-words representation from text analysis to compare time series of variable length. Rather than matching the overall shape of two signals, the technique measures their structural similarity: how often the same local patterns occur in each. This structural analysis has two advantages. First, because local subsequences are extracted for comparison, both the local and the global structure of the data are captured. Second, because the representation is built incrementally, it is well suited to comparing streaming data.
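To make the two advantages concrete, here is a minimal Python sketch of a word histogram that is updated one symbolic word at a time, so it can ingest a stream and compare series of different lengths. It assumes each series has already been converted into words; the `StreamingBag` class is hypothetical, and Euclidean distance over raw word counts is one simple choice of comparison, not necessarily the authors' exact formulation.

```python
import math
from collections import Counter


class StreamingBag:
    """Hypothetical helper: a bag-of-words histogram built incrementally."""

    def __init__(self):
        self.counts = Counter()

    def add(self, word):
        # Each new local pattern updates a single bin; nothing is recomputed.
        self.counts[word] += 1

    def distance(self, other):
        # Euclidean distance over the union of observed words (missing words count as 0).
        keys = set(self.counts) | set(other.counts)
        return math.sqrt(sum((self.counts[k] - other.counts[k]) ** 2 for k in keys))


if __name__ == "__main__":
    reference = StreamingBag()
    for w in ["abba", "bccb", "abba"]:
        reference.add(w)

    stream = StreamingBag()
    for w in ["abba", "bccb", "abba", "dcca"]:  # a longer series than the reference
        stream.add(w)
        print(w, round(stream.distance(reference), 3))
```

The distance drops to 0 once the stream contains exactly the reference's words, and rises again when a pattern unseen in the reference arrives, even though the two series differ in length.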
In the proposed scheme, subsequences extracted from each time series are first discretized into symbolic words using symbolic aggregate approximation (SAX), and each series is then represented by how often each word occurs. The processed signals are subsequently clustered using Euclidean distance, dynamic time warping (DTW), and the bag-of-patterns representation, so that the shape-based measures can be compared with the structural one. Classification of the processed signals with the k-nearest neighbor technique was also tested on various datasets. The authors evaluated the approach with both hierarchical and partitional clustering, and their results show improved performance over existing techniques.
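The pipeline described above (sliding-window subsequences, SAX discretization, word histograms, nearest-neighbor classification) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes a four-symbol alphabet with the standard Gaussian breakpoints -0.67, 0 and 0.67, a window length divisible by the number of PAA segments, numerosity reduction (a word identical to the previous one is not recounted), Euclidean distance between histograms, and 1-nearest-neighbor classification. All function names and parameter values are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter

# Assumed SAX setup: 4 symbols, breakpoints equiprobable under N(0, 1).
BREAKPOINTS = [-0.67, 0.0, 0.67]
ALPHABET = "abcd"


def znorm(seq):
    """Z-normalize a subsequence; a flat subsequence becomes all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std < 1e-8:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def paa(seq, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame."""
    if len(seq) % segments:
        raise ValueError("window length must be divisible by segments")
    size = len(seq) // segments
    return [sum(seq[i * size:(i + 1) * size]) / size for i in range(segments)]


def sax_word(seq, segments):
    """Discretize one subsequence into a SAX word (hypothetical helper)."""
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in paa(znorm(seq), segments))


def bag_of_patterns(series, window, segments):
    """Slide a window over the series and count the SAX words it produces.
    Numerosity reduction: a word identical to its predecessor is not recounted."""
    bag = Counter()
    prev = None
    for start in range(len(series) - window + 1):
        word = sax_word(series[start:start + window], segments)
        if word != prev:
            bag[word] += 1
        prev = word
    return bag


def bag_distance(a, b):
    """Euclidean distance between two word histograms (Counters)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def nn_classify(query, train, window, segments):
    """1-nearest-neighbor label for `query`, given (series, label) training pairs."""
    q = bag_of_patterns(query, window, segments)
    nearest = min(train, key=lambda pair: bag_distance(q, bag_of_patterns(pair[0], window, segments)))
    return nearest[1]


if __name__ == "__main__":
    sine = [math.sin(2 * math.pi * t / 32) for t in range(128)]
    sawtooth = [float(t % 32) for t in range(128)]
    train = [(sine, "sine"), (sawtooth, "sawtooth")]
    shifted = sine[8:] + sine[:8]  # same sine, different phase
    print(nn_classify(shifted, train, window=32, segments=4))
```

Because the phase-shifted sine contains the same local patterns as the training sine, only at different positions, its histogram stays close to the sine's, and the query is assigned to the sine class. The same `bag_distance` could feed a hierarchical or partitional clustering routine in place of the 1-NN step.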
Overall, the paper offers a thorough review of existing time series analysis techniques, as well as a comprehensive evaluation of the proposed bag-of-words-based approach on diverse datasets.