Computing Reviews
The BOSS is concerned with time series classification in the presence of noise
Schäfer P. Data Mining and Knowledge Discovery 29(6): 1505-1530, 2015. Type: Article
Date Reviewed: Mar 30 2016

“The raw time series data may be ... noisy, or are composed of [higher-level] substructures,” and may also be high dimensional. Classifying such data becomes more complex because of “extraneous, erroneous, and unaligned data of variable length.” Two families of similarity search are used to mine raw time series datasets: shape-based methods, which rely on 1-nearest-neighbour (1-NN) classification and fail on long or noisy data, and structure-based methods, which search for characteristic patterns at a high computational cost. This paper introduces the bag-of-SFA-symbols (BOSS) model, where SFA stands for symbolic Fourier approximation.

The BOSS model “combines the extraction of substructures with ... tolerance to extraneous and erroneous data using a noise reducing representation of the time series.” The model has four parameters: (1) “the window length[, which] represents the size of the substructures”; (2) “mean normalization[, which is] set to true for offset invariance”; and (3) the SFA word length and (4) the alphabet size, which together are “used for low pass filtering and string representation.” The author analyzes the computational complexity of the BOSS model in terms of three components: the momentary Fourier transform used for the SFA transformation of words of a given length; hashing for the histogram lookup of an SFA word, together with the predict method for the BOSS distance calculations; and the fit method for leave-one-out cross-validation.
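To make the role of the four parameters concrete, the following is a minimal Python sketch of a BOSS-style transformation, not the author's implementation. It assumes NumPy, computes a plain FFT per window instead of the momentary Fourier transform, and learns equi-depth bins from the series itself, whereas SFA is generally described as learning its bins from training data. All function names are hypothetical.

```python
from collections import Counter

import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def sliding_windows(series, window_length, mean_norm=True):
    """Cut the series into overlapping windows (the substructures)."""
    x = np.asarray(series, dtype=float)
    count = len(x) - window_length + 1
    windows = np.array([x[i:i + window_length] for i in range(count)])
    if mean_norm:
        # Subtracting each window's mean gives offset invariance.
        windows = windows - windows.mean(axis=1, keepdims=True)
    return windows


def fourier_features(windows, word_length, mean_norm=True):
    """Keep the first `word_length` low-frequency values (low pass filter)."""
    coeffs = np.fft.rfft(windows, axis=1)
    feats = np.empty((windows.shape[0], 2 * coeffs.shape[1]))
    feats[:, 0::2] = coeffs.real
    feats[:, 1::2] = coeffs.imag
    # Assumption: with mean normalization the DC coefficient is dropped.
    start = 2 if mean_norm else 0
    return feats[:, start:start + word_length]


def fit_breakpoints(features, alphabet_size):
    """Equi-depth breakpoints per column (a simplified stand-in for SFA binning)."""
    quantiles = np.linspace(0.0, 1.0, alphabet_size + 1)[1:-1]
    return np.quantile(features, quantiles, axis=0)


def to_words(features, breakpoints):
    """Map each feature vector to a string over the first `alphabet_size` letters."""
    words = []
    for row in features:
        symbols = [
            ALPHABET[int(np.searchsorted(breakpoints[:, j], value))]
            for j, value in enumerate(row)
        ]
        words.append("".join(symbols))
    return words


def boss_histogram(series, window_length, word_length, alphabet_size,
                   mean_norm=True):
    """Return the bag (histogram) of SFA-like words for one series."""
    windows = sliding_windows(series, window_length, mean_norm)
    features = fourier_features(windows, word_length, mean_norm)
    breakpoints = fit_breakpoints(features, alphabet_size)
    return Counter(to_words(features, breakpoints))
```

Calling `boss_histogram(series, window_length=16, word_length=4, alphabet_size=4)` on a series of length 64 yields a histogram over 49 four-letter words drawn from the letters a to d. A larger word length keeps more Fourier values and thus less smoothing, while a larger alphabet gives a finer quantization.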

In the experimental evaluation of the BOSS ensemble classifier, the author uses three case studies: astronomy; human walking motions; and anthropology, historical documents, personalized medicine, spectrography, and security. He compares the BOSS classifier with “structure-based shapelets and bag-of-patterns or shape-based 1-NN classifiers using [Euclidean distance] or [dynamic time warping] with the optimal warping window.” He further compares it with “more complex classifiers such as [support vector machines] with a quadratic and cubic kernel and a tree-based ensemble method (random forest).” The author claims that BOSS is “better than each of the rivaling shape-based methods, structure-based methods, and complex classifiers on ... 32 datasets,” and achieves “test accuracy of 100 percent on six datasets and a close to optimal accuracy on several other datasets.”
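For readers unfamiliar with the shape-based baselines named above, the following is a minimal Python sketch of 1-NN classification with either Euclidean distance or dynamic time warping restricted to a warping window. It is an illustration only: the function names are hypothetical, the window is passed in as a fixed parameter rather than tuned to its optimal value, and squared pointwise differences are assumed as the local cost.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b, window):
    """Dynamic time warping with a band of half-width `window` around the diagonal."""
    n, m = len(a), len(b)
    # The band must be at least as wide as the length difference.
    w = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1]
                cost[i][j - 1],      # repeat a[i-1]
                cost[i - 1][j - 1],  # advance both
            )
    return math.sqrt(cost[n][m])


def predict_1nn(query, train_series, train_labels, distance):
    """Return the label of the training series closest to the query."""
    best = min(range(len(train_series)),
               key=lambda i: distance(query, train_series[i]))
    return train_labels[best]
```

With a window of 0 the warping path is forced onto the diagonal, so `dtw_distance` reduces to `euclidean_distance` for equal-length series. A window of 1 already absorbs a one-step shift of the same pattern.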

According to the author, BOSS uses hashing “to determine the similarity of substructures,” which makes it fast, reduces noise, and “provides invariance to phase shifts, offsets, amplitudes, and occlusions.” This is an interesting read for those working in the field of similarity search in raw time series datasets.
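The hash-based comparison of substructures can be illustrated with a short Python sketch in which each series is already represented as a histogram of SFA words stored in a hash map. The asymmetric form below, in which only words present in the query contribute, is the definition commonly attributed to the paper. It is not spelled out in this review and should be checked against the original. The function names and the toy words are hypothetical.

```python
from collections import Counter


def boss_distance(query, sample):
    """Sum squared count differences over the words that occur in the query.

    Each lookup `sample.get(word, 0)` is a hash lookup, so the cost grows
    with the number of distinct query words, not with the series length.
    """
    return sum(
        (count - sample.get(word, 0)) ** 2
        for word, count in query.items()
        if count > 0
    )


def predict_1nn(query, train_histograms, train_labels):
    """Return the label of the training histogram closest to the query."""
    best = min(range(len(train_histograms)),
               key=lambda i: boss_distance(query, train_histograms[i]))
    return train_labels[best]


# Toy histograms: word -> number of windows that produced it.
query = Counter({"aab": 2, "abc": 1})
sample = Counter({"aab": 1, "bbb": 5})
```

Here `boss_distance(query, sample)` evaluates to 2, while `boss_distance(sample, query)` evaluates to 26, because the word `bbb` only counts when it appears in the first argument. Ignoring words that are missing from the query is one way such a measure can tolerate extraneous substructures in the sample.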

Reviewer: Lalit Saxena
Review #: CR144273 (1606-0414)
Time Series Analysis (G.3 ... )
Classifier Design And Evaluation (I.5.2 ... )
Data Mining (H.2.8 ... )
Fast Fourier Transforms (FFT) (G.1.2 ... )