Today, as we are drowning in data, we desperately need bits of dry land where we can climb out of the information ocean and gain useful perspectives. The book under review provides one such vantage point, and anyone whose work involves finding patterns in large amounts of data should take heed.
This is Deppe’s PhD dissertation, apparently unaltered (the text occasionally mentions things like “time series constitute this dissertation’s main scope” [p. 9]). As such, it is a dense work, but the main outlines should be clear to anyone willing to invest a little time in reading it.
The “ill-known” in the title may puzzle potential readers, as it did me, since conventional English would seem to call for “little-known” or “unknown.” But apparently “ill-known” is a standard term in set theory, so the author is not to blame for its awkwardness.
And what of these “motifs”? What are they? Deppe writes:
In literature, motifs have been addressed as recurring patterns, frequent trends or subsequences, shapes, and episodes. Regardless of these various designations, all these terms refer [to] the same goal, namely the detection of frequent unknown patterns. (p. 2)
This is another example of the frequently awkward English littered throughout this book. The motifs themselves do not “refer” to any goal; rather, they are concepts employed in pursuing the goal of finding regular patterns in the data. It is true that we know what the author meant. But a fundamental principle of sound writing is to say what you mean, not say something else and hope the reader can suss out what you meant.
Chapter 3, “General Principles of Time Series Motif Discovery,” offers a useful sketch of how such discovery proceeds: the data is first pre-processed and then put into a common representation; similarity measurements are then performed on this processed data, hopefully resulting in the discovery of interesting motifs.
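To make that pipeline concrete, here is a minimal sketch in Python. It is not taken from the book and it is not Deppe's method; it is a generic brute-force illustration under simple assumptions of my own: z-normalization as the pre-processing step, fixed-length sliding windows as the common representation, and Euclidean distance as the similarity measure. The function names (z_normalize, find_motif_pair) and the sample series are hypothetical.

```python
import math


def z_normalize(window):
    """Pre-processing: rescale a window to zero mean and unit variance."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n  # a flat window has no shape to compare
    return [(x - mean) / std for x in window]


def euclidean(a, b):
    """Similarity measure: Euclidean distance between equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_motif_pair(series, length):
    """Return (distance, i, j) for the closest pair of non-overlapping windows.

    Brute force: every window is compared with every later window that
    does not overlap it, so the cost grows quadratically with the series.
    Returns None if the series is too short to hold two such windows.
    """
    # Representation: all fixed-length subsequences, each z-normalized.
    windows = [
        z_normalize(series[start:start + length])
        for start in range(len(series) - length + 1)
    ]
    best = None
    for i in range(len(windows)):
        for j in range(i + length, len(windows)):  # skip overlapping windows
            d = euclidean(windows[i], windows[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best


# The same five-point "bump" appears at index 0 and, shifted upward, at index 9.
series = [0, 1, 3, 1, 0, 5, 2, 7, 4, 10, 11, 13, 11, 10, 6]
result = find_motif_pair(series, 5)
print(result)
```

Because each window is z-normalized before comparison, the two bumps match even though one sits ten units higher than the other. A toy like this handles only fixed-length motifs and a single series, and it scales poorly, which gives some sense of why more refined approaches are needed.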
And what are our choices in terms of motif-discovery algorithms? It turns out we have many ways of approaching the problem:
Motif discovery algorithms differ in the way they tackle problems. These approaches can be analysed with regard to different aspects. As an example, motif discovery methods can be tailored to find exact or approximate motifs or detect motifs with fixed or various lengths. They are able to handle multivariate or univariate time series and execute in on- or off-line mode. Moreover, they can be examined based on representation or mapping methods, similarity measures, their robustness to noise, their ability to be invariant under affine transformations, or the number of required parameters. (p. 29)
In order to overcome the current limits on motif discovery, Deppe has developed a new method she refers to as KITE, which stands for “ill-Known motIf discovery in Time sEries data.” (The relationship of this acronym to the full name makes me glad that Deppe is not specializing in acronym-creation algorithms.)
Deppe claims that KITE advances the state of the art of motif discovery in a number of ways: it applies to diverse domains; it can discover variable-length motifs without iteration; it can filter out noise; and it can detect motifs altered by uniform scaling, translations, stretching, and squeezing. It also employs cutting-edge work on wavelets to further enhance motif discovery.
This is a difficult and dense book. For those well versed in the mathematics of harmonics and waves, it should prove very useful in showing how these theories can be applied to data series. But even those who are not specialists in this area, such as me, can gain many ideas from this useful tome.