The approach to data mining presented in this work takes a very broad view of the concept. The book’s focus is original in that it is not restricted to prediction problems based on highly structured data sets. On the contrary, less structured problem domains also feature prominently, and purely descriptive tasks and the need for comprehension frequently prevail over prediction. Moreover, the authors draw from their own problem-solving practice in such diverse fields as narco-terrorism, money laundering, insider trading, claims fraud, retail sales, biomedical research, and telecommunications to provide a colorful palette of examples.
The book looks at data mining from a practice-oriented viewpoint along two dimensions: theoretical approaches to the data mining process are coupled with an overview of existing tools and technologies. The first section defines data mining and delimits the problems that can be solved using it.
The second part surveys techniques for representing and analyzing information. The complete cycle of accessing and integrating data, analyzing patterns, and presenting results is covered. A distinction is made between visual and nonvisual analytical methods. The authors clearly favor the former, dedicating an entire chapter to data visualization. All other techniques, such as statistical methods, neural networks, and genetic algorithms, are crowded into a single chapter, which is still only half the length of the visual techniques chapter.
The third part assesses existing tools, again with a focus on visualization. Tools are categorized as link analysis tools, landscape visualization tools, or quantitative data mining tools. Numerous examples and screen captures are provided. While such a tool overview may be useful at present, it obviously offers only a snapshot of a rapidly changing domain, and will inevitably be outdated before long. Therefore, an extensive set of references to tool vendors is included as an appendix, to which the reader is referred for the most up-to-date information. A separate chapter is dedicated to future trends in visual data mining. Support for navigation, particularly in a World Wide Web environment, and tools integrating both visual and nonvisual techniques are acknowledged as important developments.
The last section presents a range of case studies containing “war stories,” once more from varied domains such as pharmaceutical research, telecommunications, finance, retail, and criminal investigation. Descriptions of the actual situation and background associated with the problem domain predominate over mathematical proofs and theoretical discussions. Thus, these case studies offer good illustrations of what is possible, but do not provide in-depth insight into explicit formal methodologies based on theory.
The book is accompanied by a CD-ROM that contains color representations of all the black-and-white images in the book. It also includes documentation, trial versions, and sample data sets for several of the tools discussed. All this information is accessible from an attractive user interface. Unfortunately, not all of the demo applications install and operate flawlessly.
Written in down-to-earth language, avoiding overspecialized terminology where possible, the book reads effortlessly. Sometimes, however, concepts that could be considered obvious are elucidated in too much detail. Even so, the discussion never becomes tedious, thanks again to the numerous real-life examples, which add a lively flavor.
The book is aimed at readers in the business community, but the general principles are said to apply in any domain. The target audience is described as system developers, technical managers, and analysts.
While the work undoubtedly gives a comprehensive and easy-to-understand introduction to data mining, it does not provide the definitive overview of the domain, because it tips the scale too much in favor of data visualization techniques. Nonvisual methods such as statistical testing, neural networks, and genetic algorithms are barely touched upon. The treatment may be useful as a first taste for the lay reader, but it lacks depth. Anyone hoping to acquire adequate insight to put these nonvisual methodologies into practice will be left out in the cold.
In short, the book can be seen as a valuable introduction for newcomers to the field. For more experienced readers, its refreshing approach to data mining and its merit of being the only book that offers a thorough discussion of visual techniques make it an estimable complement to, although certainly not a substitute for, works that take a more statistical approach.