Part of Chapman & Hall/CRC’s “Data Mining and Knowledge Discovery Series,” this edited book captures new developments and applications in data mining, along with summaries of computational tools and techniques useful in data analysis. Dong and Bailey deserve credit for bringing together more than 40 authors across the book’s seven parts and 25 chapters. Managing an effort of this size deserves praise in its own right; it must have been like herding cats.
Contrast data mining aims to find patterns that describe significant, nontrivial differences between datasets. These differences may lie along time, class, location, or any other dimension of interest to the miner. Contrast mining has been used for classification, clustering, and discrimination among datasets, and has been applied to a wide range of domains. For example, it has been used to differentiate benign from cancerous tissues, and blog content from the 2008 presidential campaign from that of the 2012 campaign.
Part 1, “Preliminaries and Statistical Contrast Measures,” provides introductory material and relevant statistical background information in two chapters.
Part 2, “Contrast Mining Algorithms,” contains five chapters covering tree-based structures, zero-suppressed binary decision diagrams, selective discriminative patterns for classification, and the mining and incremental maintenance of emerging patterns.
Part 3, “Generalized Contrasts, Emerging Data Cubes, and Rough Sets,” focuses on more expressive patterns, including disjunctive and fuzzy patterns, cube representations for online analytical processing (OLAP) mining, and jumping emerging patterns.
Part 4, “Contrast Mining for Classification and Clustering,” is a key part of the book. It provides an overview and analysis of contrast pattern-based classification and clustering, outlier and rare case prediction, and the enhancement of traditional classifiers with emerging patterns.
Parts 5 and 6, “Contrast Mining for Bioinformatics and Chemoinformatics” and “Contrast Mining for Special Domains,” are especially valuable. Across 10 chapters, the reader is presented with a range of applications and a good level of detail about successful contrast mining initiatives. The applications cover health, energy, crime, and election analysis, each with a step-by-step discussion. The reader will feel reasonably well grounded after working through these two parts.
Part 7, “Survey of Other Papers,” summarizes recent papers not cited in the earlier chapters and is largely a continuation of the previous two parts.
The bibliography includes almost 500 citations, and the index is reasonably complete. Readers will find the book a comprehensive treatment of contrast data mining. Those new to the field will need some statistical sophistication and should pay particular attention to the details of Parts 2 and 3.