Computing Reviews
Predictive data mining
Weiss S., Indurkhya N., Morgan Kaufmann Publishers Inc., San Francisco, CA, 1998. Type: Book (9781558604032)
Date Reviewed: Feb 1 1999

Data mining is roughly defined as the “search for valuable information in large volumes of data.” This book represents an effort to systematize recent developments in the analysis and management of such data. The authors present the main aspects of, and approaches to, the data mining process, and use real-life case studies to show how several techniques can be integrated. The book traces the development of data mining applications, which makes it a technical guide to large-scale analysis of real-life data warehouses. It is organized around the main steps of a data mining process: data preparation; data reduction; data modeling and prediction; and case and solution analysis.

The book begins with an attempt to define the concept of data mining and establish the framework for the subsequent discussion. The authors identify the underlying principles of data mining and related concepts, including the storage of massive quantities of data in electronic form (big data); centralized resources for these data (data warehouses); and timeliness (efficient storage and query of time-dependent information). They also discuss the main problems associated with this emerging field, which fall into two general types: prediction (classification, regression, and time series) and knowledge discovery (deviation detection, clustering, and association rules). The spreadsheet model, with two primary dimensions (cases and features), is used throughout the chapter to model the data.

Chapter 2 analyzes classical statistics and prediction and applies them to the evaluation of big data. Because good predictive performance is an important goal, much of the chapter is devoted to error estimation.

Chapter 3 concerns the data preparation phase and describes a standard spreadsheet form for data organization. It examines several forms of raw data and considers transformations that may help improve results, such as normalization, and several techniques for data smoothing. Among the topics covered are missing data, data with strong time-dependencies, and free-text data.
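
To make the transformations named above concrete, here is a minimal Python sketch of two common normalizations and one simple smoother. It is an illustration only, not the authors' code: the function names and the choice of a trailing moving average are assumptions, and the book may define these transformations differently.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Center values on their mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def moving_average(values, window=3):
    """Smooth a series by averaging each point with up to window - 1 preceding points."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

In the spreadsheet view used by the book, each of these functions would be applied to one feature (column) at a time.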

Chapter 4 reviews techniques for reducing data dimensions. The chapter mainly addresses three kinds of reduction: optimal feature selection methods for reducing the number of features, clustering techniques for reducing the number of values, and methods for reducing the number of cases. Methods such as Karhunen-Loeve expansion, decision trees, k-means clustering, nearest neighbor, and class entropy are examined. The authors suggest the use of decision trees as an alternative to the more frequently used methods of feature selection.
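
As an illustration of how class entropy can guide feature selection, the following Python sketch scores a categorical feature by the reduction in class entropy obtained when the cases are grouped by that feature's values (usually called information gain). This is one standard formulation, offered as an assumption; the review does not say exactly how the book applies class entropy, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def class_entropy(labels):
    """Shannon entropy, in bits, of the class label distribution."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Drop in class entropy after grouping the cases by feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Weighted average entropy of the groups produced by the feature.
    remainder = sum(len(g) / total * class_entropy(g) for g in groups.values())
    return class_entropy(labels) - remainder
```

A feature whose values separate the classes well receives a high score, while a feature unrelated to the class scores near zero and is a candidate for removal.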

Chapter 5 summarizes classification and applied prediction methods, which are broken down into three groups: mathematical (linear solutions, neural nets, and multivariate adaptive regression splines), distance (nearest neighbor), and logic (decision trees and decision rules). The authors analyze several facets of these methods, including solution complexity, data preparation and training, and the effects of data dimensions, and discuss their advantages and drawbacks.
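
The distance group is the simplest of the three to sketch. The Python fragment below shows a plain one-nearest-neighbor classifier using Euclidean distance. It is a minimal illustration under assumed names and an assumed distance measure, not the implementation discussed in the book.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def nearest_neighbor_predict(train_cases, train_labels, query):
    """Return the label of the training case closest to the query case."""
    best_index = min(
        range(len(train_cases)),
        key=lambda i: dist(train_cases[i], query),
    )
    return train_labels[best_index]


# Hypothetical toy data: each case is a tuple of numeric features.
cases = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
labels = ["low", "low", "high"]
prediction = nearest_neighbor_predict(cases, labels, (4.0, 4.0))
```

Because the method compares raw distances, it is sensitive to feature scaling, which is one reason the normalization step in data preparation matters for this group of methods.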

Chapter 6 compares the data reduction techniques from chapter 4 and the prediction methods from chapter 5 in several spreadsheets, so that readers can evaluate them side-by-side. The datasets are from medical, telecommunications, media, service, control, and sales data applications.

Chapter 7 sketches some data mining problems and outlines their solutions, which are a combination of art and science. The examples focus on real-life data mining applications: text mining, process control, and outcome analysis. The chapter describes an organizational model for unifying the tasks of the previous chapters, and presents the protocols for preparing data and organizing the mining effort.

Each chapter is supplemented with bibliographic and historical notes, most related to databases, statistics, and machine learning, which spawned data mining. The bibliography contains recent works.

The book is richly illustrated, reflecting the authors’ emphasis on the role of visualization in understanding the book’s topics. Designers of data warehouses, or of any application involving massive quantities of data, will find the book helpful. No specialized mathematical or statistical background is required; college-level mathematics suffices. Readers are also invited to test the authors’ software at http://www.data-miner.com.

Reviewer: Svetlana Segarceanu
Review #: CR122025 (9902-0070)
Data Mining (H.2.8 ... )
Organization/Structure (E.5 ... )
 
Other reviews under "Data Mining": Date
Feature selection and effective classifiers
Deogun J. (ed), Choubey S., Raghavan V. (ed), Sever H. (ed) Journal of the American Society for Information Science 49(5): 423-434, 1998. Type: Article
May 1 1999
Rule induction with extension matrices
Wu X. (ed) Journal of the American Society for Information Science 49(5): 435-454, 1998. Type: Article
Jul 1 1998
Data mining solutions
Westphal C., Blaxton T., John Wiley & Sons, Inc., New York, NY, 1998. Type: Book (9780471253846)
Jun 1 1999