Computing Reviews
Data mining : practical machine learning tools and techniques (4th ed.)
Witten I., Frank E., Hall M., Pal C., Morgan Kaufmann Publishers Inc., San Francisco, CA, 2016. 654 pp. Type: Book (978-0-128042-91-5)
Date Reviewed: Nov 7 2017

The commercial importance of data mining has led to a plethora of books on the subject, but few volumes prove their value sufficiently to warrant a fourth edition. The first edition of this book appeared in 2000, from the machine learning group at the University of Waikato in Hamilton, New Zealand, which is also responsible for the open-source WEKA data mining environment. Subsequent editions kept pace with extensions to the WEKA environment, but now the environment allows ready integration with external tools such as Python and R (another widely used tool originating in New Zealand), and the current edition extends its scope to techniques that do not have native WEKA implementations.

In spite of its historical linkage with WEKA, the book is not a cookbook for a particular platform (unlike [1], for example, with its focus on the commercial package Statistica). It is an extremely clear and well-organized platform-independent presentation of deriving information from a particular kind of data, instances with attributes. This category is a subset of the data of interest in modern data mining, but a very important subset. The authors’ exposition is highly accessible. They rely more on intuition and examples than on mathematical derivation, though chapters 9 (on probabilistic methods) and 10 (on deep learning), with unavoidable mathematical detail, have been added to this edition. Citations are not embedded in the text, but gathered in a “Further Reading” section at the end of each chapter, and summarized in a final bibliography of over 500 works through 2016.

Most data mining books begin with a single chapter that briefly surveys the field. The first five chapters of this volume, comprising 200 pages, offer an overview that would be ideal for a short course for industrial users with little time and the need to focus on applications rather than theoretical development. These chapters cover example applications, the nature of attribute-structured data, the kinds of output one can expect, a high-level review of basic algorithms, and an excellent survey of different approaches to evaluating the results of a data mining exercise.

Chapters 6 through 12 go into more detail on individual algorithms and schemes. Each of these chapters ends with a section “WEKA Implementations,” a brief summary of which methods in the WEKA framework implement the techniques discussed in the chapter. A final chapter summarizes important application areas and related fields (such as text mining), and appendices outline mathematical foundations and the WEKA workbench itself.

The clarity of the book makes it a natural choice for an undergraduate course in data mining. The mathematical reasoning, liberally supported with plain English exposition, is accessible to students with less mathematical maturity than would be required in a more terse text, such as [2]. But users should be aware of three limitations.

First, the pointers to textbook-scale expositions of general data mining since 2000 are sparse. Nisbet et al. [1] is referenced (though it hardly merits their description as a “comprehensive handbook”), but not Aggarwal’s encyclopedic treatment [3] or the more formal treatment of [2]. Students who wish to learn more should know of these volumes.

Second, unlike [2,3], this book offers no exercises. The authors do point to three free online courses on data mining that they have produced to supplement the book, available through the FutureLearn platform, but these are only available on a fixed schedule, not on demand (unless one pays a fee).

Third, the book focuses exclusively on data that is structured as instances with attributes. There is only passing reference to time series, images, and speech, and the authors argue that text mining is a separate discipline entirely. The book says nothing of the increasingly important areas of mining graphical data or social network analysis.

In spite of these limitations, this volume is the most accessible introduction to data mining to appear in recent years. It is worthy of a fourth edition. It would be ideal for an introductory undergraduate course, if supplemented with [2] (available for free online) for more formal exposition, [3] (for other areas of data mining), and both of these (for exercises). Furthermore, its organization (a thorough introduction followed by more in-depth exposition) makes it of interest to nonacademic readers who simply need to learn quickly what data mining can do for them.

Reviewer: H. Van Dyke Parunak
Review #: CR145644 (1801-0006)

[1] Nisbet, R.; Elder, J.; Miner, G. Handbook of statistical analysis and data mining applications. Academic Press, Amsterdam, Netherlands, 2009.
[2] Zaki, M. J.; Meira, W., Jr. Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press, New York, NY, 2014.
[3] Aggarwal, C. C. Data mining: the textbook. Springer, New York, NY, 2015.

Data Mining (H.2.8 ...)
Content Analysis And Indexing (H.3.1)
Learning (I.2.6)