Computing Reviews

Practical text mining and statistical analysis for non-structured text data applications
Miner G., Elder J., Hill T., Nisbet R., Delen D., Fast A., Academic Press, Waltham, MA, 2012. 1000 pp. Type: Book
Date Reviewed: 06/20/14

This book addresses a timely topic of intense interest in both academia and industry. Text mining is common to many activities, ranging from sentiment analysis in social media to advanced, automated processing of healthcare data. The conceptual and practical tools for these tasks are manifold: machine learning, text processing, predictive analytics, and clustering techniques are essential tools in the arsenal of any serious text mining scientist. From a practical perspective, several open-source and commercially available software packages are deployment ready.

The authors follow a pragmatic, hands-on approach in their book. Structured around three main topics, with many individual chapters including contributions by the primary authors and other invited authors, the book is a comprehensive collection of introductory tutorials, case studies, and theory. It gives a loosely coupled overview of this area of research. It is difficult to write a concise review of a book that runs to more than 1000 pages and was jointly written by more than ten contributors without overexposing some chapters at the expense of others.

For a first-time reader, the book is overwhelming: the authors’ practical approach is extreme. Some chapters include detailed, step-by-step screen shots of graphical user interfaces (GUIs) on every second page. Since most of the tools are commercially available, this level of detail will be neither needed nor desired by developers and scientists wishing to develop their own software or use open-source solutions. There are some exceptions in the book, and the specific chapters using R are relevant to such readers. I particularly appreciated chapter BB (in Part 2), which shows how to mine Twitter to identify airline consumer sentiments. However, I found that the many STATISTICA- and SAS-related screen shots functioned as placeholders and were of limited use. Unless the reader wants to reproduce the exact experiment with the same software package, a more programmatic solution might have been a better choice.

I do recommend, however, the third part of the book, which covers advanced topics. In this part, the authors introduce essential modeling and data mining approaches for text processing. Topics related to hidden Markov models (HMMs), Markov random fields, entity extraction, and visualization techniques are covered with a good mix of theory and analysis of the benefits and limits in operational deployment.

At the end of this review, I have mixed feelings. Some chapters are highly recommendable, as they address practical problems in a software-neutral and generic manner. Other chapters are overloaded with screen shots and menu-level illustrations of GUIs; they are probably intended for a nonprogramming audience. The book can also be relevant for a third category of readers: the numerous case studies are very interesting for potential entrepreneurs and managers looking for business ideas. In summary, this book is definitely good for novices looking for a single reference on text mining. For more advanced readers, the technical content is too simple, but the case studies can be a good starting point for potential new business developments.

Reviewer: Radu State. Review #: CR142423 (1409-0728)
