Computing Reviews

Text data management and analysis: a practical introduction to information retrieval and text mining
Zhai C., Massung S., Association for Computing Machinery and Morgan & Claypool, New York, NY, 2016. 530 pp. Type: Book
Date Reviewed: 11/11/16

Fifteen years ago, the field of information retrieval (IR) was still in its infancy, even though research and development in the field had been under way for more than 30 years and had already produced several significant advances. Then, with the rise of modern search engines, the field was reborn. The explosive growth of the web fed it with novel problems, but it also reopened old problems that had been considered solved, because the web's scale made their existing solutions inadequate.

The blossoming of information retrieval affected academia to the point that several university departments (for example, computer science and language processing) adapted their curricula to the needs of this modern and highly dynamic field. Consequently, new books had to be written to educate the field's professionals. Most of them built their contents around the IR topics related to search engine science and technology. They ignored other parts of IR, such as those concerning the maintenance and operation of blogs and e-commerce sites, which constitute a major part of the modern web. This book breaks with that norm and gives us the opportunity to look at the field with fresh eyes.

The book is composed of four parts. The first provides the necessary background knowledge. The second investigates text data access, including retrieval models, search engine implementation and evaluation, ranking for web search, and recommendation systems. The third part focuses on text data analysis, covering text mining topics such as associations, clustering, categorization, summarization, and opinion and sentiment analysis. The last part provides a brief introduction to the META system that accompanies the book. There are also three appendices, on Bayesian statistics, the expectation-maximization (EM) algorithm, and KL-divergence and Dirichlet prior smoothing, which give readers a more in-depth look at the material.

Each chapter is self-contained. Even without following up the references given at its end, the reader will be properly exposed to the chapter's topics and understand them fully. It is a truly unique textbook written by prominent experts in the fields of information retrieval and text data mining.

The exercises at the end of each chapter are very carefully designed and provide the right mix of theoretical and practical understanding of the chapter's material. The book is accompanied by a free and open-source toolkit that readers can use to experiment with the book's concepts.

This book fills a significant gap in the literature. There is one book that covers link analysis ranking, a fundamental part of IR, exhaustively. There are also a couple of books that cover inverted index construction and querying in great detail. Finally, there are a few books that provide solid but broad coverage of search engine architecture, inverted index construction, link analysis ranking, and text mining. Nevertheless, none of these existing books covers issues spanning text access, analysis, and management as thoroughly as this one. I can safely characterize it as a book covering topics beyond the realm of search engine science and technology. It is a book that any student (undergraduate or postgraduate) or professional in information retrieval, filtering, management, or search engine architecture should study carefully.

Reviewer: Dimitrios Katsaros    Review #: CR144918 (1703-0173)

Reproduction in whole or in part without permission is prohibited.   Copyright 2017™