Computing Reviews
Big data computing
Akerkar R., Chapman & Hall/CRC, Boca Raton, FL, 2014. 564 pp. Type: Book (978-1-466578-37-1)
Date Reviewed: Jun 6 2014

Whenever a new discipline (or buzzword) appears, we feel the need to understand what it is and how it differs from what existed before. A very explicit declaration about what “big data” means can be found in the preface of this book: it is everything that is “beyond current database technology.” This definition claims for big data any challenging problem that data(base) technology may face now or in the future, but it also runs the risk of making big data a discipline that is defined as the negation of others, as happened with metaphysics. The structure and contents of this book reflect this view of big data. Some technologies that work well for certain applications and types (and volumes) of data, such as SQL or data warehousing technologies, are said to be non-scalable for big data.

The book comprises five parts: “Introduction,” “Semantic Technologies,” “Processing,” “Business,” and “Applications.” I did not find this arrangement very useful or meaningful. In fact, some chapters would have fit better in other parts. For instance, the first part contains three chapters, but the first chapter is not introductory at all. It includes a short account of information and communications projects from the European Union’s Seventh Framework Programme (FP7) and some specific solutions to big data problems developed by the author, including an evolutionary approach to knowledge management. This specific approach would be appropriate as an advanced topic later on in the book, but it does not belong in the first chapter. In fact, some other chapters, such as chapters 2 and 11, are more general and introductory than the first chapter.

This is not (and is not meant to be) a comprehensive textbook, but a heterogeneous collection of papers, which results in redundancies and gaps. Some chapters, such as chapter 3, are very sketchy and even informal, while others have a much more technical and polished presentation. Some chapters are summaries of FP7 projects, while others are case studies or very particular solutions to specific problems. Some even include experimental results.

Google’s MapReduce and Apache Hadoop are introduced several times in different chapters, without a single comment about this redundancy. The same happens with (different) definitions of what big data is; the three, four, or five Vs (volume, velocity, variety, variability, and veracity); the McKinsey report; and other topics.

While the book has substantial coverage of infrastructure, retrieval, and even knowledge management, the collection is weak in terms of big data analytics. Only chapter 12 addresses this topic, and only briefly. In fact, the final part of the book, about applications (representative of important domain areas, such as social data, ambient intelligence, energy, geographic information, and text mining), illustrates that most of these applications require analytic tools. However, the connection with big data is not always clear. One may wonder why these applications cannot be seen as more conventional data mining applications. In fact, many of the techniques, tools, and ideas that are introduced in previous chapters are not used in these applications.

The editor, and especially the publisher, could have been more careful with the proofreading and formatting of the book. For instance, the title of chapter 2 includes the word “Tassonomy” instead of “Taxonomy.” How can a typographical error in the table of contents and a chapter title go unnoticed? Also, many figures (such as Figures 16.3 and 16.5) need colors to be understood (and the captions refer to these colors), but the book is printed in black and white. The absence of section numbers makes cross-referencing between sections (and chapters) very difficult and makes the book appear even less structured. Grammatical mistakes are also common. Repeated paragraphs (p. 105), formatting issues (bullets missing, p. 108), and other problems could have been solved by more careful proofreading.

It is true that the aim of the book is to “help practitioners to better understand the current state of the art in big data techniques, concepts, and applications,” and the book is full of research directions and ideas. Some contributions are insightful and the collection is strong in some areas, such as infrastructure and the semantic issues of big data. Nonetheless, only researchers in the area will be able to find the chapters that they need. Students, even graduate students, will require the assistance of a course director to determine the chapters to read if they are looking for specific topics. A full, sequential reading of the book is discouraged.

In summary, extracting meaningful knowledge from this book is possible, as the information is there, but the task is not easy. The book itself can be seen as a big data challenge for the reader.

Reviewer:  Jose Hernandez-Orallo Review #: CR142369 (1409-0724)
Database Applications (H.2.8)
Content Analysis And Indexing (H.3.1)
General (H.0)
 