Computing Reviews
Data just right : introduction to large-scale data & analytics
Manoochehri M., Addison-Wesley Professional, Upper Saddle River, NJ, 2013. 256 pp. Type: Book (978-0-321898-65-4)
Date Reviewed: Aug 26 2014

Handling large-scale data, or big data, is among the great current trends in information and communications technology (ICT). Other trends (mobility, virtual marketplaces, social networks, and the Internet of Things (IoT) and humans) produce enormous amounts of fast-growing data. This data requires storage techniques and efficient, sometimes real-time, processing and analysis in private, public, or hybrid clouds, both for end users and for service providers. Most of it is unstructured and of heterogeneous types. According to the International Data Corporation (IDC), a market research firm specializing in ICT, the four “V”s (volume, variety, velocity, and value) are key attributes of big data.

Traditional relational database management systems (RDBMSs) coped with earlier large data sources (transaction processing or enterprise resource planning (ERP) systems, for example), but on their own they are unsuitable and ineffective for handling big data. The new applications serving the above-mentioned trends require immediate solutions to the big data problem. These solutions work but, having been developed under time pressure, they are immature, despite the intellectual effort invested by firms and information technology (IT) communities. Many overlapping solutions have been developed in parallel, expressing practical ideas in a bottom-up way. The author, himself a participant in these efforts at Google, has a good overview of these themes. He provides readers with an introductory overview of the 2012 state of the art by arranging the confusing details in a logical order.

The author first introduces the practical requirements and the main guidelines for solutions. The problems of big data are then treated in the order of the flow of data in subsequent parts of the book: collecting, storing, and sharing data; querying data; building data pipelines; data classification by machine learning; and statistical analysis for massive datasets. Ideas and the tools that implement them are placed in the chain of chapters according to their relationships and sophistication. Readers can start the book at any part that interests them, as the prerequisites are explained and pointers to earlier chapters are given. Each chapter ends with a good summary.

The author explains the reasons and arguments for the transition from the safe and strict but inefficient ACID (atomicity, consistency, isolation, durability) properties of a general relational system to looser requirements and much more efficient implementations in special cases. He discusses the role of each property according to application area, prioritizes them, and establishes a new balance between correctness and efficiency.

Tools and services from Internet giants (Google, Amazon, and Facebook) and many open-source systems and packages (Apache Hadoop, Python, and R-based ones) are treated. Efforts, new products, and product features of traditional vendors (Oracle, IBM DB2, SAP, SAS, SPSS, and MATLAB) are not dealt with. The shortcomings of the earlier mainstream products in handling big data are mentioned and compared with the newer solutions the book introduces.

Only the financial compliance aspect of security is discussed. Other aspects are not treated; they are merely mentioned in a chapter summary.

The book is application oriented and informal. It is a readable overview, illustrated with several examples. The code examples are very simple, demonstrating the functional power of the underlying systems that implement the complex tasks. The book contains an index. External references are given either in the text or in footnotes, mostly as links to Internet sources.

I recommend this introductory book to those who want to understand the practical problems of big data and their solutions.


Reviewer:  K. Balogh Review #: CR142656 (1411-0931)
Editor Recommended
Content Analysis And Indexing (H.3.1)
Data Mining (H.2.8 ...)
Other reviews under "Content Analysis And Indexing": Date
Personal bibliographic indexes and their computerisation
Heeks R., Taylor Graham Publishing, London, UK, 1986. Type: Book (9789780947568115)
Sep 1 1987
Development of a term association interface for browsing bibliographic data bases based on end users’ word associations
Pejtersen A., Olsen S., Zunde P., Taylor Graham Publishing, London, UK, 1987. Type: Book (9780947568306)
Nov 1 1989
Transforming text into hypertext for a compact disc encyclopedia
Glushko R. ACM SIGCHI Bulletin 20(SI): 293-298, 1989. Type: Article
May 1 1990

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®