Computing Reviews
Big data science & analytics : a hands-on approach
Bahga A., Madisetti V., VPT, 2016. 542 pp. Type: Book (978-0-996025-53-9)
Date Reviewed: Sep 12 2016

Devoted to the problem of “big data,” which has become important business in many areas of modern life, this book addresses the state of affairs termed the “Fourth Industrial Revolution.” Put simply, big data is described as dealing with “the collections of datasets whose volume, velocity, or variety is so large that it is difficult to store, manage, process and analyze the data using traditional databases and data processing tools.” Industry surveys cited in the book “predict that there will be over 2 million job openings for engineers and scientists trained in the area of data science and analytics alone, and that the job market in this area is growing at a 150 percent year-over-year growth rate.”

Written as a manual that extends the “A Hands-On Approach” series, this book is meant to meet instructional needs at colleges and universities. It may also interest big data service providers, since it offers a broader perspective on this emerging field that can accompany training programs for their customers and developers.

The book is organized into three main parts, comprising a total of 12 chapters that together cover the main aspects of big data.

Part 1 is an introduction to big data. It covers big data analytics patterns and architectures. According to the authors, the suggested “methodology forms the pedagogical foundation of [the] book.” A novel design for data science and analytics application systems is considered, and its realization with open-source big data frameworks is described. This description includes tools and frameworks for collecting data from various sources. It also presents distributed file systems and nonrelational (NoSQL) databases for data storage, as well as frameworks for batch and real-time processing.

Part 2 contains various tools and frameworks for big data analytics with examples in Python. The reader is introduced to data storage, batch and real-time analysis, and interactive querying frameworks including HDFS, Hadoop, MapReduce, YARN, Pig, Oozie, Spark, Solr, HBase, Storm, Spark Streaming, Spark SQL, Hive, Amazon Redshift, and Google BigQuery. Also described are serving databases (MySQL, Amazon DynamoDB, Cassandra, MongoDB) and the Django Python web framework.

Part 3 presents advanced topics related to machine learning techniques, including clustering, classification, regression, and recommendation. The examples use the Spark MLlib and H2O machine learning frameworks. This part also covers data visualization methods using frameworks such as Lightning, Pygal, and Seaborn.

In summary, this book is a comprehensive reference on the basic aspects of big data analytics. A qualified reader can use it effectively for practical work on big data systems. Yet it is doubtful that simply intensifying the processing of large volumes of information could, by itself, lead to the anticipated industrial revolution.

The main objective of big data analysis is the formation of knowledge. Naively, one may assume that accumulating vast amounts of data is a necessary prerequisite for this purpose. In fact, it is just an imitation of productive activity. Many big data projects, especially in biology, have been criticized mainly for their cost and lack of results. Formation of knowledge requires something beyond regular statistical inference; it is a haphazard process that involves serendipity. Thus, successful use of big data requires a qualitatively different approach to the organization of processing.

At this time, “big data” developments focus mainly on the technical problem of fitting enormous information processing requirements to the conventional facilities of mainstream information technology. This book is a good, comprehensive reference for these efforts.


Reviewer:  Simon Berkovich Review #: CR144757 (1612-0872)
 
Content Analysis And Indexing (H.3.1)
Data Mining (H.2.8 ...)

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®