Computing Reviews
Big data analytics with Spark : a practitioner’s guide to using Spark for large scale data analysis
Guller M., Apress, New York, NY, 2015. 277 pp. Type: Book (978-1-484209-65-3)
Date Reviewed: Jun 8 2016

Programmers seeking to learn the Spark framework and its libraries will benefit greatly from this book. It reads like a collection of the workshops and tutorials found on the World Wide Web (WWW) for learning Spark and its related technologies, presented in a light and fluid style. The Spark framework itself is easy to use and expressive, and it performs remarkably well compared to earlier technologies such as Hadoop MapReduce. The book also offers useful guidance on deploying Spark applications on commodity hardware, including configuring, executing, and monitoring applications and resources.

The book is well written, striking a good balance between presenting basic computer science concepts, such as functional programming, and introducing Scala, the core language of Spark. Scala builds on the Java platform and adds full support for functional programming, which is essential in making Spark an efficient analytics tool for big data problems. Readers interested in the basic concepts and foundations that Spark is built upon should focus on chapters 2 to 5, and on chapters 10 and 11. All chapters are very didactic, with code walkthroughs and plenty of command lines for testing and deploying Spark applications. The book covers the full stack of the Spark framework, explaining how to configure, compile, and execute a Spark application under different options, and showing the expressive power of Spark in a simple application that can handle anything from a few bytes to petabytes of data.
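To make the functional style mentioned above concrete, here is a minimal sketch of a word count written as a map step followed by a reduce step. It is not taken from the book and does not use Spark or Scala; it is plain Python, and the function name word_count and the sample lines are assumptions chosen only for illustration.

```python
from collections import Counter
from functools import reduce


def word_count(lines):
    """Count words across lines with a map step and a reduce step.

    Hypothetical helper for illustration; not part of Spark or the book.
    """
    # Map: turn each line into a Counter of its lowercase words.
    per_line = map(lambda line: Counter(line.lower().split()), lines)
    # Reduce: merge the per-line counters into one overall Counter.
    return reduce(lambda left, right: left + right, per_line, Counter())


counts = word_count(["spark is fast", "spark is expressive"])
```

The same map-then-reduce shape is what a framework such as Spark distributes across a cluster, which is why a small program of this kind can, in principle, be applied to much larger inputs.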

Data scientists can use the book as a tutorial guide. The majority of the book is dedicated to two important tools fine-tuned for Spark: (1) structured query language (SQL), and (2) machine learning (ML). Although both tools are already well known in the field of data analysis, their cluster-enabled versions are essential for dealing with big data problems. Chapter 7 describes Spark SQL, concentrating on how to perform interactive analytics over data stored in a structured format, whether static or arriving as a live data stream, and covering a large number of integration possibilities. The chapter is devoted more to the various Spark operations and the DataFrame abstraction than to the SQL language itself. Chapter 8 starts by introducing the major topics of the ML field, with illustrations and just the right amount of information to get the reader ready for MLlib, the Spark library for ML. The second half of chapter 8 delves into the MLlib application programming interface (API), covering data types, algorithms, and models, and going through its classes and their respective methods.

The core part of the book finishes with a chapter on graphs and their importance in the field of data analysis. The chapter describes GraphX, the Spark library for graph processing, along with its API and usage, with implementation examples of different graph-based analytics. It features a particularly interesting example on creating, querying, and transforming a social network. The chapter ends with more complex graph structures and algorithms, analyzing the Pregel system and the PageRank algorithm.
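For readers unfamiliar with PageRank, the following is a minimal sketch of the textbook iterative formulation in plain Python. It is not the book's example and not the GraphX implementation; the function name pagerank, the damping value of 0.85, the iteration count, and the three-node sample graph are all assumptions made for illustration. It expects a non-empty graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank over a dict mapping each node to its outgoing links.

    Hypothetical helper for illustration; assumes a non-empty graph.
    """
    # Collect every node, including ones that only appear as link targets.
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(ranks[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}
        # Each node passes an equal share of its rank to every target.
        for source, targets in links.items():
            if targets:
                share = damping * ranks[source] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In this sample graph, node c receives links from both a and b, so it ends up with the highest rank, while b, which is linked only from a, ends up with the lowest.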

In summary, the book provides substantial information on cluster-based data analysis using Spark, a prominent framework used by data scientists. It is very nicely written, with interesting contemporary considerations and several source code examples.


Reviewer: Andre Maximo (Featured Reviewer)
Review #: CR144483 (1608-0554)

Content Analysis And Indexing (H.3.1)
Systems (H.2.4)
Reference (A.2)