Computing Reviews
Data-intensive systems: principles and fundamentals using Hadoop and Spark
Wiktorski T., Springer International Publishing, New York, NY, 2019. 97 pp. Type: Book (978-3-030046-02-6)
Date Reviewed: Nov 7 2019

Sustaining progress in big data and data science requires scalable infrastructures capable of powering data-intensive applications. Progress in cloud computing, coupled with many years of software systems development, has allowed the systems that enable data-intensive work to achieve a certain degree of maturity and broad adoption. Open-source collaboration on tools such as Hadoop and Spark has helped lower entry barriers for data research and create a large and vibrant community of practice.

The mere availability of tools is not sufficient, though. A new form of literacy is required to give new entrants to the data science field the necessary proficiency with data-intensive systems, so that more time can be spent on the innovation aspects of data science and less time on learning what is going on under the hood. Academia plays an important role in building that literacy, and university courses are increasingly incorporating the basics of data science enabling tools as part of the core curriculum.

This little book from Springer is a spinoff of Wiktorski’s graduate-level course on data-intensive systems at the University of Stavanger. Starting with the basics, the author avoids getting trapped in a discussion of specific technology issues, and instead promotes a problem-based approach to simple but realistic cases. The intent is to enable hands-on experimentation and to deepen understanding and knowledge retention. This looks like a sensible choice to future-proof the book in a rapidly evolving world of technology. The content is organized in nine chapters, covering applications, data analysis with Hadoop, functional abstraction, MapReduce programming, algorithms and patterns, NoSQL databases, and the use of Spark’s data model centered on resilient distributed datasets (RDDs).

After a short preface, chapter 2 provides a recap on the importance of data in modern life, and explains how hardware trends drive the need for new data processing systems that can serve as building blocks to data science applications. Chapter 3 is a how-to guide to bringing up a Hadoop sandbox in less than an hour, enabling readers to start experimenting with the examples given in the book right from the start. Chapter 4 discusses functional abstraction as the foundation for data-intensive systems: the content is not essential to continue with the hands-on experimentation, but it surely helps in gaining a greater understanding of the topic. Chapter 5 introduces MapReduce programming as a real-life implementation of functional abstraction. Chapter 6 starts putting the pieces together in a comprehensive Hadoop architecture by combining MapReduce with the Hadoop Distributed File System (HDFS). Chapter 7 deepens the discussion on MapReduce by providing ways to encode typical operations. The last two chapters introduce NoSQL databases and Spark, a framework conceived to complement and supplement the Hadoop architecture.
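
To make the link between functional abstraction and MapReduce concrete, here is a minimal single-machine sketch in plain Python. It is not taken from the book and does not use Hadoop or Spark; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. It assumes the classic word-count task: a map step emits (word, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums each group.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    # Map: emit one (word, 1) pair for every word in a line of text.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as a framework would do
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: fold the values for one key into a single result.
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    # Run the three stages in sequence on an in-memory list of lines.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data systems"]))
```

Because the map and reduce functions hold no shared state, a real framework can, in principle, run them on many machines in parallel; that independence is the point of the abstraction.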

Each chapter ends with exercises, to guide readers in their hands-on experimentation. To summarize, this is a recommended book for data science beginners, written in the form of a cookbook to enable hands-on testing. The format is compact and handy, providing a go-to resource for undergraduate students and other interested readers. Its balanced approach to learning by doing makes it an interesting and relevant read.

Reviewer: Alessandro Berni
Review #: CR146764 (2002-0017)
General (H.0 )
 
 
Algorithm Design And Analysis (G.4 ... )
 
