Computing Reviews
Hadoop beginner’s guide
Turkington G., Packt Publishing, Birmingham, UK, 2013. 398 pp. Type: Book (978-1-849517-30-0)
Date Reviewed: Sep 27 2013

Big data is currently a major topic in the industry, driven by a mix of startups, open-source projects, and business analytics companies that monetize the continuing growth in the volume, velocity, and diversity of data. One dominant software platform for running big data analytics is Hadoop, a fact reflected in the increasing number of texts focused on the subject. This book is a step-by-step introduction to the Hadoop platform and its rich software ecosystem.

In the first chapter, the author defines the scope of the book, giving a short history of Hadoop and its authors and an overview of the chapters to come. The remainder of the book is roughly divided into three parts.

In the first part (chapters 2 to 5), the author addresses the installation of the Hadoop infrastructure and describes the programming of simple and advanced MapReduce applications. In chapter 2, the reader gets hands-on experience with the complete life cycle of a Hadoop project, starting with downloading and installing the Hadoop distribution. The next chapter implements the classic word-counting program, used in almost every introductory Hadoop text, and discusses the underlying MapReduce programming paradigm in detail. This chapter also introduces the important topic of the different versions of the MapReduce application programming interface (API), in particular the pre-0.20 API and its successor. This is particularly relevant for frustrated beginners trying to work with code samples that do not match the Hadoop API they have installed. MapReduce programming is demonstrated in chapters 4 and 5. Chapter 4 presents the fundamentals of standard mapper and reducer tasks, while chapter 5 tackles more advanced content, such as reduce-side joins, graph algorithms, and integration with Avro and Ruby.
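The word-counting program mentioned above can be sketched without a Hadoop installation. The following is a minimal, single-process illustration of the map, shuffle, and reduce steps in plain Python; it is not code from the book and does not use the Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper (hypothetical name): emit a (word, 1) pair for every word in one line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, imitating what the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer (hypothetical name): sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three steps in sequence, in a single process, over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop runs MapReduce", "MapReduce runs jobs"]))
```

In a real Hadoop job the mapper and reducer run as distributed tasks and the framework performs the grouping step across the cluster; here everything runs in memory only to show how the key-value pairs flow between the stages.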

The second part of the book (chapters 6 and 7) covers the operation and management of a Hadoop cluster. Fault management can be challenging in such a distributed and interdependent environment, and the simple recipes described in chapter 6 could save the day when data-driven or system-level faults occur. The configuration, performance, and security of Hadoop systems are examined in chapter 7. The content is up to date and reflects the still-nascent Hadoop security framework.

Hadoop also comes with an ecosystem of applications for interfacing with existing data sources. These data sources can be traditional SQL databases or application-specific log data (such as syslog), which are handled by purpose-built tools such as Hive, Flume, and Pig. Part 3 focuses on this ecosystem of supporting applications and demonstrates their use in several small projects.

The book keeps the promise of its title. It’s a real beginner’s guide to one of the most important data processing paradigms of the decade. The many code examples and tutorial-like content make it an excellent resource for learning MapReduce and Hadoop. The learning curve is not as steep as in alternative books, so I can recommend it to all novices interested in this very timely and relevant technology.


Reviewer: Radu State
Review #: CR141592 (1312-1059)
Distributed File Systems (D.4.3 ... )
Distributed Applications (C.2.4 ... )
Distributed Programming (D.1.3 ... )
 

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®