Computing Reviews
Apache Hadoop YARN : moving beyond MapReduce and batch processing with Apache Hadoop 2
Murthy A., Vavilapalli V., Eadline D., Niemiec J., Markham J., Addison-Wesley Professional, Upper Saddle River, NJ, 2014. 400 pp. Type: Book (978-0-321934-50-5)
Date Reviewed: May 15 2015

In the next-generation MapReduce of Apache Hadoop 2 (MRv2, or YARN), the MapReduce of Apache Hadoop 1 (MRv1) has been divided into two components: the cluster resource management capabilities have become YARN (Yet Another Resource Negotiator), while the MapReduce-specific capabilities remain MapReduce. In the MRv1 architecture, the cluster was managed by a service called the JobTracker, with TaskTracker services on each host launching tasks on behalf of jobs; the JobTracker also served information about completed jobs. In MRv2, the functions of the JobTracker have been split among three services. First is the ResourceManager, a persistent YARN service that receives and runs applications on the cluster; it contains the scheduler, which is pluggable. Next, the MapReduce-specific capabilities of the JobTracker have been moved into the MapReduce Application Master, which is started to manage each MapReduce job and terminated when the job completes. Finally, the JobTracker function of serving information about completed jobs has been moved to the JobHistory Server. The TaskTracker, in turn, has been replaced with the NodeManager, a YARN service that manages resources and deployment on a host and is responsible for launching containers, each of which can house a map or reduce task.
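The division of responsibilities described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Hadoop API: the class names echo the services named in the review, while the method names, the slot-based `capacity` field, and the `most_free_first` scheduler are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy model only: names mirror the roles in the review, not the real YARN Java API.


@dataclass
class NodeManager:
    """Per-host service that launches containers (the MRv1 TaskTracker's successor)."""
    host: str
    capacity: int  # hypothetical: number of container slots on this host
    running: List[str] = field(default_factory=list)

    def free_slots(self) -> int:
        return self.capacity - len(self.running)

    def launch_container(self, task: str) -> None:
        if self.free_slots() <= 0:
            raise RuntimeError(f"{self.host} has no free container slots")
        self.running.append(task)

    def release_container(self, task: str) -> None:
        self.running.remove(task)


# A scheduler is any function that picks the node for the next container.
Scheduler = Callable[[List[NodeManager]], NodeManager]


def most_free_first(nodes: List[NodeManager]) -> NodeManager:
    """Hypothetical default policy: pick the node with the most free slots."""
    return max(nodes, key=lambda n: n.free_slots())


class JobHistoryServer:
    """Serves information about completed jobs."""

    def __init__(self) -> None:
        self.completed: Dict[str, List[str]] = {}

    def record(self, job_id: str, tasks: List[str]) -> None:
        self.completed[job_id] = list(tasks)


class ResourceManager:
    """Persistent cluster-wide service; the scheduler is pluggable."""

    def __init__(self, nodes: List[NodeManager],
                 scheduler: Scheduler = most_free_first) -> None:
        self.nodes = nodes
        self.scheduler = scheduler

    def allocate(self, task: str) -> NodeManager:
        node = self.scheduler(self.nodes)
        node.launch_container(task)
        return node


class MapReduceApplicationMaster:
    """Created for one job and discarded when that job completes."""

    def __init__(self, job_id: str, rm: ResourceManager,
                 history: JobHistoryServer) -> None:
        self.job_id = job_id
        self.rm = rm
        self.history = history

    def run(self, num_maps: int, num_reduces: int) -> Dict[str, str]:
        tasks = [f"{self.job_id}-map-{i}" for i in range(num_maps)]
        tasks += [f"{self.job_id}-reduce-{i}" for i in range(num_reduces)]
        # Each map or reduce task is housed in one container.
        assigned = {task: self.rm.allocate(task) for task in tasks}
        # Job finished: free the containers and hand the record to the history server.
        for task, node in assigned.items():
            node.release_container(task)
        self.history.record(self.job_id, tasks)
        return {task: node.host for task, node in assigned.items()}


if __name__ == "__main__":
    cluster = [NodeManager("host-a", 2), NodeManager("host-b", 2)]
    history = JobHistoryServer()
    rm = ResourceManager(cluster)
    print(MapReduceApplicationMaster("job1", rm, history).run(2, 1))
    print(history.completed)
```

Passing a different function as `scheduler` changes the placement policy without touching the other classes, which is the sense in which the scheduler is pluggable in this sketch.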

The authors give a good background on the reasoning behind the move from MRv1 to MRv2, or YARN, and on the huge change it brings to the data stack ecosystem overall. Readers who want more detail, for example on configuration and tuning, or who want walk-through examples, will need to go to the web. This area is under constant development, and YARN is no exception; this is evident in the book's scripting parts and links.

Practical details aside, this book is very useful for getting an overview of the architecture, its capabilities, its feature set, and related frameworks. The source code currently provided for the book needs to be updated; doing so would considerably increase the usability of the book, especially if all code (not only that from selected chapters) were added. In its current form, the text is less useful for actually testing the deployment and management of YARN; however, the core concepts of YARN are well described and explained in a pedagogical way, with an initial focus on the underlying motivations for the evolution toward YARN.

The reader is introduced to the core concepts and a functional overview of the YARN components in a stepwise manner. The installation steps are described in detail, guiding the user through the machinery of setting up their own YARN environment; however, to actually get it in place, the reader needs to go to the web. Throughout the book, the reader is helped to better understand what is needed, how the components function, and what to look for and consider when moving to YARN. A number of installation alternatives are described, and the user gets a good idea of the support that exists today for managing and tuning the environment. Further details on administration and monitoring are given, with source code for these specific chapters. Building on the initial functional descriptions of YARN, the authors add a deeper level of insight into the inner workings of YARN in a dedicated section on its architecture.

The detailed YARN application example would benefit from available (and updated) source code, so that the user can reproduce the examples as far as possible. The YARN frameworks section gives the user a hint of the importance of YARN, but could be more detailed and extensive. Overall, the book is best viewed as a guide to understanding YARN, and less as a hands-on guide to getting the details in place. When the authors update the source code for the book, the reader will find it even more useful.


Reviewer: Aake Edlund
Review #: CR143443 (1508-0641)
Reviewer-selected categories: Distributed Systems (C.2.4); Distributed Programming (D.1.3 ...); Reference (A.2)