Computing Reviews
Hadoop: the definitive guide (2nd ed.)
White T., O’Reilly Media, Inc., Sebastopol, CA, 2010. 624 pp. Type: Book (978-1-449389-73-4)
Date Reviewed: Feb 21 2012

MapReduce is a programming model for processing large volumes of data. It takes a two-phase approach in which data distribution phases (map operations) are followed by integration phases (reduce operations). MapReduce programming makes the underlying distributed platform transparent and easy to use.
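The two phases can be illustrated with a minimal word-count sketch. It is plain, single-process Python rather than Hadoop code (Hadoop programs are normally written in Java against the framework's own API), and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names chosen for this illustration. The grouping step between the two phases, which a framework would perform across machines, is simulated here with a dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key (stand-in for the framework's grouping step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop runs MapReduce", "MapReduce runs on Hadoop"]))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on the values for its own key, both phases could in principle be spread over many machines; that independence is what the model relies on.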

This book follows the typical O’Reilly publishing concept and provides a practical introduction to Hadoop, the open-source MapReduce framework. The book covers three main areas related to Hadoop. The first section discusses the design and development of typical Hadoop programs. Java is the default programming language, and the Hadoop-specific extensions are relatively easy to master; the fundamental shift in thinking required to design MapReduce algorithms, however, is more difficult to learn. The author introduces this programming approach through concrete examples and ready-to-use sample code. Although a cluster is the target infrastructure in operational environments, the samples can be run in standalone mode on a single PC.

The second section (chapters 9 and 10) targets Hadoop cluster installation and maintenance. Some lesser-known issues such as auditing, authentication, and authorization are covered in detail. I appreciated the in-depth discussions on Kerberos-based authentication for future Hadoop distributions.

The third section describes projects built on Hadoop, ranging from SQL-like database services (Hive) and column-oriented databases (HBase) to advanced synchronization services (ZooKeeper). Each project is addressed in a dedicated chapter. These chapters are self-contained, covering not only the administrative tasks related to installing and using the tools, but also interesting follow-up projects and real-world use.

The book is a must-read for someone interested in programming or administering a Hadoop cluster. I also recommend it strongly for any reader interested in learning about a promising new technology.

Reviewer: Radu State
Review #: CR139889 (1207-0677)
Distributed Databases (H.2.4 ... )
 
