Computing Reviews, the leading online review service for computing literature.

Pro Apache Hadoop (2nd ed.)
Wadkar S., Siddalingaiah M., Venner J., Apress, New York, NY, 2014. 444 pp. Type: Book (978-1-430248-63-7)

Date Reviewed: May 12 2015

Big data analysis is an emerging field. A tremendous amount of data is generated every day in all areas, and such volumes cannot be handled by traditional computing paradigms. This book introduces Apache Hadoop as a platform for processing big data.

The first chapter introduces the need for big data analysis, its difficulties, and various analysis concepts. Chapter 2 introduces Hadoop 2.0, the YARN (“Yet Another Resource Negotiator”) framework, and the fundamentals of Hadoop. Chapters 3 and 4 start with basic Hadoop exercises, including MapReduce scripts and how to manage the Hadoop platform. Chapters 5 to 7 focus on the core of Hadoop and detail the MapReduce framework. They also address the differences between structured query language (SQL) and Hadoop, and show how to mimic commonly used SQL scripts in Hadoop. Examples of big data processing are presented, and multiple application programming interface (API) examples explain how to access and process data via Hadoop.

Chapters 8 to 14 describe various advanced topics. Chapter 8 explains how to test MapReduce programs. Chapter 9 describes monitoring MapReduce jobs by analyzing log files. Chapter 10 teaches how to host a data warehouse (that is, the Hive framework) on top of MapReduce. Chapter 11 covers data processing pipelines based on Hadoop. Chapter 12 is tailored to enterprise users who need to access data stored in Hadoop systems. Chapter 13 is devoted to streaming log analysis. Chapter 14 describes the NoSQL database within Hadoop systems.

Chapter 15 switches to the topic of data science, which is important for big data analysis. It leads readers into this field with the use of Hadoop, and introduces how to use the Spark and Hama frameworks for data science. Since big data needs cloud computing to facilitate fast processing, chapter 16 presents Hadoop in the cloud environment. Chapter 17 closes the book by teaching readers how to create their own software applications based on Hadoop. This chapter is short, but the previous chapters have already laid the foundation for readers to become big data professionals.

Another feature of this book is example-based teaching. Sample code is provided from the first chapter to the last, so readers can learn by doing. In summary, this book is highly recommended for big data scientists and engineers.

Reviewer: Hsun-Hsien Chang	Review #: CR143430 (1508-0642)

Distributed Systems (C.2.4 )

Data Mining (H.2.8 ... )

Web-Based Services (H.3.5 ... )

World Wide Web (WWW) (H.3.4 ... )

Information Storage (H.3.2 )

Other reviews under "Distributed Systems":	Date

The evolution of a distributed processing network Franz L., Sen A., Rakes T. Information and Management 7(5): 263-272, 1984. Type: Article	Jul 1 1985

A geographically distributed multi-microprocessor system Angioletti W., D’Hondt T., Tiberghien J. Concurrent languages in distributed systems: hardware supported implementation (, Bristol, UK, 87, 1985. Type: Proceedings	Oct 1 1985

A fault tolerant LAN with integrated storage, as part of a distributed computing system Boogaard H., Bruins T., Vree W., Reijns G. Concurrent languages in distributed systems: hardware supported implementation (, Bristol, UK, 100, 1985. Type: Proceedings	Aug 1 1985

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®