Computing Reviews
Large-scale graph processing using Apache Giraph
Sakr S., Orakzai F., Abdelaziz I., Khayyat Z., Springer International Publishing, New York, NY, 2017. 197 pp. Type: Book (978-3-319474-30-4)
Date Reviewed: Oct 24 2017

Analysis of graph data (nodes connected by edges) is a particularly challenging facet of the big data problem. Relational data (tables defining the features of each of a set of entities) enjoy computationally efficient algorithms for streaming processing, such as fast construction of decision trees. But most interesting problems in graph analysis are either NP-hard or require data structures that can quickly grow to exceed the capacity of conventional machines. Such problems benefit greatly from distributing the computation over multiple processors, but the iterative computation characteristic of graph algorithms is not well suited to the MapReduce programming model [1], which has become ubiquitous in big data implementations.

Google, which invented MapReduce, developed a proprietary system called Pregel to address these concerns, and Giraph is an open-source extension of the basic Pregel concept of node-centric message-passing computation. Giraph runs on top of Hadoop, which combines a distributed file system with an implementation of MapReduce, so bringing up the Giraph framework takes more than running a single install file. This volume is a cookbook on Giraph.
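To make the node-centric message-passing idea concrete, here is a minimal single-machine sketch in Python of a Pregel-style computation of single-source shortest paths. It is an illustration only, not Giraph's API and not code from the book: the function name `pregel_sssp`, the adjacency-dictionary input, and the `inbox`/`outbox` variables are all assumptions made for this example. Each pass through the loop plays the role of one superstep, and the computation halts when no messages remain in flight.

```python
import math
from collections import defaultdict


def pregel_sssp(edges, source):
    """Simulate vertex-centric shortest paths on one machine.

    edges: dict mapping vertex -> list of (neighbor, weight) pairs.
    Weights are assumed non-negative; otherwise the loop may not halt.
    Returns a dict mapping each vertex to its distance from source.
    """
    # Collect every vertex, including those that appear only as targets.
    vertices = set(edges)
    for outs in edges.values():
        vertices.update(n for n, _ in outs)
    vertices.add(source)

    dist = {v: math.inf for v in vertices}
    # Superstep 0: the source receives a single message carrying distance 0.
    inbox = {source: [0]}

    while inbox:  # halt when no vertex has pending messages
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist[v]:
                # The vertex improved its value, so it notifies its neighbors.
                dist[v] = best
                for n, w in edges.get(v, []):
                    outbox[n].append(best + w)
        inbox = outbox  # messages are delivered in the next superstep
    return dist


graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": [("d", 1)]}
print(pregel_sssp(graph, "a"))
```

In a distributed setting each vertex would run the body of the inner loop on whichever worker holds it, and only the messages would cross machine boundaries; the sketch keeps everything in one process to show the control flow.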

The first chapter introduces MapReduce and Hadoop, explains their limitations for graph analysis, and discusses the Pregel and Giraph processing models. Chapter 2 is a step-by-step guide to installing Hadoop 1.2.1 and Giraph 1.1.0, with abundant screenshots and explanations of alternative configurations. Chapter 3 introduces the reader to writing a Giraph job, preparing the data, and generating output. Chapter 4 provides detailed implementations of five common graph algorithms in Giraph: PageRank, connected components, shortest paths, identification of opportunities for triangle closure, and maximal bipartite graph matching. None of these algorithms is combinatorially explosive, but they are examples of problems for which distributed processing can avoid the very large data structures needed by centralized algorithms. Chapter 5 discusses advanced topics such as optimization, debugging, and failure recovery. Chapter 6 reviews two other graph processing systems, GraphX and GraphLab.
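As a rough illustration of how one of these algorithms fits the vertex-centric model, the following Python sketch simulates PageRank as a fixed number of supersteps on a single machine. It is not taken from the book and does not use Giraph's API; the function name `pregel_pagerank`, its parameters, the damping factor of 0.85, and the even redistribution of rank from vertices with no outgoing links are all assumptions chosen for this example.

```python
def pregel_pagerank(out_links, supersteps=30, damping=0.85):
    """Simulate vertex-centric PageRank on one machine.

    out_links: dict mapping vertex -> list of vertices it links to.
    Returns a dict mapping each vertex to its rank; ranks sum to 1.
    """
    # Collect every vertex, including those that appear only as targets.
    vertices = set(out_links)
    for targets in out_links.values():
        vertices.update(targets)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        # inbox[v] accumulates the rank shares sent to v in this superstep.
        inbox = {v: 0.0 for v in vertices}
        dangling = 0.0  # rank held by vertices with no outgoing links
        for v in vertices:
            targets = out_links.get(v, [])
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
            else:
                dangling += rank[v]
        # Each vertex combines its incoming shares into a new rank.
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in vertices
        }
    return rank


ranks = pregel_pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks.items()))
```

The `dangling` total is a global quantity that every vertex needs, so in a distributed run it would have to be shared across workers rather than computed in a local loop; here it is simply a variable. The memory argument in the review shows up in the data layout: each vertex holds only its own rank and link list, so no single machine needs the whole graph.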

This volume shares the virtues and risks of all detailed cookbooks. Its virtue is that it will help newcomers to Giraph get up and running quickly. However, its highly detailed instructions and screenshots risk becoming obsolete quickly as new versions of the Hadoop and Giraph packages are released. Giraph is still at 1.1.0, the version described in the book, but the current production release of Hadoop is 2.7.4, with 3.0.0 in alpha. Readers should be aware of possible differences in detail as they follow the book’s instructions.

Users who need to bring up Giraph quickly and who have no experience with the Hadoop-Giraph ecosystem will find the volume a helpful introduction to these powerful tools.

Reviewer: H. Van Dyke Parunak
Review #: CR145609 (1712-0780)
[1] Dean, J.; Ghemawat, S. MapReduce: simplified data processing on large clusters. In Proc. of OSDI. USENIX, 2004, 137–149.
Content Analysis And Indexing (H.3.1)
Graphs And Networks (E.1 ...)
Reference (A.2)
 
