Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Clash of the titans: MapReduce vs. Spark for large scale data analytics
Shi J., Qiu Y., Minhas U., Jiao L., Wang C., Reinwald B., Özcan F. Proceedings of the VLDB Endowment8 (13):2110-2121,2015.Type:Article
Date Reviewed: Jun 29 2016

Analyzing big data has never been more important in computer science, and the platforms of choice have been either MapReduce or Spark. By using selected workloads that characterize the majority of batch and iterative analytic operations (word count, sort, k-means, linear regression, and PageRank), this paper presents analyses of the performance differences between these two platforms. Although MapReduce is designed for batch jobs and Spark for iterative jobs, it is noted that they are being used, on the field, for both job types.

The authors find that Spark is 2.5 to 5 times faster than MapReduce on the majority of these workloads (the only exception is sort). These results are not so surprising given the key architectural decisions made by the two platforms. This paper is resplendent with the configuration setup parameters of the experiments (hardware, software, and profilers). These parameters are useful for system administrators who want to understand a platform’s behavior under different configurations. We also learn that since the majority of big data analytic workloads are central processing unit (CPU)-bound, both platforms are scalable to the number of CPU cores available to them. System developers can use the knowledge gleaned from this paper to improve the architecture and implementation of Spark and MapReduce, and of the applications running on both platforms.

The explanations of the experiment results are very good: they further the understanding of how architecture and working assumptions affect system performance, and also explain some of the inner workings of the platforms. For example, we learn that as the number of “reduce” tasks is increased, the execution time of the “map” stage increases.

If you want to understand the pros and cons of MapReduce and Spark, and when and how to use them, this paper is a good place to start.

Reviewer:  Tope Omitola Review #: CR144539 (1609-0683)
Bookmark and Share
  Reviewer Selected
 
 
General (H.2.0 )
 
Would you recommend this review?
yes
no
Other reviews under "General": Date
Design of the Mneme persistent object store
Moss J. ACM Transactions on Information Systems 8(2): 103-139, 2001. Type: Article
Jul 1 1991
Database management systems
Gorman M., QED Information Sciences, Inc., Wellesley, MA, 1991. Type: Book (9780894353239)
Dec 1 1991
Database management (3rd ed.)
McFadden F., Hoffer J., Benjamin-Cummings Publ. Co., Inc., Redwood City, CA, 1991. Type: Book (9780805360400)
Jun 1 1992
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy