Computing Reviews, the leading online review service for computing literature.


Clash of the titans: MapReduce vs. Spark for large scale data analytics
Shi J., Qiu Y., Minhas U., Jiao L., Wang C., Reinwald B., Özcan F. Proceedings of the VLDB Endowment 8(13): 2110-2121, 2015. Type: Article

Date Reviewed: Jun 29 2016

Analyzing big data has never been more important in computer science, and the platforms of choice have been either MapReduce or Spark. Using selected workloads that characterize the majority of batch and iterative analytic operations (word count, sort, k-means, linear regression, and PageRank), this paper analyzes the performance differences between the two platforms. Although MapReduce was designed for batch jobs and Spark for iterative jobs, the authors note that both are used in the field for both job types. They find that Spark is 2.5 to 5 times faster than MapReduce on most of these workloads; sort is the only exception. These results are not surprising given the key architectural decisions behind the two platforms.

The paper documents the configuration parameters of its experiments in detail (hardware, software, and profilers). These details are useful for system administrators who want to understand how a platform behaves under different configurations. We also learn that, since most big data analytic workloads are central processing unit (CPU)-bound, both platforms scale with the number of CPU cores available to them. System developers can use the knowledge gleaned from this paper to improve the architecture and implementation of Spark and MapReduce, as well as of the applications running on both platforms.

The explanations of the experimental results are very good: they deepen the understanding of how architecture and working assumptions affect system performance, and they also reveal some of the inner workings of the platforms. For example, we learn that as the number of “reduce” tasks is increased, the execution time of the “map” stage increases. If you want to understand the pros and cons of MapReduce and Spark, and when and how to use them, this paper is a good place to start.
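For readers new to the programming model behind the word count workload, here is a minimal single-process sketch of the map, partition (shuffle), and reduce steps. It is a toy illustration, not the Hadoop or Spark API: the function names `map_phase`, `partition`, and `word_count` are hypothetical, there is no disk I/O, no combiner, and no parallelism, and the CRC32 hash partitioner is an assumption chosen only because it is stable across runs. It does show where the number of reduce tasks enters the picture: every map output record must be routed to one of `num_reducers` buckets.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
import zlib


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map (hypothetical helper): emit a (word, 1) pair per whitespace token."""
    return [(word, 1) for word in line.split()]


def partition(key: str, num_reducers: int) -> int:
    """Route a key to a reduce task using a stable CRC32 hash, so every
    occurrence of the same word lands in the same reduce partition."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def word_count(lines: Iterable[str], num_reducers: int = 2) -> Dict[str, int]:
    # Map side: map output is split into num_reducers buckets, so raising
    # the number of reduce tasks raises the number of partitions produced.
    buckets: List[Dict[str, List[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    for line in lines:
        for word, one in map_phase(line):
            buckets[partition(word, num_reducers)][word].append(one)

    # Reduce side: each bucket plays the role of one reduce task and sums
    # the values grouped under each of its keys.
    result: Dict[str, int] = {}
    for bucket in buckets:
        for word, ones in bucket.items():
            result[word] = sum(ones)
    return result
```

For example, `word_count(["to be or not", "to be"], num_reducers=3)` counts "to" and "be" twice and "or" and "not" once, whichever buckets the words happen to hash into.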

Reviewer: Tope Omitola	Review #: CR144539 (1609-0683)

General (H.2.0)


Other reviews under "General":	Date

Design of the Mneme persistent object store Moss J. ACM Transactions on Information Systems 8(2): 103-139, 1990. Type: Article	Jul 1 1991

Database management systems Gorman M., QED Information Sciences, Inc., Wellesley, MA, 1991. Type: Book (9780894353239)	Dec 1 1991

Database management (3rd ed.) McFadden F., Hoffer J., Benjamin-Cummings Publ. Co., Inc., Redwood City, CA, 1991. Type: Book (9780805360400)	Jun 1 1992


Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®