Computing Reviews

HPC in big data age: an evaluation report for Java-based data-intensive applications implemented with Hadoop and OpenMPI
Cheptsov A. EuroMPI/ASIA 2014 (Proceedings of the 21st European MPI Users’ Group Meeting, Kyoto, Japan, Sep 9-12, 2014), 175-180, 2014. Type: Proceedings
Date Reviewed: 11/25/14

Traditionally, parallel computation has focused on high-performance computing (HPC), and many specialized platforms and systems have been built for this purpose. In recent years, a new parallel computing model for big data problems has emerged and is becoming increasingly important: MapReduce, along with its Hadoop implementation.

To meet the rapidly growing demand for running Hadoop, it is desirable to reuse the specialized HPC platforms and systems that have already been built. This paper presents a design for a Hadoop-over-MPI approach that runs Hadoop on traditional HPC platforms, and implements a MapReduce benchmark application, WordCount, based on that design. The paper then compares the performance of this Hadoop-over-MPI implementation with that of the Hadoop implementation of WordCount.
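For readers unfamiliar with the benchmark, the WordCount computation can be sketched in plain Java as a map phase followed by a reduce phase. This is an illustration only, not the code evaluated in the paper: it runs in a single process and uses neither the Hadoop nor the MPI API, and the class and method names (WordCountSketch, map, reduce, wordCount) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of WordCount; not the paper's code.
public class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every word in a line of text.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the counts emitted for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Driver: run the map phase over every line, then reduce all pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data and HPC", "HPC and MPI")));
    }
}
```

In a distributed setting, the map calls would run in parallel over partitions of the input and the emitted pairs would be grouped by word before reduction; how that grouping and communication is carried out is where a Hadoop-based and an MPI-based implementation differ.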

The paper demonstrates that the Hadoop-over-MPI implementation performs much better than the Hadoop implementation. According to Cheptsov, “the nominal performance of MPI is indeed higher than the [performance] of Hadoop, [and ...] the poor performance of Hadoop could be caused by [the] small size of [the] particular experiment setup.”

Besides the evaluation, the paper also gives a good introduction to the MapReduce and MPI technologies. It is a good read for those who are interested in how MapReduce and MPI work, how they differ, and how their performance compares. Any engineer, researcher, or scientist in information technology will find this paper interesting.

Reviewer: Long Wang
Review #: CR142967 (1502-0149)
