In high-performance computing (HPC) environments, heterogeneous systems composed of distributed clusters of multicore nodes with accelerators such as graphics processing units (GPUs) are becoming the norm. There are robust and mature programming models for distributed computing, as well as for GPU computing. However, hybrid programming models that integrate GPU computing capabilities with explicit message passing have emerged only recently. Aji et al. characterize the performance and productivity of a specific GPU-integrated message passing interface (MPI) framework, MPI-ACC, in two scientific computing applications: an epidemiology simulation and a seismology modeling application.
The primary contribution of their work is a detailed case study of two scientific computing applications that compares a basic, non-integrated MPI+GPU model with the GPU-integrated MPI-ACC framework. The authors describe the performance effects of using each cluster node’s central processing unit (CPU) concurrently with the node’s GPU, rather than using the GPU exclusively. They also evaluate the effects of various optimizations, such as data communication patterns that reduce communication overhead and data partitioning that increases concurrency and maximizes GPU memory bandwidth. They profile these applications using HPCToolkit and find that the GPU-integrated MPI framework generally outperforms the basic MPI+GPU implementations of both applications. The results also show that profiling tools such as HPCToolkit can expose problem areas that, when addressed, lead to significant performance improvements.
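To make the contrast concrete, the following sketch shows the two communication styles the study compares. It does not reproduce MPI-ACC's actual API; the "integrated" path follows the analogous CUDA-aware MPI convention, in which a device pointer is passed directly to an MPI call, and all buffer sizes and function names here are illustrative.

```c
/* Illustrative sketch only (assumes MPI + CUDA runtime are available).
 * Contrasts basic MPI+GPU staging with a GPU-integrated MPI send. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* Basic MPI+GPU: the application explicitly stages data through
 * host memory before handing it to MPI. */
void send_basic(const double *d_buf, size_t n, int dest)
{
    double *h_buf = malloc(n * sizeof *h_buf);
    cudaMemcpy(h_buf, d_buf, n * sizeof *h_buf, cudaMemcpyDeviceToHost);
    MPI_Send(h_buf, (int)n, MPI_DOUBLE, dest, /*tag=*/0, MPI_COMM_WORLD);
    free(h_buf);
}

/* GPU-integrated MPI (CUDA-aware style): the device pointer goes
 * straight to MPI, which can pipeline or stage the transfer
 * internally and overlap it with other work. */
void send_integrated(const double *d_buf, size_t n, int dest)
{
    MPI_Send((void *)d_buf, (int)n, MPI_DOUBLE, dest, /*tag=*/0,
             MPI_COMM_WORLD);
}
```

The integrated form removes the per-message staging copy from application code, which is the source of both the productivity gain and the optimization opportunities (pipelining, overlap) that the authors measure.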
This paper is thorough and well written, and should be of interest to application developers looking for a detailed analysis of how to use a GPU-integrated MPI framework to build and optimize scientific applications. The authors do not go into detail on GPU kernel implementations. Rather, they focus on the interaction between message passing with MPI and the CPU interface to the GPU. The MPI-ACC framework is the only model discussed, so it would be interesting to learn how the techniques applied in this paper translate to other GPU-integrated MPI frameworks.