ComputingReviews.com

Massively parallel lattice-Boltzmann codes on large GPU clusters
Calore E., Gabbana A., Kraus J., Pellegrini E., Schifano S., Tripiccione R. Parallel Computing 58: 1-24, 2016. Type: Article

Date Reviewed: 01/31/17

Calore et al. present a detailed overview of how to develop and optimize lattice-Boltzmann code for modern graphics processing units (GPUs). The paper begins with a brief introduction to the lattice-Boltzmann method (LBM) and modern Nvidia GPU architectures. The authors then explain how to optimize LBM code for a single GPU, comparing different data structures and different ways of organizing the data-parallel kernels. Every optimization is guided either by simple analytical performance models or by the results of small benchmarks. Next, the authors present a structured way of porting the single-node LBM code to a large cluster of GPUs. Again, performance models and benchmarks are used to explain the performance of different domain decomposition strategies, such as 1D or multidimensional tiling.
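To make the decomposition trade-off concrete, the following is a minimal sketch of a surface-count estimate, not the performance model used in the paper. It assumes a 2D lattice with periodic boundaries, tile counts that divide the lattice evenly, and halos one site deep; corner sites are ignored. The function names are hypothetical and chosen only for illustration.

```python
def halo_sites_1d(lx, ly, n_gpus):
    """Boundary sites one GPU exchanges per step when the lx-by-ly
    lattice is cut into n_gpus slabs along x only (assumed layout)."""
    if n_gpus == 1:
        return 0          # a single GPU has no neighbours to talk to
    return 2 * ly         # two faces per slab, each ly sites long


def halo_sites_2d(lx, ly, px, py):
    """Boundary sites one GPU exchanges per step when the lattice is
    cut into px-by-py rectangular tiles (assumed layout)."""
    tile_x = lx // px     # tile extent along x
    tile_y = ly // py     # tile extent along y
    sites = 0
    if px > 1:
        sites += 2 * tile_y   # left and right faces
    if py > 1:
        sites += 2 * tile_x   # top and bottom faces
    return sites
```

Under these assumptions, a 1024 x 1024 lattice on 16 GPUs exchanges 2048 sites per GPU with 1D slabs and 1024 with a 4 x 4 tiling. The 1D figure stays constant as GPUs are added, while the tiled figure shrinks, which is the kind of effect a decomposition study has to weigh against the cost of exchanging more, smaller messages.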

The paper is well written, and the optimization approach is presented in a structured and comprehensible manner. Performance engineers targeting parallel and distributed systems can therefore readily adapt the described strategies to other scientific applications. Furthermore, the paper contains valuable results for developing high-performance libraries or programming frameworks that enable performance-portable code, that is, code that achieves high performance across a wide range of architectures.

Reviewer: Sergei Gorlatch

Review #: CR145036 (1705-0310)
