Graphics processing units (GPUs) have recently become very important tools in high-performance computing (HPC). Owing to their massively parallel structure, they can accelerate many HPC algorithms in comparison to implementations of the same algorithms on a (typically much more expensive) traditional hardware platform. However, they are not suited to every purpose; a common limitation is memory size, which means that memory-intensive algorithms tend to be non-competitive on GPUs because they require many slow data transfers between GPU memory and main memory.
This book attempts to provide some guidance for researchers who develop HPC codes and want to run them on GPU-based systems. Specifically, given a concrete mathematical problem, it gives advice on how to choose an algorithm that works well in a GPU environment and on how to implement this algorithm. The first part of the book is devoted to questions related to linear algebra. It consists of six contributed papers discussing algorithms for operations with dense and with sparse matrices (including the computation of eigenvalues), for tridiagonal solvers, for matrix exponentials, and for various types of LU and QR decompositions. The second part contains five papers dealing with numerical algorithms for solving various types of differential equations, and the third part is made up of four papers dealing with stochastic algorithms, in particular Monte Carlo methods. Finally, three papers on fast Fourier transforms and multibody simulations constitute the fourth and last part of the book.
The 18 papers are written by leading experts in the respective areas. They usually demonstrate how to construct appropriate data structures and how to implement the algorithm in a way that optimally exploits the parallelism of the hardware. The resulting programs are described explicitly, sometimes in pseudocode, but more frequently as fully fleshed-out source code in a standard programming model such as the CUDA extension of C++. Performance issues are covered as well. Naturally, the limited length of the book does not allow for the inclusion of algorithms for all conceivable numerical problems. However, the selection of the material is rather broad, and the explanations are generally very thorough. Thus, readers who want to solve a problem not covered in the book will likely be able to find a similar problem, although a bit of searching may be necessary because, regrettably, the book does not have a subject index. Using the given guidelines, they can then transfer the approach for that similar problem to their own use case.
The intended readership consists of people who already have a certain amount of experience in working with GPUs; hence, the book does not contain an introduction to GPUs and their programming. A certain degree of familiarity with numerical algorithms is assumed as well. For readers with such a background, it will prove to be useful reading.