Parallel programming has become commonplace in many application domains, from deep learning and other machine learning approaches to computer simulations and scientific computing. Scientific computing, understood as the use of numerical algorithms to solve the differential equations that model many scientific and engineering problems, is the focus of this pragmatic textbook.
The book covers four of the main parallel programming tools currently in use: OpenMP, MPI, CUDA, and OpenCL. Rather than providing the thorough topical coverage usually found in parallel programming textbooks, this book takes a hands-on, lab-notebook approach right from the first pages.
Because the book is oriented toward scientists who might not be acquainted with compiled languages in the era of Python, MATLAB, and other interpreted programming languages, the first part of the book is devoted to the C programming language, which is still the most common choice in high-performance computing environments.
The following four parts of the book are devoted to the aforementioned parallel programming tools: OpenMP for multicore processors and shared-memory multiprocessors, MPI for computer clusters, and CUDA and OpenCL for general-purpose computing on graphics processing units (GPGPU). Each part covers the basics of its parallelization technology and provides complete code listings rather than mere code snippets. This feature might seem wasteful to seasoned programmers, who will often encounter several pages of code for relatively simple programs, yet it will be useful to novices, for whom this book might be their first contact with parallel computing.
Another interesting feature of this book, not present in other introductory handbooks for particular programming techniques, is the chapter devoted to useful libraries for each parallel programming tool. Given the book’s scope, it is no surprise that the author has chosen linear algebra packages (BLAS and LAPACK for sequential programming and OpenMP, ScaLAPACK for MPI, MAGMA and cuSPARSE for CUDA, and clMAGMA for OpenCL) as well as libraries that implement the fast Fourier transform (FFTW for OpenMP and MPI, cuFFT for CUDA, and clFFT for OpenCL). It is also noteworthy that random number generation using specialized libraries for parallel environments is covered (for example, cuRAND in CUDA and Random123 in OpenCL).
Each of the first five parts of the book ends with a chapter containing brief notes on a set of programming projects that can be used as assignments in a parallel programming course. The projects themselves are described at the end of the book (in Part 6). I would have preferred them to be introduced earlier, perhaps right after the introduction to the C programming language, since their descriptions are needed to complete the assignments at the end of each part and the reader is forced to peruse those chapters anyway. Had the presentation order been changed, the applications underlying the different projects could also have served to motivate the need for parallel programming tools. The projects basically involve solving different kinds of differential equations by means of numerical methods, which are briefly described. Interested readers who want to delve into the design of the numerical algorithms involved will need to refer to other sources, for example, Numerical Recipes or some of the relatively scarce bibliographic references at the end of the book.
In summary, this book provides a good introduction to parallel programming tools, with line-by-line code walkthroughs, detailed compilation instructions, and how-to information on helpful ancillary libraries. Inexperienced programmers and scientists without a strong computer science (CS) background might find the author’s approach helpful for overcoming the learning curve associated with parallel programming and for getting acquainted with some of the most popular parallel programming tools available today. Those looking for a more thorough textbook on parallel computing might find this book lacking in depth with respect to parallel algorithm design, and without any coverage of C/C++ multithreading or the partitioned global address space (PGAS) languages, such as Unified Parallel C, Chapel, X10, and Fortress, which try to blur the line between shared-memory (OpenMP, CUDA/OpenCL) and distributed-memory (MPI) systems.