During the last decade, much effort has been devoted to developing parallel programming models in order to simplify parallel programming. The bulk synchronous parallel (BSP) programming model tries to reduce the gap between parallel hardware and software. BSP assumes the use of a distributed-memory multiprocessor system, where each processor can access the memory of other processors. BSP algorithms consist of so-called supersteps, where different processors perform computations, followed by a global barrier synchronization. The BSP programming model is implemented through a library containing a reduced set of primitives that can be culled from conventional languages such as C, C++, or Fortran.

In this book, Bisseling introduces, in a friendly style, the BSP programming model and the use of the BSPlib library. Bisseling also discusses the implementation and performance analysis of some well-known algorithms using BSP. The work is intended to be used as a textbook for a course on parallel programming of numerical algorithms for upper-level undergraduates, graduate students, and researchers. No previous experience in parallel programming is required, although some knowledge of a high-level language such as C is needed.

The book is comprised of four chapters, followed by three appendices. Chapter 1 is devoted to a general introduction to BSP, with some consideration given to calculating the computational cost of a BSP algorithm and to providing examples of how to build and run a BSP program. Some bibliographic notes that will help the reader find more information are included.

The remaining chapters are devoted to the parallelization of three different, well-known scientific applications. Performance numbers for real systems, extensive bibliographic notes, and several exercises and programming projects are included. Chapter 2 presents a BSP implementation for the widely known LU decomposition. A sequential implementation and its computational cost are shown before proceeding to the development and refinement of a parallel version. Much effort is also devoted to calculating the cost of the parallel version, an approach that is not very common in books devoted to presenting a parallel programming model. The chapter concludes with some performance figures on a Cray T3E. Chapter 3 studies in detail the fast Fourier transform algorithm, giving the recursive and nonrecursive sequential solutions to the problem and the parallel algorithm. Chapter 4 studies the sparse-vector matrix multiplication problem, paying special attention to the data distribution in order to reduce remote accesses during the parallel execution of the algorithm. Some performance results for a Beowulf cluster are also shown.

The three applications considered are appropriate examples for describing the structured parallel programming model used by BSP. Although it is not the main goal of the book, it would have been useful to have an example on how to program BSP with a more irregular application, where the mapping to the synchronized supersteps programming model is not so clear.

The material covered in the appendices includes some auxiliary functions needed to run the examples of the book, a quick reference guide to BSPlib, and an interesting appendix that shows how to develop structured parallel programming using the message passing interface (MPI) instead of BSPlib.

As stated above, the cost analysis of the solutions is an important part of this volume, although it is not necessary to have an in-depth understanding of the details to learn how to develop parallel versions of sequential, scientific algorithms. The book is carefully written and edited. It is an excellent starting point for learning how to write well-structured, parallel scientific programs.