Computing Reviews
Multicore computing : algorithms, architectures, and applications
Rajasekaran S., Fiondella L., Ahmed M., Ammar R., Chapman & Hall/CRC, Boca Raton, FL, 2013. 452 pp. Type: Book (978-1-439854-34-1)
Date Reviewed: Apr 25 2014

Multicore systems are now ubiquitous in computing. However, programming these architectures, and especially achieving the best application performance, remains difficult because many different aspects and effects have to be considered.

This book is a compendium of papers focused on multicore architectures, their applications, and algorithms used to achieve high performance on these architectures. It also provides insights into hardware components, as well as advanced optimization techniques.

Understanding the memory hierarchies of multicore systems is one of the cornerstones in achieving the best performance. The book starts with an extensive introduction to the memory hierarchies of multicore processors in chapter 1, which provides the basics for optimizing algorithms and applications for modern cache hierarchies as used within state-of-the-art multicore central processing units (CPUs).
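
As a small illustration of why cache-aware code matters, consider the following C++ sketch. It is not taken from the book, and the function names `sum_row_major` and `sum_strided` are hypothetical. Both functions sum the same row-major matrix; the first walks memory contiguously, while the second jumps a full row ahead on every access, which typically causes far more cache misses on large inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored in row-major order.
// The inner loop touches consecutive addresses, so every element of a
// fetched cache line is used before the line is evicted.
double sum_row_major(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same data, but the loops are swapped: the inner loop strides by
// `cols` elements, so consecutive accesses land on different cache lines.
// Note: the summation order differs, so results can differ in the last
// bits for values that are not exactly representable.
double sum_strided(const std::vector<double>& m,
                   std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Only the access pattern differs between the two functions, so a timing gap measured on a matrix much larger than the last-level cache largely reflects the memory hierarchy. An optimizing compiler may interchange the loops itself, so any measurement should be checked against the generated code.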

Chapter 2 discusses a new set-balancing strategy for last-level caches: flexible set-balance. This strategy reduces cache misses and may be part of future multicore architectures.

The popular multicore SPARC architecture used in many contemporary computer systems, from laptops to data center servers and supercomputers, is described in chapter 3. The authors explain the instruction set and the memory model of the SPARC architecture, and provide a good introduction to the cache-coherence protocol of the SPARC architecture using two simple examples.

In chapter 4, the book moves from architectures to the Cilk programming approach for multicore CPUs, which is an extension of the C programming language that expresses parallelism; it was originally developed in 1994 at MIT. Cilk and its object-oriented version Cilk++ received new attention in 2009 when Intel advanced Cilk to Cilk Plus with further extensions (for example, array notation) for data-parallel tasks and fully integrated Cilk into its compiler toolkit. Intel Cilk Plus can be used to program multicore Intel CPUs and also Intel Xeon Phi accelerators. The Cilk runtime system provides parallelism implicitly by creating tasks that are then distributed onto computing resources. The runtime system also performs load balancing and manages the (re-)distribution of parallel tasks in a way that is completely transparent to the developer. The chapter describes in detail the distribution and load-balancing strategies of the Cilk runtime system.
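
The fork-join style that Cilk expresses with its spawn and sync keywords (`cilk_spawn` and `cilk_sync` in Cilk Plus) can be approximated in standard C++ as sketched below. This is a stand-in rather than Cilk code: `std::async` with `std::launch::async` starts a new thread per call and offers none of the load balancing of the Cilk runtime, and the function name `parallel_sum` and its `cutoff` parameter are hypothetical choices for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Divide-and-conquer sum over data[lo, hi).
// Ranges at or below `cutoff` elements are summed serially so that the
// number of threads created stays small.
long long parallel_sum(const std::vector<int>& data,
                       std::size_t lo, std::size_t hi,
                       std::size_t cutoff = 16384) {
    assert(lo <= hi && hi <= data.size());
    if (hi - lo <= cutoff || hi - lo < 2) {
        return std::accumulate(
            data.begin() + static_cast<std::ptrdiff_t>(lo),
            data.begin() + static_cast<std::ptrdiff_t>(hi), 0LL);
    }
    const std::size_t mid = lo + (hi - lo) / 2;

    // "Spawn": the left half may run concurrently on another thread.
    auto left = std::async(std::launch::async, [&data, lo, mid, cutoff] {
        return parallel_sum(data, lo, mid, cutoff);
    });

    // The current thread continues with the right half.
    const long long right = parallel_sum(data, mid, hi, cutoff);

    // "Sync": wait for the spawned half before combining the results.
    return left.get() + right;
}
```

In Cilk the developer writes only the spawn and sync points; the runtime decides which worker executes each task. The explicit cutoff above is needed here precisely because plain threads lack that scheduling support.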

Many interesting case studies are outlined throughout the book. While most of them are standard problems in computer science (for example, sorting and string matching), a backprojection algorithm is also presented and evaluated in the context of different multicore architectures and graphics processing units (GPUs). Beyond these common problems, computation-intensive problems such as n-body simulation and the covariance kernels used in computational statistics serve to evaluate modern multicore architectures from Intel and AMD, as well as GPUs and the IBM Cell Broadband Engine. Popular GPU architectures (Nvidia GT200 and Nvidia Fermi) are described and discussed throughout the book.

One case study is especially interesting for GPU software developers. A complete chapter is devoted to matrix multiplication on GPUs. This chapter shows step by step how to achieve the best performance for this algorithm. The evaluation in this chapter demonstrates that the presented improvements enable the matrix multiplication code to outperform Nvidia’s highly optimized BLAS implementation cuBLAS. These techniques may also be useful for achieving higher performance with other algorithms on GPUs.
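
The review does not list the chapter's individual optimization steps, so the following is only an assumption about the kind of technique involved: blocking (tiling) is a common first step in tuning matrix multiplication, on GPUs usually by staging tiles in on-chip shared memory. The C++ sketch below shows the same idea on a CPU, where tiles are sized to stay in cache. It is not the book's kernel, and the name `matmul_tiled` and the default tile size of 32 are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B for square n x n matrices in row-major order.
// The iteration space is cut into tile x tile blocks so that the parts of
// A, B, and C being worked on are reused while they are still in cache.
std::vector<double> matmul_tiled(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t n, std::size_t tile = 32) {
    assert(a.size() == n * n && b.size() == n * n && tile > 0);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        for (std::size_t kk = 0; kk < n; kk += tile) {
            const std::size_t k_end = std::min(kk + tile, n);
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t j_end = std::min(jj + tile, n);
                // Multiply one block of A by one block of B and
                // accumulate into the matching block of C.
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
    return c;
}
```

The tile size is a tuning parameter: it should be small enough that the active blocks fit in the targeted level of the memory hierarchy, which is the same trade-off a GPU kernel faces with its limited shared memory.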

In summary, this book may be interesting for software developers working on multicore architectures who aim to achieve the best performance for their applications (for example, scientific simulations). The book provides details and insights on how to tune applications for various hardware architectures. It also offers the reader an opportunity to dig deeper into the techniques used within popular libraries (for example, PLASMA), programming languages (for example, Cilk/Cilk++), and underlying hardware.

Reviewer:  Sergei Gorlatch Review #: CR142218 (1407-0498)
Concurrent Programming (D.1.3)
Concurrent Programming Structures (D.3.3 ...)
Parallelism And Concurrency (F.1.2 ...)

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®