Computing Reviews
Heterogeneous computing with OpenCL: revised OpenCL 1.2 edition
Gaster B., Howes L., Kaeli D., Mistry P., Schaa D., Morgan Kaufmann Publishers Inc., Waltham, MA, 2013. 308 pp. Type: Book (978-0-12-405894-1)
Date Reviewed: Sep 27 2013

The use of discrete graphics processors for general computation has seen explosive growth across many different industries, not least in finance, where I work. Given that a single investment bank can be continuously running as many as 100,000 central processing unit (CPU) cores for its pricing and risk computations, even at 32 cores per server, this translates into a lot of physical real estate. With increasing compliance and risk management regulation requirements, increasing real estate costs, and increasing energy costs, graphics processing units (GPUs) look like the magic bullet.

Luckily, unlike many other aspects of technology, there are really only two competing standards for programming GPUs for generic computation, namely compute unified device architecture (CUDA) and OpenCL. In my organization, we have chosen the former. It is, however, always good practice to monitor competing standards; being a big fan of OpenGL programming, I was eager to review this book. Although OpenCL is multiplatform and device-agnostic, I should clarify that this book leans strongly toward AMD in general and favors the Radeon 7000 series specifically. This is good to know if you want the gritty details on the latest hardware available in the market, but it is less helpful if you have a legacy investment in NVIDIA hardware, for example.

Chapter 1 presents a whirlwind overview of parallel programming. Chapter 2 introduces the OpenCL standard, the execution model, the method of communication between the host and device, and the memory model, and ends with a simple source code example that sums two vectors. Chapter 3 looks at the physical device and explains superscalar execution, very long instruction word (VLIW) architectures, single instruction multiple data (SIMD) and vector processing, multicore architectures, and, importantly for GPU programming, the realities of hardware-based multithreading.
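To give a flavor of the kind of kernel chapter 2 builds up to, the following plain C emulates OpenCL's data-parallel model on the host. This is an illustrative sketch, not the book's code: the per-element function plays the role of an OpenCL `__kernel`, and the host-side loop stands in for the runtime launching one work-item per element.

```c
#include <stddef.h>

/* Emulates an OpenCL vector-sum kernel along the lines of:
 *   __kernel void vec_add(__global const float *a,
 *                         __global const float *b,
 *                         __global float *c)
 *   { size_t i = get_global_id(0); c[i] = a[i] + b[i]; }
 * One call corresponds to one work-item; 'gid' plays the role of
 * get_global_id(0). */
static void vec_add_work_item(const float *a, const float *b,
                              float *c, size_t gid)
{
    c[gid] = a[gid] + b[gid];
}

/* The loop stands in for clEnqueueNDRangeKernel launching 'n'
 * work-items over a one-dimensional index space. */
void vec_add(const float *a, const float *b, float *c, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        vec_add_work_item(a, b, c, gid);
}
```

On a real device, the iterations of the loop would execute concurrently; the kernel body is correct only because each work-item writes a distinct element of `c`.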

Chapter 4 guides the reader step by step through the process of writing and executing a simple OpenCL program. It takes code for 2D matrix multiplication and converts it into a kernel for execution on a GPU. The authors then step the reader through environment setup, buffer construction, data copying, kernel compilation, and program execution. The same steps are repeated for an image rotation example and an image convolution example. Between them, these three examples highlight different memory access patterns and inter-kernel data sharing.
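The chapter's workflow can be suggested in miniature. The kernel logic below is a hypothetical reconstruction of a 2D matrix-multiplication example, again emulated in plain C: the two nested loops stand in for a two-dimensional NDRange, and the comment block marks where the host-side steps the chapter walks through would occur in real OpenCL code.

```c
#include <stddef.h>

/* Emulated work-item for a matrix-multiply kernel: computes one
 * element C[row][col] of C = A * B, where A is n-by-k and B is k-by-m.
 * (row, col) plays the role of (get_global_id(1), get_global_id(0)). */
static void matmul_work_item(const float *A, const float *B, float *C,
                             size_t n, size_t k, size_t m,
                             size_t row, size_t col)
{
    float acc = 0.0f;
    for (size_t i = 0; i < k; ++i)
        acc += A[row * k + i] * B[i * m + col];
    C[row * m + col] = acc;
}

/* In real OpenCL host code, the sequence would roughly be:
 *   1. clGetPlatformIDs / clGetDeviceIDs       - pick a device
 *   2. clCreateContext / clCreateCommandQueue  - set up the environment
 *   3. clCreateBuffer + clEnqueueWriteBuffer   - allocate and copy A, B
 *   4. clCreateProgramWithSource + clBuildProgram + clCreateKernel
 *   5. clSetKernelArg + clEnqueueNDRangeKernel over an n-by-m 2D range
 *   6. clEnqueueReadBuffer                     - copy C back
 * The nested loops below emulate step 5 serially. */
void matmul(const float *A, const float *B, float *C,
            size_t n, size_t k, size_t m)
{
    for (size_t row = 0; row < n; ++row)
        for (size_t col = 0; col < m; ++col)
            matmul_work_item(A, B, C, n, k, m, row, col);
}
```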

Having given the reader some practical interaction with OpenCL, the authors return to theory in chapter 5 with a discussion of the key OpenCL concept of breaking a task into many “work-items,” which are grouped into “work-groups” and executed on the hardware as “wavefronts,” each running many instances of the kernel concurrently. The work-items within a work-group can be synchronized using barriers, while multiple operations queued to the device can be synchronized using events and command queues. This chapter ends with detailed explanations of (and a comparison between) the device- and host-side memory models in terms of communication, physical architecture, and speed.
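The barrier mechanism can be sketched with a classic work-group reduction, emulated here in plain C under the assumption of a power-of-two group size. In a real kernel, the scratch array would be `__local` memory and each phase boundary would be a `barrier(CLK_LOCAL_MEM_FENCE)`; serially, finishing one full pass over all work-items before starting the next phase has the same effect.

```c
#include <stddef.h>

/* Emulates a tree reduction within a single work-group of size 'wg'
 * (assumed to be a power of two). 'local_mem' stands in for __local
 * memory shared by the work-group; the "--- barrier ---" comments mark
 * where barrier(CLK_LOCAL_MEM_FENCE) would sit in an OpenCL kernel. */
float workgroup_sum(const float *input, float *local_mem, size_t wg)
{
    /* Each work-item loads one element into local memory. */
    for (size_t lid = 0; lid < wg; ++lid)
        local_mem[lid] = input[lid];
    /* --- barrier --- */

    /* Tree reduction: halve the number of active work-items per phase. */
    for (size_t stride = wg / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid)
            local_mem[lid] += local_mem[lid + stride];
        /* --- barrier --- */
    }
    return local_mem[0];  /* work-item 0 now holds the group's sum */
}
```

The barrier is what makes the tree shape safe: no work-item may read a partial sum from the previous phase until every work-item has finished writing it.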

Chapter 6 looks at two specific device architectures: the AMD Bulldozer CPU and the AMD Radeon HD7970 discrete graphics card. The chapter maps the logical constructs of OpenCL onto the physical components of these two devices. Chapter 7 looks into how data is transferred between the CPU and a discrete device, the impact that caching and cache coherency have on communication, and how the various channel bandwidths affect the OpenCL options for memory management. The chapter ends with an example of data reduction with timings for the various combinations available for transferring data between the host and device.

Chapters 8 through 10 cover three case studies, each with full source code. The first is an image convolution study that illustrates workgroup size selection, data caching, memory alignment, vector reads, and loop unrolling. The second is a histogram computation. This example involves a much larger ratio of input data to output data, necessitating a different set of criteria for optimal workgroup sizing and memory transfer patterns. The example provides an opportunity to discuss atomic operations, hardware memory bank usage, and data scattering. The final case study is a mixed-particle simulation that illustrates joint CPU and GPU computation, with the main focus on solving the load-balancing challenge between a fast CPU with a few cores and a slower GPU with many more cores.
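The histogram strategy is worth a sketch. The standard two-level approach (which is the general technique; the book's actual code may differ) has each work-group accumulate a private histogram in fast local memory using local atomics, then merge it into the global result with one global atomic add per bin. The plain C below emulates both levels serially; the bin count and modulo binning rule are illustrative choices.

```c
#include <stddef.h>

#define BINS 16  /* illustrative bin count, not the book's */

/* Two-level histogram: each "work-group" of 'group_size' elements
 * builds a private histogram (where a real kernel would use atomic_inc
 * on __local counters), then merges it into the global histogram
 * (where a real kernel would use atomic_add on __global memory). */
void histogram(const unsigned char *data, size_t n,
               unsigned int *global_hist, size_t group_size)
{
    for (size_t b = 0; b < BINS; ++b)
        global_hist[b] = 0;

    for (size_t start = 0; start < n; start += group_size) {
        unsigned int local_hist[BINS] = {0};   /* emulated __local */
        size_t end = start + group_size < n ? start + group_size : n;

        /* Each work-item atomically bumps one local bin. */
        for (size_t i = start; i < end; ++i)
            local_hist[data[i] % BINS]++;

        /* Merge: one atomic add per bin into the global histogram. */
        for (size_t b = 0; b < BINS; ++b)
            global_hist[b] += local_hist[b];
    }
}
```

The payoff on a GPU is contention: thousands of work-items hammering BINS global counters would serialize, whereas local-memory atomics are cheap and the global merge costs only BINS updates per work-group.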

Chapter 11 covers OpenCL extensions, additional features provided by a vendor of an OpenCL implementation that are not yet part of the standard. Here, we get to see the AMD extensions for double precision arithmetic and device fission. The latter enables the programmer to partition a physical device into multiple logical devices, each with a portion of the compute capability. Chapter 12 looks into the available tools for using OpenCL from within Java, Python, and Haskell.

Chapter 13 covers the various tools (predominantly those provided by AMD) available for profiling and debugging OpenCL code. Each tool is discussed in detail, and examples of their usage are provided. Finally, in chapter 14, the reader is taken through the steps of optimizing the performance of an image analysis application.

I always enjoy reviewing later editions of a book. Not only does it imply that the content is interesting enough to warrant republishing, but it also means that many of the original errors have been corrected. In both of these aspects, this book does not disappoint. It is definitely worth the time spent reading it.


Reviewer: Bernard Kuc | Review #: CR141590 (1312-1051)
Categories: Parallel Programming (D.1.3), Distributed Architectures (C.1.4), Graphics Processors (I.3.1)
Other reviews under "Parallel Programming":
- How to write parallel programs: a first course. Carriero N. (ed), Gelernter D. (ed), MIT Press, Cambridge, MA, 1990. Type: Book (9780262031714). Reviewed Jul 1 1992
- Parallel computer systems. Koskela R., Simmons M., ACM Press, New York, NY, 1990. Type: Book (9780201509373). Reviewed May 1 1992
- Parallel functional languages and compilers. Szymanski B. (ed), ACM Press, New York, NY, 1991. Type: Book (9780201522433). Reviewed Sep 1 1993

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®