Computing Reviews
Bones: an automatic skeleton-based C-to-CUDA compiler for GPUs
Nugteren C., Corporaal H. ACM Transactions on Architecture and Code Optimization 11(4): 1-25, 2014. Type: Article
Date Reviewed: Feb 4 2015

With heterogeneous computing environments that combine central processing units (CPUs) and graphics processing units (GPUs) becoming more common, there is an increasing need to adapt existing sequential C source code so that it can take advantage of these architectures. Nugteren and Corporaal introduce Bones, a fully automated source-to-source compilation tool that parallelizes C source code. For this paper, the output is CUDA source code for NVIDIA GPUs.

Bones automatically examines memory access patterns in the input source code to classify an algorithm, identifying its “algorithmic species,” an extension of prior work involving algorithmic skeletons. Once a species is identified, an appropriate skeleton is invoked, and code is generated and optimized for the CUDA target. A considerable number of optimizations are performed, including optimizing data transfers between the host and the accelerator and fusing accelerator kernels to reduce the number of context switches. These two optimizations are particularly important for high-performance code.
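To make the kernel fusion idea concrete, here is a minimal sketch in plain C. It is not output from Bones and does not reproduce the paper's method; the function names and the scale-then-offset computation are hypothetical, chosen only for illustration. The unfused version contains two element-wise loops, each of which a C-to-CUDA compiler would plausibly map to its own kernel, with an intermediate array written and re-read in between. The fused version performs the same computation in a single pass.

```c
#include <stddef.h>

/* Unfused (hypothetical example): two separate element-wise passes.
   If each loop became its own GPU kernel, tmp would be written to
   memory by the first kernel and read back by the second. */
void scale_then_offset_unfused(const float *in, float *tmp, float *out,
                               size_t n, float a, float b)
{
    for (size_t i = 0; i < n; i++)
        tmp[i] = a * in[i];      /* first candidate kernel */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[i] + b;     /* second candidate kernel */
}

/* Fused (hypothetical example): one pass, no intermediate array,
   and only one loop to launch as a kernel. */
void scale_then_offset_fused(const float *in, float *out,
                             size_t n, float a, float b)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a * in[i] + b;
}
```

Both functions access their arrays one element per iteration, which is the kind of regular memory access pattern a classifier could recognize. In floating point, the fused form may differ from the unfused one in the last bit if the compiler contracts the multiply and add into a single instruction.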

The work is experimentally validated using several metrics. Compiler optimizations are evaluated, the different output targets are compared, and the GPU-CUDA target of Bones is compared to other state-of-the-art C-to-CUDA compilers. As is the case with many compiler optimizations, performance improvements are seen for some benchmarks, while negative performance effects are seen for others. However, when comparing Bones to Par4All and PPCG, Bones generates higher-performing code for nearly all benchmarks evaluated, with an average speedup of 2.4 times compared to Par4All, and 1.4 times compared to PPCG.

Bones is neither complete nor perfect. However, this work is a promising result for a completely automatic, skeleton-based, source-to-source compilation tool. As parallel programming environments evolve, especially the CUDA application programming interface (API) and other GPU compilers, the effectiveness of tools such as Bones should improve.

Reviewer: Chris Lupo
Review #: CR143142 (1505-0395)
Parallel Architectures (C.1.4)
Compilers (D.3.4 ...)