As heterogeneous computing environments that combine central processing units (CPUs) and graphics processing units (GPUs) become more common, there is an increasing need to adapt existing sequential C source code so that it performs well on these architectures. Nugteren and Corporaal introduce Bones, a completely automated source-to-source compilation tool that parallelizes C source code. In this paper, the output is CUDA source code for NVIDIA GPUs.
Bones automatically examines memory access patterns in the input source code to classify an algorithm, identifying its “algorithmic species,” a concept that extends prior work on algorithmic skeletons. Once a species is identified, an appropriate skeleton is invoked, and code is generated and optimized for the CUDA target. Several optimizations are performed, including optimizing data transfers between the host and the accelerator and fusing accelerator kernels to reduce the number of context switches. These two optimizations are particularly important for high-performance code.
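To make the kernel fusion idea concrete, the following is a minimal sketch in plain C, not output from Bones. It assumes two consecutive element-wise loops, each of which an automatic tool might map to its own accelerator kernel, and shows the single loop that results from fusing them. The function names and the scale-then-offset computation are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not generated by Bones.
 * Unfused form: two element-wise passes. Mapped naively to a GPU, each loop
 * would be a separate kernel, and the intermediate array tmp would be
 * written to memory by the first kernel and read back by the second. */
void scale_then_offset_unfused(const float *in, float *tmp, float *out,
                               size_t n, float a, float b) {
    assert(in != NULL && tmp != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        tmp[i] = a * in[i];      /* first pass: scale */
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = tmp[i] + b;     /* second pass: offset */
    }
}

/* Fused form: one pass over the data. Both loops read and write element i
 * only, so their bodies can be merged; the intermediate array disappears
 * and a single kernel would suffice. */
void scale_then_offset_fused(const float *in, float *out,
                             size_t n, float a, float b) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = a * in[i] + b;
    }
}
```

Fusion is safe here because iteration i of the second loop depends only on iteration i of the first. For general floating-point inputs, the fused version may differ in the last bit if the compiler contracts the multiply and add into a single fused operation.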
The work is experimentally validated using several metrics. Compiler optimizations are evaluated, the different output targets are compared, and the GPU-CUDA target of Bones is compared to other state-of-the-art C-to-CUDA compilers. As is the case with many compiler optimizations, performance improvements are seen for some benchmarks, while negative performance effects are seen for others. However, when comparing Bones to Par4All and PPCG, Bones generates higher-performing code for nearly all benchmarks evaluated, with an average speedup of 2.4 times compared to Par4All, and 1.4 times compared to PPCG.
Bones is neither complete nor perfect. However, this work is a promising result for a completely automatic, skeleton-based, source-to-source compilation tool. As parallel programming environments evolve, especially the CUDA application programming interface (API) and other GPU compilers, the effectiveness of tools such as Bones should improve.