Park, Shires, and Henz compare application acceleration on field-programmable gate arrays (FPGAs) and graphics processing units (GPUs). They select two applications and evaluate each in terms of execution speed, ease of implementation (designer productivity), and throughput.
The first application is an integer-based password hash function (EksBlowfish). Its throughput is measured as the number of keys obtained per unit time. By this measure, the compute unified device architecture (CUDA) implementation shows a 2.5-times speedup. For the FPGA, the authors use three implementations: hand-coded very-high-speed integrated circuit (VHSIC) hardware description language (VHDL), DIME-C, and Mitrion-C. The DIME-C and Mitrion-C implementations instantiate a soft-core processor that then executes the application. The VHDL implementation is expected to provide the best results, and the measurements confirm this. It also shows a 2-times speedup over the CUDA implementation.
The second application is a median stack sort algorithm that sorts a list of numbers and then finds its median. The data set consists of about 100K lists, each containing a maximum of 128 numbers. The authors mention that only the bubble sort was ported to the accelerator; the initialization and the median calculation were done on the host central processing unit (CPU). Results indicate a 4.8-times speedup for the CUDA implementation over the CPU.
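To make the division of work concrete, here is a minimal CPU-only sketch of the second application in Python. It is not the authors' code. The function names are hypothetical, the averaging rule for even-length lists is an assumption (the paper's convention is not stated here), and lists are assumed to be non-empty. The bubble sort stands in for the part that was ported to the accelerator, and the median step for the part left on the host.

```python
def bubble_sort(values):
    """Return a sorted copy of values using bubble sort.

    In the reviewed paper this is the step offloaded to the accelerator;
    here it simply runs on the CPU.
    """
    items = list(values)
    # After each pass the largest remaining element settles at index `end`.
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # no swaps in this pass, so the list is already sorted
    return items


def median_of_sorted(items):
    """Return the median of an already sorted, non-empty list.

    Assumption: for even-length lists, average the two middle values.
    """
    mid = len(items) // 2
    if len(items) % 2 == 1:
        return items[mid]
    return (items[mid - 1] + items[mid]) / 2


def median_stack(lists):
    """Sort each list independently and collect one median per list."""
    return [median_of_sorted(bubble_sort(lst)) for lst in lists]
```

Because each list is sorted independently of the others, a stack of about 100K short lists exposes the kind of data parallelism that a GPU can exploit, for example by assigning lists to separate threads or thread blocks.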
Overall, the CUDA implementation shows the best speedup for both applications. The authors should have compared the Mitrion-C and DIME-C implementations with a MicroBlaze processor instantiated in the FPGA fabric. This paper is ideal for understanding the evaluation metrics used (designer productivity, hardware resources, development time, and software maturity) when working with application acceleration on different platforms.