Faced with a practical number-crunching problem, the modern practitioner must address a dilemma: Will my solution be better if I write custom software for a commodity floating-point graphics processing unit (GPU) powerhouse, or should I wrap custom hardware logic, with elegance and finesse, around the same algorithm? This is not an easy decision: choose the former and contend with the fast, powerful, and inexpensive, but also inflexible, GPU hardware’s idiosyncratic instruction set; choose the latter and tailor slower-clocked but reconfigurable field-programmable gate array (FPGA) hardware to the algorithm’s requirements. In this paper, Cope et al. take a parametrized, systematic approach to this dilemma.
The classification factors are memory access requirements, arithmetic complexity, and data dependence, each varying over a limited semi-quantitative range. Oversimplifying the results: reconfigurable hardware logic adapts well across problem categories, delivering competitive throughput on the experimental cases within a relatively narrow band. Commodity GPU throughput, on the other hand, varies widely: it excels where the structure of the problem can exploit the hardware architecture, and it falls short elsewhere. The paper describes, analytically and quantitatively, what makes commodity GPUs better or worse than FPGAs.
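To make the flavor of such a three-factor classification concrete, here is a minimal illustrative sketch, not the authors’ actual model: each factor is scored on a small ordinal scale, and a simple affinity score suggests which platform the reasoning above would favor. The function name, scales, and thresholds are all hypothetical assumptions introduced for illustration.

```python
# Hypothetical sketch (not from the paper): score a kernel on the three
# classification factors and guess which platform is likely to win.

def predict_platform(memory_regularity: int,
                     arithmetic_complexity: int,
                     data_dependence: int) -> str:
    """All scales and thresholds are illustrative assumptions.

    memory_regularity:     0 = irregular access, 2 = streaming/coalesced
    arithmetic_complexity: 0 = simple ops, 2 = deep floating-point pipelines
    data_dependence:       0 = embarrassingly parallel, 2 = tight dependences
    """
    # GPUs excel when the problem maps onto their architecture: regular
    # memory traffic, heavy arithmetic, few inter-element dependences.
    gpu_affinity = memory_regularity + arithmetic_complexity - data_dependence
    # FPGAs are the adaptable fallback, competitive almost everywhere.
    return "GPU" if gpu_affinity >= 3 else "FPGA"

print(predict_platform(2, 2, 0))  # dense streaming arithmetic -> "GPU"
print(predict_platform(0, 1, 2))  # irregular, dependence-bound -> "FPGA"
```

The point of the toy is only that GPU suitability is a function of how well the problem matches the architecture, while the FPGA prediction is the near-constant baseline.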
The paper focuses excessively on commodity GPU hardware that is already past its prime, having succumbed to marketplace competition. Nevertheless, the work’s solid approach and method, if perhaps not every included detail, remain valid.