Computing Reviews

GPU concurrency: weak behaviours and programming assumptions
Alglave J., Batty M., Donaldson A., Gopalakrishnan G., Ketema J., Poetzl D., Sorensen T., Wickerson J.  ASPLOS 2015 (Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems, Istanbul, Turkey, Mar 14-18, 2015), 577-591, 2015. Type: Proceedings
Date Reviewed: 07/16/15

A memory consistency model (MCM) is a specification that describes the value(s) a memory location may hold based on the causal history of operations that may or may not be associated with that location. The MCM is important because the correctness of a program depends on knowing what a thread reads from or writes to a memory location in the presence of concurrent accesses by other threads. This specification exists at multiple levels: between a programmer and the compiler for a language, between a compiler and the processor/hardware on which its output runs, between multiple hardware components, and so on. Given this stack of specifications and the number of optimizations that compilers and hardware perform today, writing deterministic yet performant parallel programs is a challenge for expert and beginner programmers alike.
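To make this concrete, consider the classic message-passing idiom: one thread publishes data and then sets a flag, while another polls the flag and then reads the data. Whether the reader can ever observe the flag set but the data stale is exactly the kind of question an MCM answers. The following is a minimal sketch of my own in CUDA, not code from the paper:

// Hypothetical kernel illustrating the message-passing idiom. Block 0
// publishes data and sets a flag; block 1 spins on the flag and then reads
// the data. Under a weak consistency model, the two stores (or the two
// loads) can be reordered, so the reader may observe flag == 1 and yet
// still read the stale value of data.
__global__ void message_passing(volatile int *data, volatile int *flag,
                                int *result)
{
    if (blockIdx.x == 0) {          // producer
        *data = 42;
        *flag = 1;
    } else {                        // consumer
        while (*flag == 0) { }      // spin until the flag is observed
        *result = *data;            // may be 0, not 42, on a weak model
    }
}
// Note: volatile only stops the compiler from caching these values in
// registers; it does not prevent the hardware from reordering the accesses.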

Efforts to thoroughly understand and find bugs in MCMs have spanned multiple decades [1,2,3]. This paper plays a crucial role in understanding one such specification: the one between a programmer and the general-purpose graphics processing unit (GPGPU) hardware.

In the paper, the authors show that the consistency model observed on GPU hardware exhibits weak behavior; that is, it is a relaxed consistency model in which operations can be reordered during execution, yielding executions that are not sequentially consistent unless appropriate steps, such as inserting fences, are taken. They support this claim by building a set of small test programs, known as litmus tests, that capture these behaviors. They then suggest fixes for obtaining the required behavior when such code snippets are used. They also provide a tool, optcheck, which inspects GPU assembly code for reorderings that might change the behavior of the litmus tests. Finally, they present a formal model that could underpin a simulation tool (proposed, but not implemented) for predicting the possible behaviors of PTX fragments.
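As a rough illustration of the kind of fence-based fix the authors suggest (again a sketch under my own assumptions, not the paper's exact code), CUDA's __threadfence() can be inserted on both sides of the message-passing idiom above to restore the intended ordering:

// Fenced variant of the earlier hypothetical kernel. __threadfence()
// orders this thread's memory accesses as seen by all threads on the device.
__global__ void message_passing_fenced(volatile int *data, volatile int *flag,
                                       int *result)
{
    if (blockIdx.x == 0) {          // producer
        *data = 42;
        __threadfence();            // order the data store before the flag store
        *flag = 1;
    } else {                        // consumer
        while (*flag == 0) { }
        __threadfence();            // order the flag load before the data load
        *result = *data;            // reads 42 once the flag has been seen
    }
}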

I do have a few gripes about the paper. The authors do not indicate whether any out-of-thin-air reads were observed in the CoRR experiments. Furthermore, they justify the necessity of the fences in Figure 6, but an argument for their sufficiency is absent. Overall, the paper is very well written and easy to understand. Of particular note is the way the authors expose the lack of clarity in vendor specifications by quoting excerpts from those specifications that contradict each other. Finally, Table 2, which summarizes the behaviors identified by their study, is a must-read for any GPU programmer.
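For readers unfamiliar with CoRR (coherence of read-read), the shape of the test is roughly the following sketch (hypothetical code, not the paper's test harness): one thread writes a location while another reads it twice, and the forbidden outcome is that the reads observe the new value followed by the old one.

// Hypothetical CoRR litmus test: block 0 writes x; block 1 reads x twice.
// On a coherent machine, observing r0 == 1 and then r1 == 0 is forbidden.
__global__ void corr(volatile int *x, int *r0, int *r1)
{
    if (blockIdx.x == 0) {
        *x = 1;             // writer
    } else {
        *r0 = *x;           // first read
        *r1 = *x;           // second read; seeing 1 then 0 violates coherence
    }
}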


1) Adve, S. V.; Gharachorloo, K. Shared memory consistency models: a tutorial. IEEE Computer 29, 12 (1996), 66–76.

2) Manson, J.; Pugh, W.; Adve, S. V. The Java memory model. In Proc. of POPL '05. ACM, 2005, 378–391.

3) Adve, S. V.; Boehm, H.-J. Memory models: a case for rethinking parallel languages and hardware. CACM 53, 8 (2010), 90–101.

Reviewer: Karthik Murthy
Review #: CR143621 (1510-0884)
