Computing Reviews
Impact of reverse computing on information locality in register allocation for high performance computing
Bahi M., Eisenbeis C. International Journal of Parallel Programming 42(1): 49-76, 2014. Type: Article
Date Reviewed: Mar 25 2014

For computers dedicated to high-performance computing (HPC) applications, computation is virtually free, while data access is the main cost factor. This iconoclastic view is even more valid for graphics processing units (GPUs): thousands of cores share high-speed register files that remain somewhat limited, so they still need external, host-based memories, which are at least a thousand times slower. One way to address this shortcoming, when evaluating instruction blocks, is to recompute intermediate values instead of storing them, thus trading costly memory spills for cheap repeated expression evaluations.

The main idea introduced in this paper is that this rematerialization of values can sometimes be performed more favorably from output values rather than input ones, at least as long as one uses reversible operators. Hence, after a register assignment such as r2 := r1 + 1, one could use reversible computing to recompute, for instance, the expression 2 × r1 as 2 × (r2 - 1), which might enable the reuse of register r1.
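The example above can be sketched in a few lines of Python. This is only an illustration of the idea as the review states it, not the paper's algorithm: the function names are hypothetical, Python variables stand in for machine registers, and integers are used so that the inverse of "+ 1" is exact.

```python
# Illustrative sketch only; function names are hypothetical, not from the paper.

def keep_input_live(r1):
    """Conventional version: r1 stays live after r2 is computed."""
    r2 = r1 + 1            # r2 := r1 + 1
    doubled = 2 * r1       # needs the input r1, so r1 and r2 are both live
    return r2, doubled


def reuse_register(r):
    """Reverse-computing version: a single register is overwritten in place."""
    r = r + 1              # the register now holds r2; the input r1 is gone
    doubled = 2 * (r - 1)  # r1 is recovered from the output via the inverse of + 1
    return r, doubled


if __name__ == "__main__":
    print(keep_input_live(5))
    print(reuse_register(5))
```

Both functions return the same pair of values; the second never needs the input and the incremented value in separate registers at the same time, at the price of one extra subtraction.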

The paper illustrates how this concept can improve both instruction scheduling and register allocation for scientific applications, taking as a use case a lattice quantum chromodynamics (LQCD) physics simulation running on an Nvidia GPU. Rematerialization via reversible computing reduces register pressure beyond what traditional techniques achieve, and this proves to be beneficial in practice: double-precision run-time performance improves by up to 10 percent.

Even though still preliminary, this work addresses a key compilation issue for HPC applications, and provides new ideas that should interest compiler writers and researchers working on program transformations.

Reviewer: P. Jouvelot Review #: CR142101 (1406-0443)
Processors (D.3.4)
General (C.1.0)
Modes Of Computation (F.1.2)
Systems And Software (H.3.4)
 
