Computing Reviews
A performance analysis framework for identifying potential benefits in GPGPU applications
Sim J., Dasgupta A., Kim H., Vuduc R. PPoPP 2012 (Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New Orleans, LA, Feb 25-29, 2012), 11-22. Type: Proceedings
Date Reviewed: Jan 29 2013

General-purpose graphics processing units (GPGPUs) are becoming increasingly popular as a means to accelerate various scientific kernels, as evidenced by their adoption in the high-performance computing community and by the integration of GPU cores into mainstream central processing units (CPUs). However, GPU performance tuning has thus far been a niche area due to the lack of tools for determining which factors contribute to the performance of individual program components. This is in contrast to the CPU domain, where many mature performance analysis tools exist. This paper is an excellent first step toward closing that gap.

The authors present a performance analysis framework that attributes a GPU kernel's execution time to its different contributing factors. For example, it can separate the time spent waiting for memory accesses from the time spent computing results. Readers interested in performance modeling will find this approach instructive and novel.

The paper starts with the construction of a detailed analytic performance model of the GPU. The authors then combine statically determined metrics (such as instruction group sizes within a basic block) with dynamically determined performance metrics (such as instruction mix) to parameterize the model. They can then predict the effect of various optimizations, some based on algorithm changes and some that can be applied automatically (such as using the available shared memory on the GPU). They also model parallelism in depth. For example, the authors carefully separate parallelism during memory access from parallelism during computation, taking into account both the characteristics of the application and the parameters of the underlying GPU.
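The flavor of such an analytic model can be illustrated with a minimal sketch: estimate compute and memory cycles separately, then account for how much memory latency the GPU's warp-level parallelism can hide. All parameters, formulas, and names below (`predicted_kernel_time`, the latency and throughput constants) are simplifying assumptions for illustration, not the actual model from the paper under review.

```python
# Illustrative sketch of an analytic GPU kernel time model: compute time and
# memory time are estimated separately, and the overlap achievable through
# warp-level parallelism determines how much memory latency is exposed.
# This is NOT the paper's model; every constant here is a placeholder.

def predicted_kernel_time(n_compute_insts, n_memory_insts,
                          cycles_per_inst=4, memory_latency=400,
                          warps_per_sm=8):
    """Estimate execution cycles for one warp on one streaming multiprocessor."""
    compute_cycles = n_compute_insts * cycles_per_inst
    memory_cycles = n_memory_insts * memory_latency
    # Memory latency can be hidden by the compute work of the other resident
    # warps; whatever cannot be hidden is exposed as stall time.
    hidden = min(memory_cycles, compute_cycles * (warps_per_sm - 1))
    exposed_memory = memory_cycles - hidden
    return compute_cycles + exposed_memory

# Example: a memory-bound kernel benefits from more resident warps,
# because more concurrent warps hide more of the memory latency.
few_warps = predicted_kernel_time(100, 10, warps_per_sm=2)
many_warps = predicted_kernel_time(100, 10, warps_per_sm=16)
print(few_warps, many_warps)  # → 4000 400
```

Even this toy version shows how such a model can attribute time to memory stalls versus computation and predict the benefit of an optimization (here, increasing occupancy) before applying it.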

This performance analysis framework would be useful to those interested in optimizing the execution of their GPU kernels. The paper is accessible to most people interested in performance modeling, although the terminology in some places is naturally GPU-centric and the testing (as presented) is limited to one specific GPU.

Reviewer: Amitabha Roy   Review #: CR140884 (1305-0386)
Parallel Architectures (C.1.4)
Modeling Techniques (C.4 ...)
Microcomputers (C.5.3)
Other reviews under "Parallel Architectures": Date
A chaotic asynchronous algorithm for computing the fixed point of a nonnegative matrix of unit spectral radius
Lubachevsky B., Mitra D. Journal of the ACM 33(1): 130-150, 1986. Type: Article
Jun 1 1986
iWarp
Gross T., O’Hallaron D., MIT Press, Cambridge, MA, 1998. Type: Book (9780262071833)
Nov 1 1998
Industrial strength parallel computing
Koniges A. Morgan Kaufmann Publishers Inc., San Francisco, CA, 2000. Type: Divisible Book
Mar 1 2000
