Computing Reviews
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors
Manjikian N., Abdelrahman T. IEEE Transactions on Parallel and Distributed Systems 12(3): 259-271, 2001. Type: Article
Date Reviewed: Feb 1 2002

This paper addresses compiler optimizations that improve the parallel performance of applications on large-scale shared-memory multiprocessors.

Specifically, it discusses tiling and other loop transformations that enhance data locality and, in particular, thoroughly analyzes the conflicts between parallelism and locality that these transformations introduce.
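
As a concrete illustration of this trade-off, consider the following sketch (my own, not code from the paper) of the kind of loop nest the authors target: an SOR-like stencil whose points depend on their updated north and west neighbors. Skewing the nest so that the outer loop walks anti-diagonals w = i + j exposes a wavefront of independent points; the grid size N, the array a, and the OpenMP directive are assumptions for illustration.

    #define N 1024                     /* grid size (assumption) */
    static double a[N][N];

    void sor_wavefront(void)
    {
        /* Every point on anti-diagonal w = i + j depends only on points
         * from diagonals w-1 and w-2, so the inner loop is a fully
         * parallel wavefront. */
        for (int w = 2; w <= 2 * (N - 2); w++) {
            int lo = (w - (N - 2) > 1) ? w - (N - 2) : 1;
            int hi = (w - 1 < N - 2) ? w - 1 : N - 2;
            #pragma omp parallel for
            for (int i = lo; i <= hi; i++) {
                int j = w - i;
                a[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                + a[i][j-1] + a[i][j+1]);
            }
        }
    }

The conflict the authors analyze shows up immediately: tiling this skewed nest improves cache reuse within a band of diagonals, but it also coarsens the wavefront, leaving fewer independent units of work per step.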

Using three scheduling strategies (dynamic self-scheduling, static cyclic, and static block scheduling), as well as different tile sizes and numbers of processors, the performance of tiling is analyzed in terms of run-time overhead, synchronization requirements, and locality. The static strategies incur no run-time overhead, since the assignment of tiles to processors is done entirely a priori. For both dynamic and static cyclic scheduling, the number of synchronization counters is a function of the number of iterations and the tile size; for static block scheduling, it is a function of the number of processors. Static block scheduling is also shown to offer the best cache reuse opportunities, and therefore the best locality. Thus, static scheduling tends to enhance locality while still providing sufficient parallelism for a large number of processors.
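
The counter result for static block scheduling can be made concrete with a sketch (again my own illustration, not the authors' code; the tile sizes, array, and compute_tile body are assumptions). Each thread owns a contiguous block of tile rows and walks tile columns left to right, so the only cross-processor dependence is on the block above, and one counter per processor suffices:

    #include <stdatomic.h>
    #include <omp.h>

    #define TS    32                   /* tile edge (assumption)     */
    #define TROWS 16                   /* tile rows (assumption)     */
    #define TCOLS 16                   /* tile columns (assumption)  */
    #define MAXP  64                   /* max threads (assumption)   */

    static double a[TROWS * TS + 2][TCOLS * TS + 2];
    static atomic_int cols_done[MAXP]; /* one counter per processor  */

    /* Update one tile; tile (ti,tj) reads updated values only from
     * the tiles above it and to its left. */
    static void compute_tile(int ti, int tj)
    {
        for (int i = ti * TS + 1; i <= (ti + 1) * TS; i++)
            for (int j = tj * TS + 1; j <= (tj + 1) * TS; j++)
                a[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                + a[i][j-1] + a[i][j+1]);
    }

    void wavefront_block(void)
    {
        #pragma omp parallel
        {
            int p     = omp_get_thread_num();
            int nthr  = omp_get_num_threads();
            int per   = (TROWS + nthr - 1) / nthr;
            int first = p * per;
            int last  = first + per < TROWS ? first + per : TROWS;

            for (int tj = 0; tj < TCOLS; tj++) {
                /* Wait until the block above has finished column tj:
                 * the only cross-processor synchronization, so the
                 * counters scale with the processor count. */
                if (p > 0)
                    while (atomic_load(&cols_done[p - 1]) <= tj)
                        ;              /* spin */
                for (int ti = first; ti < last; ti++)
                    compute_tile(ti, tj);
                atomic_fetch_add(&cols_done[p], 1);
            }
        }
    }

Static cyclic scheduling would instead interleave individual tile rows across processors, filling the pipeline faster but requiring a counter per tile row, which is consistent with the review's point that its counter count grows with the iteration count and tile size.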

The scheduling strategies described were tested on a 32-processor HP SPP1000, using three applications: SOR, Jacobi, and the LL18 kernel from the Livermore Loops. Each application was run with tile sizes of 32x32, 16x16, and 8x8 eight-byte array elements. Overall, the results show that static cyclic scheduling performs better with larger tiles and fewer processors, while static block scheduling gives better results with smaller tiles and as the number of processors increases.

Reviewer: Veronica Lagrange
Review #: CR125672 (0202-0078)

Parallel Architectures (C.1.4)
Compilers (D.3.4 ...)
Shared Memory (B.3.2 ...)
Design Styles (B.3.2)
Multiple Data Stream Architectures (Multiprocessors) (C.1.2)
Processors (D.3.4)

Other reviews under "Parallel Architectures":
A chaotic asynchronous algorithm for computing the fixed point of a nonnegative matrix of unit spectral radius. Lubachevsky B., Mitra D. Journal of the ACM 33(1): 130-150, 1986. Type: Article. Date: Jun 1 1986
iWarp. Gross T., O'Hallaron D. MIT Press, Cambridge, MA, 1998. Type: Book (9780262071833). Date: Nov 1 1998
Industrial strength parallel computing. Koniges A. Morgan Kaufmann Publishers Inc., San Francisco, CA, 2000. Type: Divisible Book. Date: Mar 1 2000
more...
