Computing Reviews
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors
Manjikian N., Abdelrahman T. IEEE Transactions on Parallel and Distributed Systems 12(3): 259-271, 2001. Type: Article
Date Reviewed: Feb 1 2002

This paper addresses compiler optimizations that improve the parallel performance of applications on large-scale shared-memory multiprocessors.

Specifically, it discusses tiling and other loop transformations that enhance data locality and, in particular, thoroughly analyzes the conflicts between parallelism and locality that these transformations introduce.
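
As a concrete illustration of this trade-off, consider the following sketch (my own, not code from the paper) of the kind of loop nest the authors target: an SOR-like stencil whose points depend on their updated north and west neighbors. Skewing the nest so that the outer loop walks anti-diagonals w = i + j exposes a wavefront of independent points; the grid size N, the array a, and the OpenMP directive are assumptions for illustration.

    #define N 1024                     /* grid size (assumption) */
    static double a[N][N];

    void sor_wavefront(void)
    {
        /* Every point on anti-diagonal w = i + j depends only on points
         * from diagonals w-1 and w-2, so the inner loop is a fully
         * parallel wavefront. */
        for (int w = 2; w <= 2 * (N - 2); w++) {
            int lo = (w - (N - 2) > 1) ? w - (N - 2) : 1;
            int hi = (w - 1 < N - 2) ? w - 1 : N - 2;
            #pragma omp parallel for
            for (int i = lo; i <= hi; i++) {
                int j = w - i;
                a[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                + a[i][j-1] + a[i][j+1]);
            }
        }
    }

The conflict the authors analyze shows up immediately: tiling this skewed nest improves cache reuse within a band of diagonals, but it also coarsens the wavefront, leaving fewer independent units of work per step.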

Using three scheduling strategies (dynamic self-scheduling, static cyclic, and static block scheduling), as well as different tile sizes and numbers of processors, the performance of tiling is analyzed in terms of run-time overhead, synchronization requirements, and locality. The static strategies incur no run-time overhead, since the assignment of tiles to processors is done entirely a priori. For both dynamic and static cyclic scheduling, the number of synchronization counters is a function of the number of iterations and the tile size; for static block scheduling, it is a function of the number of processors. Static block scheduling is also shown to offer the best cache reuse opportunities, and therefore the best locality. Thus, static scheduling tends to enhance locality while still providing sufficient parallelism for a large number of processors.
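
The counter result for static block scheduling can be made concrete with a sketch (again my own illustration, not the authors' code; the tile sizes, array, and compute_tile body are assumptions). Each thread owns a contiguous block of tile rows and walks tile columns left to right, so the only cross-processor dependence is on the block above, and one counter per processor suffices:

    #include <stdatomic.h>
    #include <omp.h>

    #define TS    32                   /* tile edge (assumption)     */
    #define TROWS 16                   /* tile rows (assumption)     */
    #define TCOLS 16                   /* tile columns (assumption)  */
    #define MAXP  64                   /* max threads (assumption)   */

    static double a[TROWS * TS + 2][TCOLS * TS + 2];
    static atomic_int cols_done[MAXP]; /* one counter per processor  */

    /* Update one tile; tile (ti,tj) reads updated values only from
     * the tiles above it and to its left. */
    static void compute_tile(int ti, int tj)
    {
        for (int i = ti * TS + 1; i <= (ti + 1) * TS; i++)
            for (int j = tj * TS + 1; j <= (tj + 1) * TS; j++)
                a[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                + a[i][j-1] + a[i][j+1]);
    }

    void wavefront_block(void)
    {
        #pragma omp parallel
        {
            int p     = omp_get_thread_num();
            int nthr  = omp_get_num_threads();
            int per   = (TROWS + nthr - 1) / nthr;
            int first = p * per;
            int last  = first + per < TROWS ? first + per : TROWS;

            for (int tj = 0; tj < TCOLS; tj++) {
                /* Wait until the block above has finished column tj:
                 * the only cross-processor synchronization, so the
                 * counters scale with the processor count. */
                if (p > 0)
                    while (atomic_load(&cols_done[p - 1]) <= tj)
                        ;              /* spin */
                for (int ti = first; ti < last; ti++)
                    compute_tile(ti, tj);
                atomic_fetch_add(&cols_done[p], 1);
            }
        }
    }

Static cyclic scheduling would instead interleave individual tile rows across processors, filling the pipeline faster but requiring a counter per tile row, which is consistent with the review's point that its counter count grows with the iteration count and tile size.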

The scheduling strategies described were tested on a 32-processor HP SPP1000, using three applications: SOR, Jacobi, and the LL18 kernel from the Livermore Loops. Each application was run with tile sizes of 32x32, 16x16, and 8x8 eight-byte array elements. Overall, the results show that static cyclic scheduling performs better with larger tiles and fewer processors, while static block scheduling gives better results with smaller tiles and as the number of processors increases.

Reviewer: Veronica Lagrange
Review #: CR125672 (0202-0078)

Parallel Architectures (C.1.4)
Compilers (D.3.4 ...)
Shared Memory (B.3.2 ...)
Design Styles (B.3.2)
Multiple Data Stream Architectures (Multiprocessors) (C.1.2)
Processors (D.3.4)

Other reviews under "Parallel Architectures":
A chaotic asynchronous algorithm for computing the fixed point of a nonnegative matrix of unit spectral radius. Lubachevsky B., Mitra D. Journal of the ACM 33(1): 130-150, 1986. Type: Article. Date: Jun 1 1986
iWarp. Gross T., O'Hallaron D. MIT Press, Cambridge, MA, 1998. Type: Book (9780262071833). Date: Nov 1 1998
Industrial strength parallel computing. Koniges A. Morgan Kaufmann Publishers Inc., San Francisco, CA, 2000. Type: Divisible Book. Date: Mar 1 2000
more...
