Computing Reviews

Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors
Manjikian N., Abdelrahman T. IEEE Transactions on Parallel and Distributed Systems 12(3): 259-271, 2001. Type: Article
Date Reviewed: 02/01/02

This paper addresses compiler optimizations that improve parallel performance on large-scale shared-memory multiprocessors.

Specifically, the paper discusses tiling and other loop transformations that enhance data locality, and it thoroughly analyzes the conflicts between parallelism and locality that these transformations create.
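As an illustration of that conflict, the following is a minimal sketch and not the authors' code. It assumes a two-dimensional grid of tiles in which each tile depends only on its north and west neighbours, as in an SOR-style sweep. The function name `wavefronts` and the grid dimensions are hypothetical.

```python
def wavefronts(n_tile_rows, n_tile_cols):
    """Group tile coordinates (i, j) by anti-diagonal d = i + j.

    Assumption: a tile depends only on tiles (i-1, j) and (i, j-1), so all
    tiles on one anti-diagonal are independent and can run in parallel once
    the previous anti-diagonal has finished.
    """
    fronts = []
    for d in range(n_tile_rows + n_tile_cols - 1):
        first_row = max(0, d - n_tile_cols + 1)
        last_row = min(d, n_tile_rows - 1)
        fronts.append([(i, d - i) for i in range(first_row, last_row + 1)])
    return fronts


if __name__ == "__main__":
    # Print the tiles that may execute concurrently at each wavefront step.
    for step, front in enumerate(wavefronts(3, 4)):
        print(step, front)
```

In this sketch the available parallelism grows and then shrinks along the wavefront. Larger tiles improve reuse within a tile but leave fewer tiles per wavefront, which is one way to picture the tension the paper analyzes.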

Using three scheduling strategies (dynamic self-scheduling, static cyclic scheduling and static block scheduling), as well as different tile sizes and different numbers of processors, the paper analyzes the performance of tiling in terms of run-time overhead, synchronization requirements and locality. It shows that the static scheduling strategies incur no run-time overhead, since tiles are assigned to processors a priori. For both dynamic and static cyclic scheduling, the number of synchronization counters is a function of the number of iterations and the tile size. For static block scheduling, the number of synchronization counters is a function of the number of processors. It is also shown that static block scheduling presents the best cache reuse opportunities and therefore better locality. Static scheduling thus tends to enhance locality while providing sufficient parallelism for a large number of processors.
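The two static strategies can be pictured with a small sketch. It assumes that the unit of assignment is a column of tiles, which the review does not specify, and the function names are hypothetical. Dynamic self-scheduling is not shown, since there the owner of a tile is decided only at run time.

```python
def cyclic_owner(tile_col, n_procs):
    """Static cyclic: tile columns are dealt out round-robin."""
    return tile_col % n_procs


def block_owner(tile_col, n_tile_cols, n_procs):
    """Static block: each processor gets one contiguous run of tile columns."""
    cols_per_proc = -(-n_tile_cols // n_procs)  # ceiling division
    return tile_col // cols_per_proc


if __name__ == "__main__":
    n_tile_cols, n_procs = 8, 4  # illustrative values, not from the paper
    print([cyclic_owner(c, n_procs) for c in range(n_tile_cols)])
    print([block_owner(c, n_tile_cols, n_procs) for c in range(n_tile_cols)])
```

Both mappings are fixed before execution, so no scheduling decision is made at run time. Under the block mapping, neighbouring tile columns mostly stay on the same processor, which is consistent with the better cache reuse reported for static block scheduling.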

The scheduling strategies were tested on a 32-processor HP SPP1000 using three applications: SOR, Jacobi and the LL18 kernel from the Livermore Loops. Tile sizes of 32x32, 16x16 and 8x8 array elements, each element 8 bytes, were used for each application. Overall, the results show that static cyclic scheduling performs better for larger tile sizes and fewer processors, while static block scheduling gives better results for smaller tile sizes and as the number of processors increases.
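For a sense of scale, the memory footprint of one tile of one array follows directly from the stated tile sizes and element size. The helper below is a hypothetical illustration of that arithmetic only; it says nothing about the cache parameters of the test machine or about how many arrays each application touches.

```python
ELEMENT_BYTES = 8  # element size stated in the review


def tile_bytes(side, element_bytes=ELEMENT_BYTES):
    """Bytes occupied by one square tile of a single array."""
    return side * side * element_bytes


if __name__ == "__main__":
    for side in (32, 16, 8):
        print(f"{side}x{side}: {tile_bytes(side)} bytes")
```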

Reviewer: Veronica Lagrange
Review #: CR125672 (0202-0078)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™