Computing Reviews

The effect of sharing on the cache and bus performance of parallel programs
Eggers S., Katz R. ACM SIGARCH Computer Architecture News 17(2):257-270, 1989. Type: Article
Date Reviewed: 07/01/90

How does the sharing resulting from writing an application program as a set of parallel processes affect cache performance? The authors investigate this question for shared-memory multiprocessors with a single bus. They use trace-driven simulation to examine the performance of four applications written explicitly for parallel execution. The parallel programming model used is single-program-multiple-data: N processes each execute identical instructions on their own part of the shared data. This corresponds to many real-world applications written for some small number of processors, with each process dedicated to its own processor. The applications are actual CAD programs written for N = 5, 11, 12, and 12 processors. The hardware simulated is RISC-like.
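To make the programming model concrete, here is a small C sketch (not from the paper; the thread count, array size, and workload are arbitrary choices) of the single-program-multiple-data pattern the review describes: N threads run identical code, each operating on its own part of the shared data.

/* Illustrative sketch only: the paper's applications are real CAD
 * programs, not this toy. It shows the SPMD pattern: N threads execute
 * the same instructions, each on its own slice of a shared array. */
#include <pthread.h>
#include <stdio.h>

#define N   4          /* number of processes/threads (hypothetical) */
#define LEN 1024       /* size of the shared array (hypothetical)    */

static double shared_data[LEN];

static void *worker(void *arg) {
    long id = (long)arg;
    long chunk = LEN / N;
    /* Identical instructions, private portion of the shared data. */
    for (long i = id * chunk; i < (id + 1) * chunk; i++)
        shared_data[i] = shared_data[i] * 2.0 + 1.0;
    return NULL;
}

int main(void) {
    pthread_t tid[N];
    for (long i = 0; i < N; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (long i = 0; i < N; i++)
        pthread_join(tid[i], NULL);
    printf("first element: %f\n", shared_data[0]);
    return 0;
}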

The unsurprising answer is an unequivocal “it depends”: specifically, on the sharing the application does. Applications whose processes exhibit locality (multiple consecutive writes to shared data within a cache block) behave much like nonparallel programs. Applications with fine-grain sharing (where multiple processes contend for shared data within a cache block) do not. In either case, cache miss ratios and bus utilization are higher than in nonparallel programs because of the extra misses caused by the cache invalidations needed to maintain cache consistency. For programs with locality, this shows up as a smaller improvement in the miss ratio as cache block size or total cache size increases. For programs with fine-grain sharing, the extra misses can be enough to increase the miss ratio at large block or cache sizes. The results for bus utilization are similar.
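The two sharing patterns can be contrasted with another small C sketch (again not from the paper; the 64-byte block size, thread count, and counter workload are assumptions). In the packed layout, every thread writes into the same cache block, so each write invalidates the copies held by the other processors' caches, the fine-grain case; in the padded layout, each thread's counter occupies its own block, so its writes show the per-process locality described above.

/* Illustrative sketch only. Packed counters share one cache block
 * (fine-grain sharing, invalidation traffic on every write); padded
 * counters each occupy their own block (per-thread locality). */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS    1000000
#define BLOCK    64                  /* assumed cache block size in bytes */

struct packed { long c[NTHREADS]; };                     /* one shared block  */
struct padded { long c; char pad[BLOCK - sizeof(long)]; };

static struct packed fine_grain;                         /* fine-grain sharing */
static struct padded per_thread[NTHREADS];               /* per-thread locality */

static void *bump_packed(void *arg) {
    long id = (long)arg;
    for (long i = 0; i < ITERS; i++)
        fine_grain.c[id]++;          /* contends for the same cache block */
    return NULL;
}

static void *bump_padded(void *arg) {
    long id = (long)arg;
    for (long i = 0; i < ITERS; i++)
        per_thread[id].c++;          /* each thread writes its own block */
    return NULL;
}

static void run(void *(*fn)(void *), const char *label) {
    pthread_t tid[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, fn, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    printf("%s done\n", label);
}

int main(void) {
    run(bump_packed, "fine-grain sharing");
    run(bump_padded, "per-thread locality");
    return 0;
}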

The paper is competently organized and presented. The usual caveats apply, since the model and applications used, while representative, are limited, and the traces include only application references. It would have been interesting to see how the metrics varied with the number of processes. The results will be of interest to cache designers of shared-memory multiprocessors and to programmers interested enough in performance to reorganize applications to take cache parameters into account.

Reviewer: Andrew R. Huber
Review #: CR114129
