The authors describe a distributed shared-memory multiprocessor system in which each node is itself a multiprocessor with hardware support for cache coherence. Nodes are connected through a local area network, and cache coherence between nodes is maintained in software. This arrangement allows programs to take advantage of fine-grain sharing at the cache block level within a node and coarse-grain sharing at the page level across nodes. Their performance studies indicate that this system architecture is promising in most cases. Some programs, however, require compile-time analysis and transformation to improve data locality.
Although their experimental platform is not a true cluster of multiprocessors, the authors have made every attempt to account for the resulting inaccuracies. They provide excellent background information, making the paper suitable for those with limited knowledge of this area. Overall, their study and the proposed ideas are well suited to near-future clusters.