Advances in semiconductor technology allow microprocessors to operate at ever-increasing speeds. However, limited memory bandwidth constrains processor performance. In multiprocessor systems in particular, memory access latency across the interconnection network becomes crucial to overall system performance. Therefore, complex memory control and sharing mechanisms are required to meet the total bandwidth demand from both local and remote processors.
The authors of this paper propose an active memory controller (AMC) to improve performance in distributed shared-memory systems. In some specific scenarios, such as operations with low temporal locality, both interprocess communications and intranode traffic between different cache levels can be reduced by performing computations locally within the memory controller of each processing node. In this way, memory bottleneck problems can potentially be alleviated.
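To make the low-temporal-locality argument concrete, the following toy model counts network messages for a shared counter updated by several nodes. It is only a sketch under stated assumptions, not the authors' design or analytical model: the function names, the cost of 4 messages per cache-line ownership transfer, and the cost of 2 messages per operation shipped to the memory controller are all hypothetical values chosen for illustration.

```python
# Toy message-count model. All names and per-operation message costs below are
# illustrative assumptions, not figures taken from the reviewed paper.

MSGS_PER_OWNERSHIP_TRANSFER = 4  # assumed: request, forward, data reply, ack
MSGS_PER_AMO = 2                 # assumed: request to home controller, reply


def conventional(trace):
    """Each node updates the counter in its own cache.

    The cache line must migrate whenever the updating node is not the
    current owner; updates by the current owner are cache hits (0 messages).
    """
    owner, value, messages = None, 0, 0
    for node in trace:
        if node != owner:
            messages += MSGS_PER_OWNERSHIP_TRANSFER
            owner = node
        value += 1
    return value, messages


def active_memory(trace):
    """Every update is sent to the home memory controller and performed there."""
    value, messages = 0, 0
    for _ in trace:
        messages += MSGS_PER_AMO
        value += 1
    return value, messages


def interleaved(num_nodes, updates_per_node):
    """Low temporal locality: nodes take turns, one update each."""
    return [n for _ in range(updates_per_node) for n in range(num_nodes)]


def batched(num_nodes, updates_per_node):
    """High temporal locality: each node performs all its updates in a row."""
    return [n for n in range(num_nodes) for _ in range(updates_per_node)]


if __name__ == "__main__":
    for name, trace in (("interleaved", interleaved(4, 8)),
                        ("batched", batched(4, 8))):
        print(name, "conventional:", conventional(trace),
              "active memory:", active_memory(trace))
```

Under these assumed costs, the interleaved trace generates fewer messages when updates are performed at the memory controller, while the batched trace favors ordinary caching. This is consistent with the point that the benefit is tied to specific scenarios rather than to all workloads.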
The proposed active memory operation (AMO) scheme is implemented in hardware that is separate from the processor core. It demonstrates the ability to handle memory coherence and the advantages of localized computation. The authors present an analytical model for predicting AMO performance. Dynamic random access memory (DRAM) access patterns and latencies are investigated in order to optimize stream operations. However, the paper would benefit from an examination of the efficiency of the DRAM controller used, since it affects the memory bandwidth observed in simulation.
This paper addresses the common issue of the processor-memory gap. The proposed solution helps to mitigate this problem in several applications.