Providing adequate data bandwidth is a pressing concern in processor design. Cache memories shorten the average memory access latency, and multiported data caches can be built with time-division multiplexing, cache replication, or interleaving; these solutions, however, tend to be costly. This paper addresses the problem of providing data bandwidth in future processors, and the authors argue that prediction schemes offer a more cost-effective approach.
Memory systems can use a data-decoupled architecture, in which prediction techniques divide the data access stream into two or more independent streams before the actual addresses are known; the partitioned accesses are then sent to multiple independent pipelines. The scheme uses an access region prediction table (ARPT), similar to a branch prediction table.
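To make the mechanism concrete, the following C sketch shows how such a table might be organized, assuming a direct-mapped structure indexed by the load/store instruction's address, analogous to a simple branch prediction table. The table size, two-region encoding, and last-value update policy are illustrative assumptions, not the authors' exact design.

    /* Sketch of an access region prediction table (ARPT): a direct-mapped
     * table indexed by load/store PC that guesses which memory region
     * (and thus which independent pipeline) an access will target,
     * before the effective address is computed. */
    #include <stdint.h>
    #include <stdio.h>

    #define ARPT_ENTRIES 4096          /* assumed table size (power of two) */

    typedef enum { REGION_STACK = 0, REGION_HEAP = 1 } region_t;

    typedef struct {
        region_t predicted_region;     /* last observed region for this PC */
    } arpt_entry_t;

    static arpt_entry_t arpt[ARPT_ENTRIES];

    /* Index the table with low-order bits of the instruction address. */
    static unsigned arpt_index(uint64_t pc) {
        return (unsigned)((pc >> 2) & (ARPT_ENTRIES - 1));
    }

    /* Predict which pipeline should receive this access. */
    region_t arpt_predict(uint64_t pc) {
        return arpt[arpt_index(pc)].predicted_region;
    }

    /* Once the address resolves, train the entry (last-value update). */
    void arpt_update(uint64_t pc, region_t actual) {
        arpt[arpt_index(pc)].predicted_region = actual;
    }

    int main(void) {
        uint64_t load_pc = 0x400a10;                /* hypothetical load PC */
        region_t guess = arpt_predict(load_pc);     /* steer by prediction  */
        arpt_update(load_pc, REGION_HEAP);          /* train on resolution  */
        printf("predicted %d, now predicts %d\n", guess, arpt_predict(load_pc));
        return 0;
    }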
Cycle-by-cycle simulation produced promising results: a 32K ARPT, requiring 4K of memory, achieved over 99.9 percent prediction accuracy on the integer and floating-point programs tested.
A local variable cache (LVC) was used with a group of reservation stations called the local variable store queue. The authors suggest that local variable accesses could be further improved with fast data forwarding and access combining (the forwarding idea is sketched below). The list of 38 references provides the reader with many paths for further study.
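To illustrate the forwarding idea, the following C sketch checks a younger load against a small local variable store queue and forwards a pending store's data on an address match. The queue layout, matching rule, and entry count are illustrative assumptions rather than the paper's design, and access combining is not modeled.

    /* Sketch of store-to-load forwarding in a local variable store queue:
     * stores to local (stack) variables wait here before reaching the LVC,
     * and a matching load can take its data directly from the queue. */
    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define LVSQ_ENTRIES 8

    typedef struct {
        bool     valid;
        uint64_t addr;      /* stack address of the pending store */
        uint64_t data;      /* value waiting to be written to the LVC */
    } lvsq_entry_t;

    static lvsq_entry_t lvsq[LVSQ_ENTRIES];
    static int lvsq_tail;

    /* Buffer a store to a local variable until it retires into the LVC. */
    void lvsq_insert(uint64_t addr, uint64_t data) {
        lvsq[lvsq_tail] = (lvsq_entry_t){ .valid = true, .addr = addr, .data = data };
        lvsq_tail = (lvsq_tail + 1) % LVSQ_ENTRIES;
    }

    /* A load checks the queue; on an address match the store's data is
     * forwarded directly, avoiding a cache access (a real design would
     * prefer the youngest matching store). */
    bool lvsq_forward(uint64_t addr, uint64_t *data_out) {
        for (int i = 0; i < LVSQ_ENTRIES; i++) {
            if (lvsq[i].valid && lvsq[i].addr == addr) {
                *data_out = lvsq[i].data;
                return true;
            }
        }
        return false;       /* no match: the load must read the LVC */
    }

    int main(void) {
        lvsq_insert(0x7fffe010, 42);          /* hypothetical spill of a local */
        uint64_t v;
        if (lvsq_forward(0x7fffe010, &v))     /* reload hits the queue */
            printf("forwarded %llu\n", (unsigned long long)v);
        return 0;
    }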