Cache simulation tools can provide valuable information for tuning applications for better memory performance. The overhead of a complete cache simulation, however, can be large enough to discourage frequent use.
The authors describe a technique that speeds up these tools by simulating only sampled subsections of the complete reference trace. By sampling one-tenth of the full trace, they speed up their MemSpy tool by a factor of four to six on five benchmark applications.
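A minimal sketch of the idea, not the MemSpy implementation: the direct-mapped cache, the 512-set geometry, the 32-byte line size, the fixed sampling period, and the function names `simulate` and `sampled_miss_ratio` are all assumptions made for illustration. Each sample here starts with an empty cache, which is only one possible way to handle the unknown initial state.

```python
def simulate(trace, num_sets=512, line_size=32):
    """Run a direct-mapped cache over `trace`, starting empty.

    Returns (misses, references). The cache geometry is illustrative only.
    """
    tags = [None] * num_sets          # one resident block per set
    misses = 0
    for addr in trace:
        block = addr // line_size     # memory block holding this address
        index = block % num_sets      # set the block maps to
        if tags[index] != block:      # miss: fetch the block, evict the old one
            tags[index] = block
            misses += 1
    return misses, len(trace)


def sampled_miss_ratio(trace, sample_len, period, **geometry):
    """Simulate the first `sample_len` references of every `period` references.

    Assumption: each sample starts with an empty cache (cold start).
    Returns (estimated miss ratio, number of references simulated).
    """
    misses = refs = 0
    for start in range(0, len(trace), period):
        m, r = simulate(trace[start:start + sample_len], **geometry)
        misses += m
        refs += r
    return (misses / refs if refs else 0.0), refs


if __name__ == "__main__":
    n = 10_000
    streaming = [i * 32 for i in range(n)]   # a new block on every reference
    hot = [0] * n                            # one block reused throughout
    for name, trace in (("streaming", streaming), ("hot", hot)):
        full_misses, full_refs = simulate(trace)
        est, simulated = sampled_miss_ratio(trace, sample_len=100, period=1000)
        print(name, full_misses / full_refs, est, simulated)
```

With `sample_len=100` and `period=1000`, only one-tenth of the references are simulated. On the synthetic streaming trace the estimate matches the full simulation, while on the single-block trace each sample contributes one cold-start miss that the full simulation does not see, so the estimate is inflated.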
They present experiments showing how the accuracy of the simulation depends on several parameters, including cache size, sample length, and number of samples. Sampling is most accurate on programs with high cache miss rates, exactly the situation where a performance tool is most needed. Because the state of the cache is unknown at the beginning of each sample, and a larger cache takes more references to fill, larger caches lead to higher uncertainty in estimating cache miss ratios. In their benchmarks, with cache sizes of 16 KB to 128 KB, the sampling technique estimates cache miss ratios with an absolute error of no more than 0.3 percent. These measurements show that sampling is most useful for programs with high cache miss rates or many references, and they lead to a set of recommended strategies for memory performance tuning that depend on the characteristics of the application.
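The dependence on cache size can be illustrated with a simple worst-case calculation. The sketch below assumes a direct-mapped or LRU cache that is treated as empty at the start of each sample; under that assumption, each cache line can account for at most one miss that a correctly warmed cache would have avoided. The 32-byte line size and the 100,000-reference sample length are hypothetical values chosen for illustration, not figures from the paper, and `cold_start_bound` is a name invented here.

```python
def cold_start_bound(cache_bytes, line_bytes, sample_len):
    """Worst-case overestimate of the miss ratio within one cold-started sample.

    Assumes a direct-mapped or LRU cache: at most one extra miss per cache
    line can be charged to the unknown initial state.
    """
    lines = cache_bytes // line_bytes
    return min(1.0, lines / sample_len)


# Hypothetical parameters: 32-byte lines, 100,000 references per sample.
for kb in (16, 32, 64, 128):
    print(f"{kb:4d} KB  bound = {cold_start_bound(kb * 1024, 32, 100_000):.5f}")
```

The bound grows in proportion to the number of cache lines and shrinks as samples get longer, which is consistent with the trade-off between cache size and sample length described above. It is a loose upper limit, not a prediction of the errors the authors measured.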