Computing Reviews
Data prefetch mechanisms
Vanderwiel S., Lilja D. ACM Computing Surveys 32(2): 174-199, 2000. Type: Article
Date Reviewed: May 1 2001

The growing gap between microprocessor and memory performance has made cache prefetching a key research issue in recent years. Vanderwiel and Lilja survey data prefetch mechanisms for both single-processor and multiprocessor architectures. The first part of this work gives the necessary background for people who want to study cache prefetching. For its completeness and clarity, this part is strongly recommended to researchers who are beginning their work on cache prefetching. The introduction outlines the ideas underlying prefetching methods. The authors also point out the drawbacks of an incorrect prefetching policy, including cache pollution and unnecessary prefetches.

The second part of the paper describes the prefetch mechanisms that are known to be efficient and effective. Single-processor architectures are considered first, and the approaches are classified into software and hardware data prefetching. Software data prefetching relies mainly on special instructions, inserted manually or by a pre-compiler, that force the cache to fetch a given datum at a given time. The drawbacks of this approach include difficulties in prefetch scheduling, increased register pressure, and degraded instruction cache performance (due to the extra fetch instructions). The authors subdivide hardware data prefetching into sequential prefetching (commonly known as one-block lookahead) and its variants, and prefetching with arbitrary strides. A clear and understandable overview of the latter approach is given. Published solutions that combine the advantages of software and hardware prefetching are also reported. Finally, the additional problems raised by multiprocessor architectures are listed and discussed.
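To make the distinction between the two hardware schemes concrete, the following is a minimal sketch, not taken from the survey, that counts misses in a toy cache under no prefetching, one-block lookahead (in its prefetch-on-miss form), and a simplified stride prefetcher. The block size, the cache size, the fully associative LRU organization, and all function names are assumptions chosen for illustration; in particular, the stride scheme here tracks a single global stride rather than a per-instruction prediction table.

```python
from collections import OrderedDict

BLOCK_SIZE = 4  # words per cache block (assumed for illustration)


class Cache:
    """Toy fully associative cache of block numbers with LRU replacement."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()

    def contains(self, block):
        return block in self.blocks

    def insert(self, block):
        # Inserting (or re-referencing) a block makes it most recently used.
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return
        if len(self.blocks) >= self.num_blocks:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[block] = True


def no_prefetch(addr, prev_addr, hit):
    return None


def one_block_lookahead(addr, prev_addr, hit):
    # Prefetch-on-miss variant: fetch the next sequential block after a miss.
    if hit:
        return None
    return (addr // BLOCK_SIZE + 1) * BLOCK_SIZE


def stride_prefetch(addr, prev_addr, hit):
    # Simplified stride scheme: predict the next address from the last delta.
    if prev_addr is None or addr == prev_addr:
        return None
    return addr + (addr - prev_addr)


def simulate(addresses, prefetcher, num_blocks=8):
    """Return the number of demand misses for an address trace."""
    cache = Cache(num_blocks)
    misses = 0
    prev_addr = None
    for addr in addresses:
        block = addr // BLOCK_SIZE
        hit = cache.contains(block)
        if not hit:
            misses += 1
        cache.insert(block)
        target = prefetcher(addr, prev_addr, hit)
        if target is not None and target >= 0:
            cache.insert(target // BLOCK_SIZE)  # prefetched block arrives
        prev_addr = addr
    return misses


sequential = list(range(64))        # unit-stride walk over 16 blocks
strided = list(range(0, 256, 16))   # one access every fourth block

for name, trace in (("sequential", sequential), ("strided", strided)):
    for policy in (no_prefetch, one_block_lookahead, stride_prefetch):
        print(name, policy.__name__, simulate(trace, policy))
```

Under these assumptions, one-block lookahead halves the misses on the sequential trace (8 instead of 16) but gives no benefit on the strided trace (16 misses either way), whereas the stride scheme misses only while it is learning the stride (1 and 2 misses, respectively). The model ignores timing, so it says nothing about whether a prefetch would arrive early enough to hide latency.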

This paper is useful, well structured, and clear. It covers most of the topics in cache prefetching, and gives helpful suggestions about what must be prefetched into the cache, when, and where. Nevertheless, some recent research results are not included in this survey, mainly because of their later publication dates. Two topics are not mentioned, and were probably outside the scope of this work: the advantage of choosing different prefetching methods depending on the type of locality [1, 2], and the improvement that specialized prefetching methods bring to data-intensive applications such as multimedia [3, 4]. Finally, the paper does not discuss suitable metrics for cache performance evaluation, either in terms of reducing cache misses or in terms of time saved.

Reviewer: Andrea Prati
Review #: CR125143
1) Milutinovic, V.; Tomasevic, M.; Markovic, B.; and Tremblay, M. A new cache architecture concept: the split temporal/spatial cache. In Proceedings of the Eighth Mediterranean Electrotechnical Conference, MELECON ’96, vol. 2, 1108–1111.
2) Gonzales, A.; Aliagas, C.; and Valero, M. A data cache with multiple caching strategies tuned to different types of locality. In Proceedings of the Ninth ACM International Conference on Supercomputing (July 1995), ACM, New York, 1995, 338–347.
3) Zucker, D. F.; Lee, R. B.; and Flynn, M. J. Hardware and software cache prefetching techniques for MPEG benchmarks. IEEE Trans. Circuits Syst. Video Technol. 10, 5 (Aug. 2000), 782–796.
4) Cucchiara, R.; Piccardi, M.; and Prati, A. Temporal analysis of cache prefetching strategies for multimedia applications. In Proceedings of the 20th IEEE International Performance, Computing and Communications Conference, IPCCC 2001 (Phoenix, AZ, April 2001), IEEE, New York, 2001, 311–318.
Cache Memories (B.3.2 ... )
Memory Structures (B.3)