List of Tables
Techniques for coping with memory latency.
Hit rates of affine array accesses.
Prefetch predicates for the different types of locality.
Loop splitting transformations for the various types of locality.
Order in which the optimization passes occur in the SUIF compiler, including prefetching.
Description of uniprocessor applications.
General statistics for the uniprocessor applications. Primary data cache miss counts are for an 8 Kbyte direct-mapped cache.
Memory performance improvement for the selective prefetching algorithm.
Memory performance improvement for the indiscriminate and selective prefetching algorithms.
Average instruction overhead per prefetch of indirect reference.
Latency for various memory system operations in processor clock cycles (1 pclock = 30 ns).
Description of multiprocessor applications.
General statistics for the multiprocessor applications.
Reduction in memory stall times for the multiprocessor applications.
Statistics on exclusive-mode prefetching.
Average processor stall on a primary prefetch fill and the fraction of prefetches that suffer primary cache conflicts for each uniprocessor application.
Distribution of where data was found both by prefetch and by subsequent reference.
Statistics on multithreading behavior.