Our study of compiler-based prefetching for array-based uniprocessor
applications has produced the following results:
- The selective prefetching algorithm presented in
Chapter is successful at hiding memory
latency while minimizing prefetching overhead, thus improving overall
performance by as much as twofold.
- Our prefetching algorithm is robust with respect to the
compile-time parameters that describe the memory hierarchy.
- Prefetching and locality optimizations are complementary and
therefore should be combined. Locality optimizations reduce the number
of accesses to main memory, and prefetching tolerates the latency of the
accesses that remain.
- Through a minor extension of our software pipelining
algorithm, our compiler can automatically prefetch indirect array