
Chapter Summary

In this chapter, we explored three major sets of architectural issues related to prefetching: basic architectural support for prefetching, possible enhancements to achieve even larger gains through prefetching, and alternative latency-hiding techniques that require hardware support. We briefly summarize each set of issues below.

The key issues in providing basic architectural support for prefetching are the following:

Unlike normal loads, prefetches are non-binding, non-blocking, and non-excepting. If prefetch instructions are given unique opcodes, the bits normally used to specify a destination register can instead encode prefetching hints (e.g., exclusive-mode vs. shared-mode prefetches).
Deciding when to drop prefetches is a complex issue. The best approach appears to be to drop prefetches that miss in the TLB, but not to drop them when the prefetch issue buffer is full (given selective prefetching).
When a prefetch is performed, the caches should be checked for the data along the way, and the data should be placed directly in the primary cache.
The main hardware support necessary for prefetching is a lockup-free cache. Supporting up to four outstanding misses is useful, whereas buffering more than four outstanding prefetch requests offers only limited additional benefit.
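The basic-support points above can be illustrated with a small software model. This is a minimal sketch, not the hardware design from this chapter: the class name `LockupFreeCache`, the line and page sizes, FIFO completion of misses, and the absence of any timing model are all illustrative assumptions. It shows a prefetch that returns no value (non-binding), returns immediately after issuing (non-blocking), is dropped rather than faulting on a TLB miss (non-excepting), checks the cache before issuing, fills the primary cache directly, and stalls instead of dropping when all four miss slots are busy.

```python
from collections import OrderedDict

LINE_SIZE = 64    # hypothetical cache-line size in bytes
PAGE_SIZE = 4096  # hypothetical page size in bytes


class LockupFreeCache:
    """Toy model of a primary cache with a bounded number of outstanding misses.

    Assumptions (illustrative only): misses complete in issue order, and a full
    set of miss slots stalls the processor until the oldest miss fills, rather
    than dropping the new prefetch.
    """

    def __init__(self, mapped_pages, max_outstanding=4):
        self.mapped_pages = set(mapped_pages)  # pages currently in the TLB
        self.max_outstanding = max_outstanding
        self.lines = set()                     # lines present in the primary cache
        self.outstanding = OrderedDict()       # in-flight lines, oldest first
        self.stalls = 0
        self.dropped = 0

    def _complete_oldest(self):
        line, _ = self.outstanding.popitem(last=False)
        self.lines.add(line)                   # fill directly into the primary cache

    def prefetch(self, addr):
        """Prefetch the line holding addr; never writes a register or faults."""
        if addr // PAGE_SIZE not in self.mapped_pages:
            self.dropped += 1                  # TLB miss: drop instead of excepting
            return "dropped"
        line = addr // LINE_SIZE
        if line in self.lines:
            return "hit"                       # already cached: nothing to do
        if line in self.outstanding:
            return "merged"                    # a miss to this line is in flight
        if len(self.outstanding) == self.max_outstanding:
            self.stalls += 1                   # slots full: wait rather than drop
            self._complete_oldest()
        self.outstanding[line] = None
        return "issued"                        # returns without waiting for data

    def drain(self):
        """Let every outstanding miss complete."""
        while self.outstanding:
            self._complete_oldest()


cache = LockupFreeCache(mapped_pages={0})
print([cache.prefetch(a) for a in (0, 64, 128, 192, 256, 8192, 0)])
# ['issued', 'issued', 'issued', 'issued', 'issued', 'dropped', 'hit']
```

In this run, the fifth prefetch finds all four slots busy, so the model records one stall and retires the oldest miss (line 0). That is why the final prefetch to address 0 hits. The prefetch to address 8192 falls on an unmapped page and is silently dropped.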

The following are the major issues in achieving even larger gains through prefetching:

Profiling feedback can potentially improve performance, but has some important drawbacks. A more attractive technique for exploiting dynamic information may be to generate adaptive code. Both of these techniques would benefit from user-visible hardware miss counters.
Providing associativity through either set-associative caches or victim caches is an important step toward dealing with cache conflicts. However, neither solution is as attractive as eliminating conflicts in software.
To avoid excessive prefetching overhead, the compiler must be careful to avoid register spilling. To reduce overheads further, block prefetches may be useful when spatial locality exists.
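The arithmetic behind the block-prefetch point can be sketched as a count of prefetch instructions needed to cover a unit-stride array sweep. The function name `prefetch_count` and its `lines_per_prefetch` parameter are hypothetical, and the sketch assumes the array starts on a cache-line boundary. Block sizes and line sizes are machine-specific.

```python
def prefetch_count(n_elems, elem_bytes, line_bytes, lines_per_prefetch=1):
    """Prefetch instructions needed to cover a unit-stride sweep of an array.

    Assumes the array is aligned to a cache-line boundary. With
    lines_per_prefetch > 1, each (hypothetical) block prefetch covers that
    many consecutive lines, so fewer instructions are issued.
    """
    total_bytes = n_elems * elem_bytes
    lines = (total_bytes + line_bytes - 1) // line_bytes         # ceiling division
    return (lines + lines_per_prefetch - 1) // lines_per_prefetch


print(prefetch_count(1000, 8, 32))     # one prefetch per line  -> 250
print(prefetch_count(1000, 8, 32, 4))  # four-line block prefetch -> 63
```

Because the sweep has spatial locality, a single block prefetch can stand in for several per-line prefetches. The instruction count falls roughly in proportion to the block size.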

Finally, prefetching compares with other latency-hiding techniques as follows:

Software-controlled prefetching appears to be superior to hardware-controlled prefetching, since the software approach results in better coverage and requires less hardware support.
Prefetching and relaxed consistency models are complementary. Relaxed consistency models eliminate write latency, and prefetching addresses the remaining read latency.
The interaction between prefetching and multithreading is complex. The two techniques appear to be complementary only when the number of contexts is very small.


tcm@
Sat Jun 25 15:13:04 PDT 1994