There are three key behavioral distinctions between prefetches and loads: prefetches are (i) non-binding, (ii) non-blocking, and (iii) non-excepting. The non-binding property gives prefetches the flexibility to be issued far in advance of the actual references, without worrying about the impact on correctness. The non-blocking property allows prefetches to be overlapped with other references and with computation. The non-excepting property allows speculative prefetching of addresses that may be invalid. In this subsection, we discuss the importance of each of these properties in more detail.
The non-binding aspect of prefetching is implemented by fetching data into the cache rather than a register. As we discussed earlier in Section , non-binding prefetches are essential in multiprocessors since they allow the compiler to prefetch a location without worrying about whether the value may have been modified by another processor in the meantime. Even in a uniprocessor, the non-binding property is important since it avoids the correctness problems that can arise when using registers for temporary storage given imperfect memory disambiguation. For example, if prefetches fetched data into registers, it would be illegal to move a prefetch ahead of a store unless it was certain that the store was to a different location (otherwise the prefetched value would be stale). Proving that addresses do not coincide is extremely difficult because of complications such as aliasing, pointers, etc. Therefore, the non-binding property frees the compiler from correctness problems that can occur both across threads and within a single thread.
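As a minimal sketch of the uniprocessor case, the fragment below hoists a prefetch above a store that may or may not alias it. It assumes the GCC/Clang builtin `__builtin_prefetch` as a stand-in for a prefetch instruction; the function name `store_then_load` is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores through p, then reads through q.
 * p and q may or may not refer to the same location; in general the
 * compiler cannot prove either way. */
int store_then_load(int *p, const int *q, int value)
{
    assert(p != NULL && q != NULL);

    /* Non-binding: this is only a hint to bring the line holding *q into
     * the cache. No value is captured in a register, so the prefetch can
     * be placed ahead of the possibly aliasing store below. */
    __builtin_prefetch(q, 0 /* read */, 3 /* high temporal locality */);

    *p = value;  /* may overwrite the very location q points to */

    return *q;   /* the actual (binding) load happens here, after the store */
}
```

Whether or not `p` and `q` alias, the function returns the value that is in memory after the store. Had the early access been a load into a register, the aliasing case would have returned a stale value.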
An additional advantage of prefetching into the cache rather than the register file is that it avoids the limited size of the register file, which would otherwise be a significant constraint on how far ahead one can prefetch. This is crucial since extending register lifetimes to hundreds of cycles (in order to hide large latencies) is almost guaranteed to cause significant register spilling, which can hurt performance considerably. The register lifetime problem is most important in scientific code, where common techniques such as loop unrolling, software pipelining, and register blocking result in very high register pressure even without prefetching. The cache, on the other hand, is substantially larger than the register file, and therefore is not expected to constrain the amount one would reasonably want to prefetch ahead.
The non-blocking aspect of prefetching is essential since the very essence of this latency-hiding mechanism is overlapping memory accesses with computation. Normal loads could also potentially be non-blocking, but this would require a mechanism for interlocking and forwarding the data whenever the load result was used before the access completed. (Because of this hardware complexity, few commercial microprocessors have implemented non-blocking loads.) In contrast, it is easy to make prefetches non-blocking since they produce no result value, and therefore no instructions can depend upon their completion.
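The following sketch illustrates the intended overlap. Each iteration issues a prefetch for an element some distance ahead and then continues immediately with computation on the current element. The GCC/Clang builtin `__builtin_prefetch`, the function name `sum_with_prefetch`, and the distance of 16 elements are all assumptions made for illustration; a real distance would be tuned to the memory latency and the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed prefetch distance, in elements (illustrative, not tuned). */
enum { PREFETCH_AHEAD = 16 };

/* Hypothetical example: sum an array while prefetching ahead. */
double sum_with_prefetch(const double *a, size_t n)
{
    assert(a != NULL || n == 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* The prefetch produces no result value, so no later instruction
         * names it as an operand and nothing needs to wait for it. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);

        /* Computation on the current element proceeds right away. */
        sum += a[i];
    }
    return sum;
}
```

The bounds check keeps every prefetch address inside the array. The result is the same with or without the prefetch, since the hint affects only timing.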
Finally, the non-excepting aspect of prefetching (i.e., prefetches do not take memory exceptions on invalid addresses) is important since it allows data-dependent addresses (e.g., pointers) to be prefetched without being absolutely certain that the address is valid. We have already discussed in Section how this is important when prefetching indirect references, such as in sparse-matrix code. Even in dense-matrix code, this property is useful because it makes it safe to prefetch off the end of an array whenever generating a proper epilog would be too expensive (i.e., when it would result in a code size explosion). Therefore, the non-excepting property offers considerable flexibility to the compiler, since it is much easier to generate valid prefetch addresses most of the time rather than all of the time.
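Both uses can be sketched as follows, again assuming the GCC/Clang builtin `__builtin_prefetch`, whose documentation states that a data prefetch does not fault on an invalid address. The names `sum_list` and `sum_no_epilog` and the distance `AHEAD` are hypothetical. In the array case the address is computed with integer arithmetic, since forming an out-of-bounds pointer directly is not permitted in C even if it is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed prefetch distance, in elements (illustrative, not tuned). */
enum { AHEAD = 16 };

struct node {
    int value;
    struct node *next;
};

/* Data-dependent address: p->next is NULL at the tail of the list, but
 * the prefetch is issued without checking, because it cannot fault. */
long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        __builtin_prefetch(p->next, 0, 3);
        sum += p->value;
    }
    return sum;
}

/* No epilog: the last AHEAD iterations prefetch past the end of the
 * array. The address is only passed to the prefetch, never loaded. */
double sum_no_epilog(const double *a, size_t n)
{
    assert(a != NULL || n == 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t addr = (uintptr_t)a + (i + AHEAD) * sizeof a[0];
        __builtin_prefetch((const void *)addr, 0, 3);
        sum += a[i];
    }
    return sum;
}
```

Compared with a guarded version, the loop body in `sum_no_epilog` has no bounds test on the prefetch and no separate final loop. The cost is a few useless prefetches at the end.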