In this section we discuss some important concepts that are useful when thinking about compiler algorithms for prefetching.
First of all, prefetches are only possible if the memory addresses can be determined ahead of time. For example, if the address is dependent on data that is only available immediately before the memory reference, it may not be possible to compute the address far enough in advance. While this is an important concern with irregular computations, such as those containing pointers and linked lists, it is not an issue for the dense-matrix codes considered in this chapter, since the affine array addresses can always be computed ahead of time.
Given that prefetching is possible, a useful metric for determining success is the coverage factor, which is the fraction of original cache misses (i.e. without prefetching) that have been prefetched. Ideally, we would like to prefetch all cache misses, thus achieving a coverage factor of 100%.
However, it is not necessarily the case that ``more is better'' with prefetching, since prefetches issued for data already in the cache result in overhead without any benefit. Such prefetches are referred to as unnecessary prefetches, and should be avoided.
Finally, even if a prefetch is issued for data that is not already in the cache, it may not be effective at improving performance if the data is not found in the cache by the subsequent memory reference. This can happen for two reasons. If the prefetch is issued too late, there simply is not enough time to hide the memory latency. Conversely, if the prefetched data arrives in the cache too early, it may be displaced by other references before it has a chance to be used. Having discussed these high-level prefetching concepts, we now focus on the compiler algorithm itself.
Any compiler algorithm for inserting prefetches can be viewed as having two distinct phases. The first phase is an analysis phase, where the compiler is trying to answer the question: what exactly should be prefetched? The ideal answer to this question is precisely the set of dynamic references that suffer cache misses. By answering this question correctly, the compiler will maximize the coverage factor and minimize unnecessary prefetches. Once the compiler knows what it wants to prefetch, the second phase is to schedule those prefetches so that they are effective and so that they add a minimum amount of instruction overhead.
Minimizing overhead is important because, although the benefit of prefetching is hiding latency, prefetches can introduce two forms of overhead. First, there are additional instructions to issue the prefetches. This includes not only the prefetch instructions themselves, but also any instructions needed to generate the prefetch addresses. Second, any prefetches that do not eliminate cache misses, either because they are unnecessary or ineffective, increase the load on the memory subsystem, which can lead to increased queueing delays for all references.
For the dense-matrix applications shown earlier in Figure , we find that while the miss rate is high enough to substantially affect performance, the hit rate of the array references is still very high (over 60% for most programs), as shown in Table . (Details of this uniprocessor architecture were given earlier in Section .) Therefore if we were to prefetch all of the array references, then over 60% of the prefetches would be unnecessary for most of the programs. Avoiding these unnecessary prefetches is an important goal of our compiler algorithm, which we describe in the next several sections.