During a normal cache miss, the levels of the memory hierarchy closer to the processor are always checked before proceeding to subsequent levels. For example, the secondary cache is only checked if the data is not found in the primary cache. With prefetching, however, one might argue that since the prefetches are scheduled early enough to hide the worst-case miss latency, it is no longer necessary to check each level of the cache while searching for the data. To evaluate this, we modified the uniprocessor architecture such that prefetches proceed directly to memory without checking either level of the cache. The results of this experiment are shown in Figure .
As we see in Figure , it is still important to check the levels of the cache close to the processor for the prefetched data. The primary reason is to minimize bandwidth consumption, not latency. The deeper levels of the memory hierarchy are slower and have less bandwidth to offer, so prefetches that bypass the caches tend to congest the memory system, delaying both the issue of subsequent prefetches and the servicing of normal cache misses. Checking the cache therefore alleviates these bandwidth-related delays by filtering out prefetches that can be serviced close to the processor (including unnecessary prefetches).
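The bandwidth argument can be illustrated with a small sketch (this is a hypothetical illustration, not the simulator used in the experiments): prefetches that hit in either cache level are filtered out before they consume memory-bus bandwidth, while prefetches that bypass the caches all reach main memory.

```python
# Hypothetical sketch: count how many prefetches reach main memory,
# with and without checking the caches first. The cache contents and
# addresses below are made up for illustration.

def memory_traffic(prefetches, l1, l2, check_caches):
    """Return the number of prefetches that consume memory-bus bandwidth."""
    traffic = 0
    for addr in prefetches:
        if check_caches and (addr in l1 or addr in l2):
            continue  # serviced close to the processor: no bus traffic
        traffic += 1  # line must be fetched from main memory
    return traffic

l1 = {0x100, 0x140}          # lines already in the primary cache
l2 = {0x180, 0x1c0}          # lines already in the secondary cache
prefetches = [0x100, 0x140, 0x180, 0x200, 0x240]

print(memory_traffic(prefetches, l1, l2, check_caches=True))   # 2
print(memory_traffic(prefetches, l1, l2, check_caches=False))  # 5
```

In this toy example, checking the caches cuts memory traffic from five transactions to two, since three of the prefetches (including any unnecessary ones already resident in a cache) are serviced without touching memory.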
Once the prefetched data has been found, the next step is moving it close to the processor. Just how close is the next question we address.