Finally, for our experiments in Section , we set the prefetch latency to 300 cycles. We chose a value greater than 75 cycles to account for bandwidth-related delays. To evaluate whether this value was a good choice, we compiled each benchmark again using prefetch latencies of 100 and 1000 cycles. In nearly all cases, the impact on performance is small. In many cases, the 100-cycle version is slightly worse than the 300-cycle version due to bandwidth-related delays. The most interesting case is CHOLSKY, as shown in Figure (c). In this case, prefetched data tends to be replaced from the cache shortly after it arrives, so ideally it should arrive ``just in time''. Therefore the lowest prefetch latency (100 cycles) offers the best performance, as we see in Figure (c). However, in such cases the best approach may be to eliminate the cache conflicts that cause this behavior.
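To make the role of the prefetch latency parameter concrete, the following is a minimal sketch of how a compiler (or programmer) might turn an assumed latency into a prefetch distance, i.e., how many loop iterations ahead to issue each prefetch. The cycle counts, constant names, and the use of GCC's `__builtin_prefetch` are illustrative assumptions, not the scheme evaluated in our experiments.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed parameters (illustrative only): prefetch latency of 300
 * cycles and an estimated 30 cycles of work per loop iteration give a
 * prefetch distance of ceil(300 / 30) = 10 iterations ahead.          */
#define PREFETCH_LATENCY_CYCLES 300
#define EST_CYCLES_PER_ITER     30
#define PREFETCH_DIST \
    ((PREFETCH_LATENCY_CYCLES + EST_CYCLES_PER_ITER - 1) / EST_CYCLES_PER_ITER)

/* Sum an array, prefetching each element PREFETCH_DIST iterations
 * before it is used so that it arrives "just in time".                */
double sum_with_prefetch(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            /* args: address, 0 = read, 1 = low temporal locality */
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 1);
        s += a[i];
    }
    return s;
}
```

Overestimating the latency parameter simply increases `PREFETCH_DIST`, which is harmless unless the prefetched line is displaced before the loop reaches it.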
In general, we observe that it is better to be conservative (i.e., to overestimate) with the prefetch latency parameter. Clearly, if the value is not large enough to hide the latency, performance will always suffer. If we specify more latency than is actually experienced, performance suffers only if prefetched data is displaced before it is used. As caches become larger, this should become less and less of a problem. Moreover, only a relatively small number of new lines can be fetched into the cache within 300-500 cycles. If cache conflicts are a problem within this relatively small window of time, chances are that the conflicts will occur even if the prefetch latency is set to the smallest value that can hide the latency. These chronic cache conflicts must be dealt with in another way, as we will discuss later in Section .