Prefetching data the right amount of time before it is used is tricky for the hardware, since it has difficulty knowing where the instruction stream will be a hundred or so cycles in the future. Whereas the compiler can use software pipelining, the hardware must rely on branch prediction and possibly instruction lookahead buffering to issue the prefetches at the right time. This can be quite difficult when a loop contains a conditional statement whose outcome varies erratically. For example, if the outcome of function foo() in Figure is unpredictable, the lookahead mechanism will be ineffective, since it must back up and start over each time the branch is mispredicted. With software pipelining, however, the data would still be prefetched properly, since the compiler knows that subsequent loop iterations are executed regardless of the outcome of the conditional statement.
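The contrast can be illustrated with a minimal sketch of a software-pipelined loop. It is not a reproduction of the figure referenced above. The body of foo(), the name sum_selected, and the prefetch distance PREFETCH_DIST are assumptions chosen for illustration, and __builtin_prefetch is the GCC/Clang builtin rather than anything the text prescribes. The point to notice is that the prefetch in the steady-state loop is issued on every iteration, whether or not the branch on foo() is taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. A real value would be
   derived from the memory latency and the time taken by one iteration. */
enum { PREFETCH_DIST = 16 };

/* Hypothetical stand-in for foo(): a cheap integer hash, so the branch
   outcome follows no simple pattern. */
static int foo(size_t i)
{
    unsigned x = (unsigned)i * 2654435761u;
    return (int)((x >> 15) & 1u);
}

/* Sums the elements a[i] for which foo(i) is nonzero. */
double sum_selected(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double sum = 0.0;
    size_t i;

    /* Prolog: prefetch the data needed by the first few iterations. */
    for (i = 0; i < PREFETCH_DIST && i < n; i++)
        __builtin_prefetch(&a[i], 0, 3);

    /* Steady state: prefetch PREFETCH_DIST iterations ahead. The prefetch
       does not depend on the outcome of foo(), so a mispredicted branch
       does not disturb it. For simplicity this issues one prefetch per
       element rather than one per cache line. */
    for (i = 0; i + PREFETCH_DIST < n; i++) {
        __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3);
        if (foo(i))
            sum += a[i];
    }

    /* Epilog: the remaining data has already been prefetched. */
    for (; i < n; i++) {
        if (foo(i))
            sum += a[i];
    }
    return sum;
}
```

A hardware lookahead mechanism, by contrast, would reach the reference to a[i + PREFETCH_DIST] only after correctly predicting every intervening evaluation of the branch on foo().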
The second challenge in making prefetches effective is avoiding cache conflicts, a problem common to both hardware-based and software-based techniques. The techniques discussed earlier in Section are therefore also applicable to hardware-controlled prefetching.