By comparing hand-inserted prefetching with compiler-inserted prefetching, we saw that in cases where the access patterns are regular and predictable (MP3D and LU), the compiler is able to match the performance of hand-inserted prefetching. In the cases where our compiler failed to insert prefetches, the difficulty of fixing the problem ranged from challenging but straightforward to extremely difficult. For WATER, it is a matter of engineering the compiler to handle interprocedural analysis across separate files, a nontrivial task but one that is feasible. For BARNES, the compiler needs to recognize tree structures and understand how to prefetch them. For PTHOR, however, the data structures, access patterns, control flow, and sources of misses are complicated enough that it is unclear whether the compiler can be successful. We note that these problems in PTHOR occur even in the uniprocessor version of the code, and are not unique to multiprocessing.
Strengthening the existing algorithm to handle cases such as WATER, and extending it to recognize and prefetch simple recursive data structures such as trees, as in BARNES, would appear to be the next logical steps in enhancing the compiler algorithm.
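To make the BARNES case concrete, the following is a minimal sketch of what prefetching a simple recursive data structure might look like once a compiler (or a programmer) has recognized the tree. It is not the compiler algorithm evaluated here, and not the BARNES code: the `Node` type, `node_init`, `tree_sum`, and the `PREFETCH` macro are hypothetical names chosen for illustration, and the prefetch itself assumes the GCC/Clang `__builtin_prefetch` builtin rather than any particular machine's prefetch instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary tree node; the cells in BARNES are richer than this. */
typedef struct Node {
    long value;
    struct Node *left;
    struct Node *right;
} Node;

/* Assumption: GCC/Clang builtin (read access, high temporal locality).
 * On other compilers the macro degrades to a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH(p) ((void)(p))
#endif

/* Helper for building small trees from caller-owned storage. */
Node *node_init(Node *n, long value, Node *left, Node *right)
{
    assert(n != NULL);
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* Sum every value in the tree. On arriving at a node, prefetch both
 * children before descending. The left child is visited almost
 * immediately, so it gains little; the right child's fetch can overlap
 * with the traversal of the entire left subtree. */
long tree_sum(const Node *n)
{
    if (n == NULL)
        return 0;

    if (n->left != NULL)
        PREFETCH(n->left);
    if (n->right != NULL)
        PREFETCH(n->right);

    long sum = n->value;
    sum += tree_sum(n->left);
    sum += tree_sum(n->right);
    return sum;
}
```

The sketch shows why trees are harder than the array codes: the address of a child is not known until its parent has been loaded, so prefetches can be issued only one level ahead unless the data structure carries extra pointers.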