The goals of our compiler research were twofold: to cover a wide range of applications, and to maximize the performance benefit for the cases that are covered. In this section, we briefly discuss how our research can be extended along both of these dimensions.
The scope of our algorithm was limited to array-based scientific and engineering applications. While such applications represented an important first step for prefetching, clearly there are other types of applications and reference patterns that also deserve attention. Perhaps the most obvious next step is to address applications containing large recursive data structures, such as the trees and linked-lists that accounted for so much of the memory latency in BARNES and PTHOR. To handle such cases, the compiler will need powerful pointer analysis techniques to recognize these recursive structures and to understand the manner in which they are being traversed.
To achieve larger benefits from prefetching in the cases that are covered, a number of the techniques discussed in Section deserve further exploration. In particular, the use of dynamic information, through either profiling feedback or adaptive code, is likely to become more important as the compiler broadens its scope to include access patterns such as pointers, where it is difficult if not impossible to predict locality from static information alone. In addition, we have seen how chronic cache conflicts can potentially render prefetching ineffective, and that often the most desirable solution is to fix the problem in software rather than hardware. Techniques that allow the software to automatically detect and prevent such conflicts would be valuable, and would improve performance even without prefetching.
Finally, this research has focused only on the latency of accessing main memory. The general concept of prefetching can potentially be extended to handle other important forms of latency, such as accessing file systems and communicating across networks.