This section discusses the research goals of this dissertation. At the highest level, our goal is to evaluate and improve upon the latency-hiding benefits of software-controlled prefetching. We focus specifically on compiler-inserted prefetching, since this relieves the programmer of the burden of inserting prefetches by hand. Because we are interested in practical results rather than theoretical limits, we implement our prefetching algorithms in a state-of-the-art compiler and measure the actual performance of working codes with prefetching. The goals of the compiler algorithm itself are described below.
A key goal for any compiler-based prefetching algorithm is the ability to cover a wide range of programs (e.g., dense-matrix codes, sparse-matrix codes, irregular general-purpose codes, etc.). Ideally, any application suffering from memory latency could benefit from prefetching. However, covering all types of applications and access patterns is an overly ambitious goal, and therefore we must limit our scope to make substantial headway. Our approach is to start with an important class of applications and handle it well, and then later build upon this core algorithm to cover other important cases. The access pattern we start with is array references where the array indices are affine (i.e., linear) functions of the surrounding loop indices; such access patterns occur in many types of applications, particularly dense-matrix codes. These applications are interesting both because they tend to suffer from memory latency, due to the large size of the arrays and the often wide separation between reuses, and because their access patterns are regular and predictable, which gives prefetching a good chance of success. Next we extend this algorithm to handle indirect array references, which are another important case and are common in sparse-matrix codes. Finally, we address multiprocessing, where memory latencies can be quite large. By handling all of these cases, our compiler can cover a significant fraction of scientific and engineering codes.
The second key goal of a prefetching compiler algorithm is maximizing the performance improvement for the cases that are covered. Since prefetching involves costs as well as benefits, care must be taken to minimize the instruction and memory overheads while maximizing the latency-hiding benefits, so as to achieve the best overall performance.
In addition to these compiler-oriented research goals, our goals in the architecture domain are to determine the proper architectural support for prefetching, and to comparatively evaluate prefetching with respect to other latency-hiding techniques, such as locality optimizations, relaxed consistency models, and multithreading.