
Organization of Dissertation

Chapter 2 describes our core prefetching algorithm, which handles affine array references and hence dense-matrix codes. A key feature of this algorithm is that it minimizes prefetching overhead by prefetching only those references that are predicted to suffer cache misses. This core algorithm is the basis for all of our experiments, and is extended in later chapters.
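
To give a rough sense of this selectivity (this is an illustrative sketch only, not the compiler's actual output), the C fragment below prefetches a streaming affine reference once per cache line while issuing no prefetches for a reference predicted to hit. The constants LINE_ELEMS and PREFETCH_DIST, and the use of GCC's __builtin_prefetch intrinsic, are assumptions made for this example.

    #define N             1024
    #define LINE_ELEMS    8                  /* doubles per assumed 64-byte cache line */
    #define PREFETCH_DIST (4 * LINE_ELEMS)   /* elements ahead to prefetch */

    double a[N][N], b[N];

    double weighted_row_sum(void)
    {
        double sum = 0.0;
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                /* a[i][j] streams through memory and is predicted to miss
                 * once per cache line, so a prefetch is issued only once
                 * per line rather than on every iteration. */
                if (j % LINE_ELEMS == 0 && j + PREFETCH_DIST < N)
                    __builtin_prefetch(&a[i][j + PREFETCH_DIST], 0, 3);

                /* b[i] is invariant in the inner loop and predicted to hit,
                 * so no prefetch is issued for it. */
                sum += a[i][j] * b[i];
            }
        }
        return sum;
    }
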

Chapter 3 studies the performance benefits of prefetching for uniprocessor applications, beginning with a detailed evaluation of the algorithm described in Chapter 2. Next, we evaluate the interaction between prefetching and locality optimizations, another important latency-hiding technique for dense-matrix codes. Finally, we extend our core compiler algorithm to handle indirect references (and hence sparse-matrix codes), and measure the resulting performance improvement on the relevant applications.
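
As a minimal sketch of one way indirect references can be prefetched (the function name, prefetch distance, and __builtin_prefetch intrinsic below are assumptions for illustration, not the dissertation's actual scheme), the index array is itself an affine reference and can be prefetched far enough ahead that the indexed data can in turn be prefetched:

    enum { DIST = 16 };  /* assumed prefetch distance, in loop iterations */

    void sparse_gather(double *restrict y, const double *restrict x,
                       const int *restrict col, int nnz)
    {
        for (int i = 0; i < nnz; i++) {
            /* col[] is an affine reference: prefetch it far enough ahead
             * that the index value is in cache when the indirect prefetch
             * needs it. */
            if (i + 2 * DIST < nnz)
                __builtin_prefetch(&col[i + 2 * DIST], 0, 3);

            /* x[col[..]] is the indirect (sparse-matrix) reference: prefetch
             * the element DIST iterations ahead using the already-fetched
             * index value. */
            if (i + DIST < nnz)
                __builtin_prefetch(&x[col[i + DIST]], 0, 0);

            y[i] += x[col[i]];
        }
    }
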

Chapter 4 focuses on prefetching for large-scale shared-memory multiprocessors. These machines are interesting because of their large performance potential, and because they are particularly prone to suffering from memory latency. We begin by discussing how the prefetching compiler algorithm described in Chapters 2 and 3 is modified to address the issues unique to multiprocessing, and then evaluate its effect on the performance of the entire SPLASH [72] application suite. We also compare compiler-inserted prefetching with hand-inserted prefetching to see whether the compiler is living up to its potential, and to discover methods for further improvement.

Chapter 5 explores the architectural issues associated with prefetching, and is divided into three distinct parts. The first part examines the architectural support necessary for the basic prefetching model assumed in Chapters 3 and 4. The second part considers ways to enhance the architecture to improve prefetching further. The third part compares prefetching with other latency-hiding techniques that require architectural support, namely hardware-controlled prefetching, relaxed consistency models, and multithreading.

Finally, Chapter 6 summarizes the important results of this dissertation and discusses their implications. It also outlines directions for future work in this area.
