Next: Organization of Dissertation
Up: Introduction
Previous: Related Work
The primary contributions of this dissertation are the following:
- The proposal of a new compiler algorithm for inserting prefetch
instructions in scientific and engineering codes. This algorithm improves
upon several previous proposals that focused on dense-matrix uniprocessor
codes [64, 42, 31]. In addition,
this algorithm handles indirect references, which frequently occur in
sparse-matrix codes, and targets large-scale shared-memory
multiprocessors as well as uniprocessors.
- A detailed evaluation of the prefetching algorithm based on a
full compiler implementation. The prefetching algorithm is implemented
in the SUIF (Stanford University Intermediate Form) compiler, which
includes many of the standard optimizations and generates code
competitive with the MIPS 2.10 compiler [80]. Using this compiler
system, we have been able to generate fully functional and optimized code
with prefetching. By simulating the code with a detailed architectural
model, we can evaluate the effect of prefetching on overall system
performance. It is important to focus on the overall performance,
because simple characterizations such as the miss rates alone are often
misleading. The results of this evaluation show that our algorithm is
quite successful at hiding memory latency, improving the
performance of some applications by as much as twofold.
- A study of the interaction of prefetching and other techniques for
hiding latency, such as data locality optimizations, relaxed consistency
models, and multithreading. We find that prefetching is complementary
to both locality optimizations and relaxed consistency models, but the
benefit of combining prefetching and multithreading is less clear.
- An investigation of the architectural support necessary for
software-controlled prefetching, including proposals that may further
increase the performance benefit of prefetching. In addition to
including prefetch instructions in the instruction set, we find that
the main support necessary for prefetching is a lockup-free cache.
Further architectural enhancements may include hardware miss counters
to expedite the use of dynamic information, and set associativity to
reduce cache conflict problems.
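To make the flavor of compiler-inserted prefetching concrete, the sketch below shows what the transformed output might look like for a dense loop and for an indirect (sparse-matrix) reference. This is an illustrative approximation, not code from the dissertation's compiler: the function names are invented, `PREFETCH_AHEAD` stands in for a software-pipelining distance the compiler would compute from loop cost and memory latency, and `__builtin_prefetch` (a GCC/Clang builtin) stands in for the prefetch instruction the instruction set would provide.

```c
#include <stddef.h>

/* Assumed software-pipelining distance (iterations of lookahead);
   a real compiler derives this from the loop body cost and the
   expected memory latency. */
#define PREFETCH_AHEAD 16

/* Dense reference: prefetch a[i + d] while computing on a[i]. */
double sum_dense(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 1);
        s += a[i];
    }
    return s;
}

/* Indirect reference, as in sparse-matrix codes: the index array
   is prefetched further ahead than the data it points to, so the
   index value is already cached when the data prefetch issues. */
double sum_indirect(const double *x, const int *idx, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 2 * PREFETCH_AHEAD < n)
            __builtin_prefetch(&idx[i + 2 * PREFETCH_AHEAD], 0, 1);
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&x[idx[i + PREFETCH_AHEAD]], 0, 1);
        s += x[idx[i]];
    }
    return s;
}
```

Note that the prefetches are non-binding hints: they change no program state, so the guards only avoid prefetching past the array bounds, and a lockup-free cache lets the loads overlap with the remaining loop iterations.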