Next: Organization of Dissertation
Up: Introduction
Previous: Related Work
The primary contributions of this dissertation are the following:
- The proposal of a new compiler algorithm for inserting prefetch
instructions in scientific and engineering codes. This algorithm improves
upon several previous proposals that focused on dense-matrix uniprocessor
codes [64, 42, 31]. In addition,
this algorithm handles indirect references, which frequently occur in
sparse-matrix codes, and targets large-scale shared-memory
multiprocessors as well as uniprocessors.
- A detailed evaluation of the prefetching algorithm based on a
full compiler implementation. The prefetching algorithm is implemented
in the SUIF (Stanford University Intermediate Form) compiler, which
includes many of the standard optimizations and generates code
competitive with the MIPS 2.10 compiler [80]. Using this compiler
system, we have been able to generate fully functional and optimized code
with prefetching. By simulating the code with a detailed architectural
model, we can evaluate the effect of prefetching on overall system
performance. It is important to focus on the overall performance,
because simple characterizations such as the miss rates alone are often
misleading. The results of this evaluation show that our algorithm is
quite successful at hiding memory latency, improving the
performance of some applications by as much as twofold.
- A study of the interaction of prefetching and other techniques for
hiding latency, such as data locality optimizations, relaxed consistency
models, and multithreading. We find that prefetching is complementary
to both locality optimizations and relaxed consistency models, but the
benefit of combining prefetching and multithreading is less clear.
- An investigation of the architectural support necessary for
software-controlled prefetching, including proposals that may further
increase the performance benefit of prefetching. In addition to
including prefetch instructions in the instruction set, we find that
the main support necessary for prefetching is a lockup-free cache.
Further architectural enhancements may include hardware miss counters
to expedite the use of dynamic information, and set associativity to
reduce cache conflict problems.
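To make the flavor of compiler-inserted prefetching concrete, the sketch below shows what the transformed output might look like for a dense loop and for an indirect (sparse-matrix) reference. This is an illustrative approximation, not code from the dissertation's compiler: the function names are invented, `PREFETCH_AHEAD` stands in for a software-pipelining distance the compiler would compute from loop cost and memory latency, and `__builtin_prefetch` (a GCC/Clang builtin) stands in for the prefetch instruction the instruction set would provide.

```c
#include <stddef.h>

/* Assumed software-pipelining distance (iterations of lookahead);
   a real compiler derives this from the loop body cost and the
   expected memory latency. */
#define PREFETCH_AHEAD 16

/* Dense reference: prefetch a[i + d] while computing on a[i]. */
double sum_dense(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 1);
        s += a[i];
    }
    return s;
}

/* Indirect reference, as in sparse-matrix codes: the index array
   is prefetched further ahead than the data it points to, so the
   index value is already cached when the data prefetch issues. */
double sum_indirect(const double *x, const int *idx, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 2 * PREFETCH_AHEAD < n)
            __builtin_prefetch(&idx[i + 2 * PREFETCH_AHEAD], 0, 1);
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&x[idx[i + PREFETCH_AHEAD]], 0, 1);
        s += x[idx[i]];
    }
    return s;
}
```

Note that the prefetches are non-binding hints: they change no program state, so the guards only avoid prefetching past the array bounds, and a lockup-free cache lets the loads overlap with the remaining loop iterations.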