In this chapter, we evaluate the performance benefits of prefetching for array-based uniprocessor applications. Section describes the experimental framework used throughout this chapter, including our architectural assumptions, benchmarks, compile-time parameters, and simulation environment. The results of these experiments are presented in four major subsections.

First, Section contains a detailed evaluation of the algorithm described in the previous chapter for prefetching affine array references. We observe that each component of this core compiler algorithm is effective at achieving its goal, improving overall performance by as much as twofold. Second, Section evaluates the robustness of this algorithm by varying the compile-time parameters that are set heuristically rather than measured precisely for a specific architecture (i.e., the effective cache size, the target memory latency, and the policy on unknown loop bounds). These parameter variations affect only a small subset of the applications, and the performance impact in those cases is generally small; the algorithm therefore appears to be robust.

Third, having examined prefetching in isolation, Section evaluates the interaction between prefetching and another powerful technique for improving the memory performance of dense-matrix codes: locality optimizations, which reduce memory latency by eliminating cache misses rather than hiding it. The results show that prefetching and locality optimizations are complementary, and therefore should be combined. Fourth, having focused thus far only on affine array references, we extend our core algorithm in Section to handle indirect references, which allows us to prefetch sparse-matrix codes. This relatively straightforward extension improves performance by as much as an additional 20%.

Finally, we conclude the chapter in Section with a summary of the important results.
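To make the two classes of array references discussed above concrete, consider first an affine reference a[i]. The sketch below shows, in hand-written form, the style of code a prefetching transformation produces: the loop is software-pipelined so that each steady-state iteration prefetches the element needed a fixed number of iterations later. This is a minimal illustrative sketch, not the thesis compiler's actual output; GCC's __builtin_prefetch and the prefetch distance PF_DIST are assumptions introduced here.

```c
#define PF_DIST 16  /* hypothetical distance: memory latency / cycles per iteration */

/* Affine reference a[i]: steady-state iterations prefetch the element
 * needed PF_DIST iterations later; the final PF_DIST iterations run
 * without prefetches (epilogue), since their data is already in flight. */
double sum_affine(const double *a, int n)
{
    double t = 0.0;
    for (int i = 0; i < n - PF_DIST; i++) {
        __builtin_prefetch(&a[i + PF_DIST], 0, 3);  /* read, high temporal locality */
        t += a[i];
    }
    for (int i = (n > PF_DIST) ? n - PF_DIST : 0; i < n; i++)
        t += a[i];  /* epilogue: no further prefetches needed */
    return t;
}
```

The full algorithm would additionally use locality analysis to restrict prefetches to the dynamic instances predicted to miss (for example, one prefetch per cache line, isolated via loop unrolling or splitting); the sketch omits that refinement for brevity.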
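A second sketch, under the same assumptions, illustrates the indirect-reference extension for sparse-matrix codes. Because the address of x[col[j]] is not known until the index col[j] has been loaded, the index array is prefetched one distance further ahead than the data it indexes. The bounds guard keeps the explicit load of col in range; the prefetches themselves are non-faulting on typical targets, so issuing them slightly past the end of an array is harmless.

```c
/* Indirect reference x[col[j]] in a sparse gather: prefetch the index
 * array 2*PF_DIST iterations ahead and the indirectly addressed data
 * PF_DIST ahead, so each index value has arrived by the time the
 * prefetch that depends on it is issued. All names are hypothetical. */
void gather_sparse(double *y, const double *x, const int *col, int n)
{
    for (int j = 0; j < n; j++) {
        __builtin_prefetch(&col[j + 2 * PF_DIST], 0, 3);     /* index array   */
        if (j + PF_DIST < n)                                 /* guard the load */
            __builtin_prefetch(&x[col[j + PF_DIST]], 0, 3);  /* indirect data  */
        y[j] += x[col[j]];
    }
}
```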