The figure illustrates how multithreading (also known as ``multiple-context processing'') can also be used to tolerate latency. In this figure, context #1 suffers a cache miss when it attempts to load location A. At that point, context #1 is swapped out and the processor begins executing context #2. Ideally, by the time context #2 must itself be swapped out (which occurs when it suffers a cache miss trying to load location B), the memory access for the original context has completed, and context #1 is ready to run again.
Multithreading has two advantages over software-controlled prefetching. First, it can handle arbitrarily complicated access patterns, including situations where the addresses cannot be predicted ahead of time (in which case prefetching does not work). Second, since it requires no software support, it can improve the speed of existing executables without recompilation.
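The contrast in predictability can be made concrete. In the sketch below (PREFETCH_DISTANCE is a hypothetical tuning knob, and `__builtin_prefetch` is the GCC/Clang intrinsic), an array traversal can prefetch ahead because element addresses are computable early, whereas a linked-list traversal cannot: the address of each node is stored inside its predecessor, so nothing useful can be prefetched until the prior load completes — precisely the case where multithreading can still hide latency.

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 16  /* hypothetical; tuned per machine */

/* Dense array: the address of element i+D is known D iterations
 * early, so software prefetching applies. */
long sum_array(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE]); /* GCC/Clang builtin */
        s += a[i];
    }
    return s;
}

/* Linked list: node i+1's address lives inside node i, so each
 * iteration must wait for the previous load before the next address
 * is even known -- prefetching has nothing to issue early. */
struct node { long val; struct node *next; };

long sum_list(const struct node *p) {
    long s = 0;
    for (; p != NULL; p = p->next)
        s += p->val;
    return s;
}
```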
However, multithreading has several disadvantages relative to software-controlled prefetching. First of all, to make a single application execute faster, additional concurrent threads of execution are needed. This concurrency may or may not exist. In a uniprocessor environment in particular, it is unlikely that programmers would go through the pain of parallelizing their applications solely for the sake of multithreading. A second limitation is the overhead of switching between contexts. This overhead arises because: (i) data cache misses are detected late in the pipeline, so subsequent instructions that have already entered the pipe must be flushed; and (ii) saving and restoring context state (e.g., the register file) may take additional time. These switching overheads can offset much of the performance gain of multithreading. Finally, minimizing the context-switching overhead requires a significant amount of hardware (e.g., replicated register files). Multithreading is therefore clearly a more expensive solution than prefetching, both in terms of its concurrency demands and its hardware support.