While software-controlled prefetching requires support from both hardware and software, several schemes have been proposed that are strictly hardware-based. Porterfield evaluated several cacheline-based hardware prefetching schemes. In some cases they were quite effective at reducing miss rates, but they often increased memory traffic substantially. Lee proposed an elaborate lookahead scheme for prefetching in a multiprocessor where all shared data is uncacheable. He found that the effectiveness of the scheme was limited by branch prediction and by synchronization. Baer and Chen proposed a scheme that uses a history buffer to detect constant-stride access patterns. In their scheme, a "lookahead PC" speculatively walks through the program ahead of the normal PC using branch prediction. When the lookahead PC finds a matching stride entry in the history buffer, it issues a prefetch. They evaluated the scheme in a memory system with a 30-cycle miss latency and found encouraging results.
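To make the idea of constant-stride detection concrete, the following is a minimal software sketch of a per-instruction history buffer. It is not Baer and Chen's actual design: the names `StrideEntry` and `StrideTable` are hypothetical, the confirmation rule (the same nonzero delta seen twice in a row) is a simplifying assumption, the table is unbounded where real hardware would be finite, and the lookahead PC is not modeled, so the prefetch address is computed at the time of the demand reference itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StrideEntry:
    last_addr: int           # most recent address referenced by this instruction
    stride: int = 0          # most recently observed address delta
    confirmed: bool = False  # True once the same nonzero delta repeats


class StrideTable:
    """Hypothetical history buffer keyed by the PC of a memory instruction."""

    def __init__(self) -> None:
        # Assumption: unbounded table; real hardware would have a fixed size.
        self.entries: Dict[int, StrideEntry] = {}

    def access(self, pc: int, addr: int) -> Optional[int]:
        """Record one reference; return an address to prefetch, or None."""
        entry = self.entries.get(pc)
        if entry is None:
            # First reference from this instruction: no history yet.
            self.entries[pc] = StrideEntry(last_addr=addr)
            return None
        delta = addr - entry.last_addr
        # Simplified rule: trust the stride only if it matches the previous one.
        entry.confirmed = delta == entry.stride and delta != 0
        entry.stride = delta
        entry.last_addr = addr
        return addr + delta if entry.confirmed else None


if __name__ == "__main__":
    table = StrideTable()
    # One load instruction walking an array with an 8-byte stride,
    # followed by an irregular reference that breaks the pattern.
    for addr in (1000, 1008, 1016, 1024, 5000):
        print(addr, "->", table.access(0x40, addr))
```

In this sketch the first two references only establish history, a prefetch is suggested once the stride repeats, and an irregular reference suppresses prefetching until a stride is confirmed again.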
To compare hardware-controlled prefetching with software-controlled prefetching, we discuss how hardware-controlled prefetching addresses the three goals introduced earlier, namely performing analysis, maximizing effectiveness, and minimizing the overheads associated with prefetching.