The final hardware-related issue we will discuss is whether it is useful to have a separate prefetch issue buffer in an architecture that already contains a write buffer, or whether both writes and prefetches should be placed in the same buffer. One possible performance disadvantage of a combined buffer is that prefetches may be delayed behind writes. From an implementation perspective, a buffer that handles only prefetch requests would be smaller, since its entries do not need to hold write data. However, it may be simpler to build just a single buffer.
The uniprocessor architecture we have been using does not contain a write buffer, but the multiprocessor architecture does, since it has a write-through primary data cache (versus the copy-back cache of the uniprocessor architecture). In our experiments so far, the multiprocessor architecture has included both a sixteen-entry write buffer and a sixteen-entry prefetch issue buffer. To evaluate the performance impact of having a common buffer, we ran an experiment in which both writes and prefetches were placed in a combined sixteen-entry buffer. Our results showed absolutely no difference in performance. This is partly because the lockup-free cache (which allows up to eight outstanding misses for the multiprocessor architecture) handles requests quickly enough that prefetches are rarely delayed behind writes. In an earlier study where we did not use a lockup-free cache [61], the performance advantage of having a separate prefetch issue buffer was also rather small. Therefore, the choice between separate and combined buffers should be dictated by whichever is easier to implement, since both schemes offer similar performance.
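To make the queueing argument concrete, the following is a minimal toy model, not the simulator used in our experiments. It measures how long a prefetch waits before being issued when it shares a FIFO with writes versus when it has its own buffer. Everything in it is an illustrative assumption: the function name `mean_prefetch_delay`, the fixed memory `latency`, the one-issue-per-cycle limit, the priority given to prefetches in the separate-buffer case, and the omission of the sixteen-entry capacity limit.

```python
from collections import deque

def mean_prefetch_delay(trace, combined, max_outstanding=8, latency=20):
    """Toy model (hypothetical): average cycles a prefetch waits before issue.

    trace: list of (arrival_cycle, kind) sorted by cycle, kind is 'W' or 'P'.
    combined: True puts writes and prefetches in one FIFO.
    """
    writes, prefetches = deque(), deque()
    in_flight = []            # completion cycles of outstanding requests
    delays = []
    pending = deque(trace)
    cycle = 0
    while pending or writes or prefetches:
        # Retire requests whose (assumed fixed-latency) access has finished.
        in_flight = [t for t in in_flight if t > cycle]
        # Accept this cycle's arrivals; buffer capacity is not modeled.
        while pending and pending[0][0] <= cycle:
            arrival, kind = pending.popleft()
            if combined or kind == 'W':
                writes.append((arrival, kind))   # shared FIFO when combined
            else:
                prefetches.append((arrival, kind))
        # Issue at most one request per cycle if a miss slot is free.
        # Assumption: with separate buffers, prefetches issue first.
        if len(in_flight) < max_outstanding:
            queue = prefetches if prefetches else writes
            if queue:
                arrival, kind = queue.popleft()
                in_flight.append(cycle + latency)
                if kind == 'P':
                    delays.append(cycle - arrival)
        cycle += 1
    return sum(delays) / len(delays) if delays else 0.0

# One prefetch arriving just behind a burst of four writes.
burst = [(0, 'W')] * 4 + [(0, 'P')]
print(mean_prefetch_delay(burst, combined=True))                      # 4.0
print(mean_prefetch_delay(burst, combined=False))                     # 0.0
print(mean_prefetch_delay(burst, combined=True, max_outstanding=1))   # 80.0
```

Under these assumptions, a prefetch queued behind four writes in a combined buffer waits only 4 cycles when eight misses may be outstanding, but 80 cycles when only one may be. This is consistent with the observation that a lockup-free cache drains the buffer quickly enough to make the combined and separate organizations nearly equivalent, although the toy model says nothing about the magnitude of the effect on real workloads.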