In this subsection, we discuss an issue that arises under invalidation-based coherence schemes (the model we assume for the remainder of this section): the use of exclusive-mode prefetching. Under the invalidation-based coherence model, a processor wishing to read a location receives a sharable copy of the line, which allows the line to be replicated in other caches as long as every processor is only reading it. To write to a line, however, a processor must first acquire an exclusive copy by invalidating the line in other processors' caches. This prevents the replicated copies from becoming stale, thus preserving coherence.
Just as normal memory accesses have two variations (shared accesses for reads, and exclusive accesses for writes), it also makes sense to have two types of prefetches: one that fetches a shared copy of a line, and one that fetches an exclusive copy. If a processor only intends to read a line, it will use the shared-mode prefetch. However, if the processor intends to modify the line (even if the line will be read first and modified shortly thereafter), an exclusive-mode prefetch should be issued, not only to fetch a copy of the line but also to gain ownership.
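As a rough modern analogue of the two prefetch types, the sketch below uses the `__builtin_prefetch` builtin provided by GCC and Clang, whose second argument distinguishes a prefetch for reading (0) from a prefetch for writing (1). This is an assumption for illustration, not the prefetch instruction used in our experiments: whether a write prefetch actually acquires ownership depends on the target architecture, and the function names and `PREFETCH_DISTANCE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. A real compiler would
 * derive it from the memory latency and the cost of the loop body. */
#define PREFETCH_DISTANCE 16

/* Read-only traversal: a shared-mode (read) prefetch is sufficient. */
double sum_array(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0); /* rw = 0: read */
        sum += a[i];
    }
    return sum;
}

/* Each element is read and then written, so the line is requested
 * for writing up front rather than for reading. */
void scale_array(double *a, size_t n, double factor)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 1); /* rw = 1: write */
        a[i] *= factor;
    }
}
```

The prefetches are only hints, so both functions compute the same results with or without them; only the timing and coherence traffic may differ.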
Proper use of exclusive-mode prefetching can provide two performance benefits. First, it can reduce the latency of the subsequent write since exclusive ownership of the line has already been obtained. This may or may not have a direct impact on execution time, depending on whether writes can be buffered.
The second benefit occurs in the common case where a value is read before it is written. Intuitively, these cases occur frequently because it is more common to update a shared variable (e.g., incrementing a shared counter, updating the position of a particle in a wind tunnel) than to simply overwrite it without reading it first. In such "read-modify-write" cases, what normally occurs is that the processor first requests a sharable copy of the line, and then immediately afterward requests an exclusive copy of the same line to perform the write. Rather than issuing two separate requests, a better approach is to issue a single exclusive-mode prefetch, as illustrated in Figure . Because one request replaces two, exclusive-mode prefetches can potentially eliminate up to half of the total memory traffic, which can improve the performance of all references (both reads and writes) by reducing the amount of contention in the memory subsystem.
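The request counts in the read-modify-write case can be checked with a toy model. The sketch below is not a simulation of any real protocol: it tracks a single line in one processor's cache, assumes the line starts out invalid and is never invalidated by another processor, and counts one request each time the line's access rights must be raised. All type and function names are hypothetical.

```c
#include <assert.h>

/* Access rights held on one cache line, ordered from weakest to strongest. */
typedef enum { LINE_INVALID, LINE_SHARED, LINE_EXCLUSIVE } line_state;

typedef enum { PF_NONE, PF_SHARED, PF_EXCLUSIVE } pf_kind;

typedef struct {
    line_state state;
    int requests;   /* coherence requests sent to the memory system */
} cache_line;

/* Raise the line to at least `wanted`; this costs one request if the
 * current rights are insufficient, and nothing otherwise. */
static void acquire(cache_line *line, line_state wanted)
{
    assert(wanted != LINE_INVALID);
    if (line->state < wanted) {
        line->state = wanted;
        line->requests++;
    }
}

/* Count the requests needed for one read-modify-write of a line,
 * optionally preceded by a prefetch of the given kind. */
int requests_for_update(pf_kind prefetch)
{
    cache_line line = { LINE_INVALID, 0 };
    if (prefetch == PF_SHARED)
        acquire(&line, LINE_SHARED);
    else if (prefetch == PF_EXCLUSIVE)
        acquire(&line, LINE_EXCLUSIVE);
    acquire(&line, LINE_SHARED);     /* the read */
    acquire(&line, LINE_EXCLUSIVE);  /* the write */
    return line.requests;
}
```

Under these assumptions, `requests_for_update(PF_NONE)` and `requests_for_update(PF_SHARED)` both return 2, while `requests_for_update(PF_EXCLUSIVE)` returns 1, matching the "up to half" bound stated above.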
We modify our compiler algorithm to exploit exclusive-mode prefetching as follows. After locality analysis, the references have been partitioned into equivalence classes (see Section ), which are sets of references that can be treated as a single reference. An equivalence class may contain multiple references if they share group locality. We insert an exclusive-mode prefetch rather than a shared-mode prefetch for a given equivalence class if at least one member of the equivalence class is a write. For example, for the code in Figure (a), locality analysis would determine that both the read and write of A[i] are in the same equivalence class. Therefore, despite the fact that the leading reference to A[i] (i.e., the reference that first accesses the data) is a read, our algorithm would schedule a single exclusive-mode prefetch of A[i], thus achieving the desired effect illustrated in Figure (b).
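The selection rule itself is simple enough to state in a few lines. The following is a minimal sketch, assuming an equivalence class is represented as an array of references with the leading reference first; the `mem_ref` type and all other names are hypothetical and do not reflect the data structures of our compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MODE_SHARED, MODE_EXCLUSIVE } prefetch_mode;

/* One memory reference within an equivalence class (hypothetical layout). */
typedef struct {
    const char *expr;   /* source-level form, e.g. "A[i]" */
    bool is_write;
} mem_ref;

/* Choose the prefetch mode for a whole equivalence class. refs[0] is
 * assumed to be the leading reference, but its kind alone does not
 * decide the mode: a single write anywhere in the class is enough to
 * make the prefetch exclusive. */
prefetch_mode choose_prefetch_mode(const mem_ref *refs, size_t count)
{
    assert(refs != NULL && count > 0);
    for (size_t i = 0; i < count; i++) {
        if (refs[i].is_write)
            return MODE_EXCLUSIVE;
    }
    return MODE_SHARED;
}
```

For a class holding a read of A[i] followed by a write of A[i], the function returns `MODE_EXCLUSIVE` even though the leading reference is a read; for a class containing only reads it returns `MODE_SHARED`.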