The code in Figure illustrates the types of
locality that our algorithm can discover. We will use this code as a
running example throughout the remainder of this chapter. We assume, for
this example, that the cache is 8 Kbytes, the prefetch latency is 100
cycles and the cache line size is 4 words (two double-word array elements
to each cache line). (Note that for the purpose of illustration, these
parameters differ slightly from our previous uniprocessor architecture.)
For this example, the set of references that will cause cache misses can be
determined by inspection.
For each array reference in our example code,
Figure (b) shows which iterations hit and miss in
the cache using iteration space plots. In this representation, the
horizontal axis corresponds to the j loop, the vertical axis
corresponds to the i loop, and each node represents an iteration
within the loop nest. Therefore as the loop nest is executed, the nodes in
the iteration space would be visited in the following order: first the
bottom-most row is visited from left-to-right, then the second-from-bottom
row is visited from left-to-right, and so on. To simplify the graphs, we only
show the first eight out of 100 j loop iterations.
The three array references in Figure
illustrate the three different types of locality: temporal, spatial, and
group. Temporal locality can occur when a given reference reuses
exactly the same data location. Spatial locality can occur when a
given reference accesses different data locations that fall within the same
cache line. Group Locality can occur when different references access
the same cache line.
The A[i][j] reference in Figure
illustrates spatial locality. In this case, given that each cache line
contains two array elements, we would expect misses to occur on every other
iteration as cache line boundaries are crossed. These misses are shown
graphically in Figure
(b).
The B[j+1][0] reference is an example of temporal locality. In
contrast with the A[i][j] reference, since the B[j+1][0]
reference traverses the columns rather than the rows of the matrix along
the inner loop, there is no cache line reuse and therefore all B[j+1][0] references suffer misses the first time through the j
loop. However, since the same B[j+1][0] locations are reused on
subsequent i loop iterations, and since these 800 bytes of data are
likely to remain in the 8 Kbyte cache across i loop iterations, we
would expect the B[j+1][0] references to hit in the cache after the
first i loop iteration. This effect is also illustrated in
Figure (b).
Finally, the B[j+1][0] and B[j][0] references are an example of group locality. Group locality is a relationship that can occur between multiple references whenever one reference brings in much of the data used by the other references. In this example, the data for the B[j][0] reference is fetched by the B[j+1][0] reference during the previous j iteration. Consequently the B[j][0] reference suffers only a single cache miss during the entire loop nest. Whenever group locality occurs, we only worry about prefetching the leading reference within the group, which is the reference that accesses new data first and therefore suffers the bulk of the cache misses. In this example, B[j+1][0] is the leading reference, and therefore we would not issue prefetches for B[j][0].
The remainder of this section describes how locality analysis systematically uncovers the types of locality shown in this example.