The code in Figure illustrates the types of locality that our algorithm can discover. We will use this code as a running example throughout the remainder of this chapter. We assume, for this example, that the cache is 8 Kbytes, the prefetch latency is 100 cycles and the cache line size is 4 words (two double-word array elements to each cache line). (Note that for the purpose of illustration, these parameters differ slightly from our previous uniprocessor architecture.) For this example, the set of references that will cause cache misses can be determined by inspection.
For each array reference in our example code, Figure (b) shows which iterations hit and miss in the cache using iteration space plots. In this representation, the horizontal axis corresponds to the j loop, the vertical axis corresponds to the i loop, and each node represents an iteration within the loop nest. Therefore as the loop nest is executed, the nodes in the iteration space would be visited in the following order: first the bottom-most row is visited from left-to-right, then the second-from-bottom row is visited from left-to-right, and so on. To simplify the graphs, we only show the first eight out of 100 j loop iterations.
The three array references in Figure illustrate the three different types of locality: temporal, spatial, and group. Temporal locality can occur when a given reference reuses exactly the same data location. Spatial locality can occur when a given reference accesses different data locations that fall within the same cache line. Group Locality can occur when different references access the same cache line.
The A[i][j] reference in Figure illustrates spatial locality. In this case, given that each cache line contains two array elements, we would expect misses to occur on every other iteration as cache line boundaries are crossed. These misses are shown graphically in Figure (b).
The B[j+1][0] reference is an example of temporal locality. In contrast with the A[i][j] reference, since the B[j+1][0] reference traverses the columns rather than the rows of the matrix along the inner loop, there is no cache line reuse and therefore all B[j+1][0] references suffer misses the first time through the j loop. However, since the same B[j+1][0] locations are reused on subsequent i loop iterations, and since these 800 bytes of data are likely to remain in the 8 Kbyte cache across i loop iterations, we would expect the B[j+1][0] references to hit in the cache after the first i loop iteration. This effect is also illustrated in Figure (b).
Finally, the B[j+1][0] and B[j][0] references are an example of group locality. Group locality is a relationship that can occur between multiple references whenever one reference brings in much of the data used by the other references. In this example, the data for the B[j][0] reference is fetched by the B[j+1][0] reference during the previous j iteration. Consequently the B[j][0] reference suffers only a single cache miss during the entire loop nest. Whenever group locality occurs, we only worry about prefetching the leading reference within the group, which is the reference that accesses new data first and therefore suffers the bulk of the cache misses. In this example, B[j+1][0] is the leading reference, and therefore we would not issue prefetches for B[j][0].
The remainder of this section describes how locality analysis systematically uncovers the types of locality shown in this example.