The STAMPede project is investigating the architectural, compiler, and OS support necessary to effectively exploit single-chip multiprocessors. "STAMPede" stands for "Single-chip, Tightly-coupled Architecture for MultiProcessing".
Thread-Level Speculation (TLS) and other similar techniques allow the compiler to automatically parallelize portions of code in the presence of statically ambiguous data dependences, thus extracting parallelism between whatever dynamic dependences actually exist at run-time. Under TLS, a program is broken into dynamic instruction sequences called epochs which the compiler believes are likely to be independent. For example, the iterations of a loop might each be an epoch if the compiler believes that cross-iteration dependences are unlikely. Associated with each epoch is a software-managed epoch number which specifies the original ordering of the epochs under sequential execution. The epochs are executed in parallel using special hardware support which uses the epoch numbers to detect whether data dependences have been violated, and which also buffers speculative side-effects until they can be safely committed to memory. If a dependence violation is detected, the hardware then notifies software of the problem, and software reacts by executing whatever recovery code is necessary to safely resume execution using the correct data.
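The execution model just described can be mimicked in a few lines of software. The following Python sketch is an illustration only, not the STAMPede hardware: the names `run_epochs`, `load`, and `store` are hypothetical, epochs are run one after another against a shared snapshot rather than truly in parallel, and violations are checked when an epoch commits rather than at the moment they occur.

```python
def run_epochs(bodies, memory):
    """Speculatively run epochs, then commit them in epoch-number order.

    bodies -- list of callables body(load, store); the list index is the
              epoch number, i.e. the original sequential order
    memory -- dict of committed state, updated in place
    Returns the epoch numbers that were squashed and re-executed.
    """
    def execute(body, visible):
        reads, writes = set(), {}      # exposed reads and buffered side effects

        def load(addr):
            if addr in writes:         # an epoch always sees its own stores
                return writes[addr]
            reads.add(addr)
            return visible.get(addr, 0)

        def store(addr, value):
            writes[addr] = value       # buffered, not yet visible to others

        body(load, store)
        return reads, writes

    # Speculative phase: every epoch sees only the state committed so far.
    snapshot = dict(memory)
    results = [execute(body, snapshot) for body in bodies]

    squashed = []
    dirty = set()                      # addresses written by earlier epochs
    for number, body in enumerate(bodies):
        reads, writes = results[number]
        if reads & dirty:              # read-after-write violation
            squashed.append(number)
            reads, writes = execute(body, memory)  # recovery: re-execute
        memory.update(writes)          # commit buffered stores
        dirty.update(writes)
    return squashed


if __name__ == "__main__":
    def inc_a(load, store):
        store("a", load("a") + 1)

    def inc_b(load, store):
        store("b", load("b") + 1)

    def scale_a(load, store):          # depends on the result of inc_a
        store("c", load("a") * 10)

    mem = {"a": 1, "b": 10}
    print(run_epochs([inc_a, inc_b, scale_a], mem), mem)
```

In the demo, only the third epoch reads a location that an earlier epoch writes, so it alone is squashed and re-executed; the final memory contents match what sequential execution would produce.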
To illustrate how TLS works, consider the simple while loop in the above figure, which accesses elements in a hash table. This loop cannot be statically parallelized due to possible data dependences through the array hash. While it is possible that a given iteration will depend on data produced by an immediately preceding iteration, these dependences may in fact be infrequent if the hashing function is effective. Hence a mechanism that could speculatively execute the loop iterations in parallel -- while squashing and re-executing any iterations which do suffer dependence violations -- could potentially speed up this loop significantly, as illustrated above. Here a read-after-write (RAW) data dependence violation is detected between epoch 1 and epoch 4; hence epoch 4 is squashed and restarted to produce the correct result. This example demonstrates the basic principles of TLS -- it can also be applied to regions of code other than loops.
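The dependence pattern in this example can be sketched as follows. The bucket indices, the function name `raw_violations`, and the four-processor window are assumptions chosen for illustration; the sketch also pessimistically assumes that every epoch running concurrently with an earlier one reads its bucket before that earlier epoch's store becomes visible.

```python
def raw_violations(read_idx, write_idx, processors=4):
    """Find read-after-write conflicts between concurrently running epochs.

    Epoch k (numbered from 1) reads hash[read_idx[k-1]] and later writes
    hash[write_idx[k-1]]. With `processors` epochs in flight at once, an
    epoch can conflict only with the processors-1 epochs just before it.
    Returns (writer_epoch, reader_epoch) pairs.
    """
    pairs = []
    for i, bucket in enumerate(read_idx):
        first_concurrent = max(0, i - processors + 1)
        for j in range(first_concurrent, i):
            if write_idx[j] == bucket:
                pairs.append((j + 1, i + 1))   # 1-based epoch numbers
    return pairs


if __name__ == "__main__":
    # Hypothetical bucket indices: epoch 1 writes bucket 10, epoch 4 reads it.
    reads = [3, 19, 33, 10]
    writes = [10, 21, 30, 25]
    print(raw_violations(reads, writes))       # [(1, 4)]
```

With a hash function that spreads accesses well, most epochs touch distinct buckets and the returned list stays short, which is the case where speculative parallelization pays off.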
One of our target architectures in the STAMPede project is a generic single-chip multiprocessor where each processor has its own primary data cache and all processors (on the same chip) physically share a secondary cache. Our goals are to minimize the amount of new hardware that must be added to support TLS, and also to avoid degrading the performance of applications which do not utilize TLS. We propose extending the instruction set with new instructions that enable software to manage TLS, using the caches to buffer speculative state, and extending the cache coherence scheme to detect data dependence violations.
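One way to picture how a coherence scheme might detect violations is sketched below. This is a hypothetical model, not the STAMPede design: the class names, the two per-line flags, and the rule that invalidation messages carry the sender's epoch number are all assumptions made for illustration, and cache lines are simplified to single addresses.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    speculatively_loaded: bool = False    # read before the epoch committed
    speculatively_modified: bool = False  # holds buffered, uncommitted stores


class SpeculativeCache:
    """Per-processor cache model that tracks speculative state per line."""

    def __init__(self, epoch_number):
        self.epoch_number = epoch_number  # epoch running on this processor
        self.lines = {}
        self.violated = False

    def load(self, addr):
        line = self.lines.setdefault(addr, CacheLine())
        # A load is exposed only if this epoch has not written the line itself.
        if not line.speculatively_modified:
            line.speculatively_loaded = True

    def store(self, addr):
        # The store is buffered in the cache until the epoch commits.
        self.lines.setdefault(addr, CacheLine()).speculatively_modified = True

    def receive_invalidation(self, addr, sender_epoch):
        """Return True if this message reveals a dependence violation."""
        line = self.lines.get(addr)
        hit = (line is not None
               and line.speculatively_loaded
               and sender_epoch < self.epoch_number)
        if hit:
            self.violated = True          # software would now run recovery
        return hit
```

For example, if the cache running epoch 4 has loaded address 10 and then receives an invalidation for address 10 from epoch 1, the model flags a violation; the same message from a logically later epoch would be ignored.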