15-410 HW2 solutions (Fall, 2015)

Question 1 - Public Key Practicum

If you did this problem correctly, your hw2 directory should contain a $USER.message.decrypted file. If not, check to see if there is a $USER.ERROR file. If not, a popular problem is that you used file names other than the ones specified in the assignment, the grading script wasn't able to guess what you meant, and the grader hasn't yet had the time to manually intervene.

By the way, your encrypted messages to us were read and, generally, appreciated.

The most shocking message about pizza was sent by Hasan Al-Jawaheri. The most shocking message about the grading script was sent by Aliaa Essameldin. Of course we're not going to reveal the messages--they're secret!

Question 2 - "Ready?"

Observations:

When produce() and consume() are run by different threads on a single-processor machine, the code always "works" (the assertion never trips).
If produce() and consume() are run by different threads on a multi-processor machine, and NUM_X, NUM_Y, and NUM_Z have certain values, again the code always "works" (the assertion never trips).
However, for certain other values of NUM_X, NUM_Y, and NUM_Z, the assertion trips "once in a while".

Part A:

Briefly explain the facts observed above (all three of them).

While essentially all modern processors execute instructions out of order, they carefully ensure that the results of out-of-order execution are managed (via stalls and/or state roll-back) so that each individual processor runs in a "locally-sequentially-consistent" fashion. If a machine has only one processor then there is no way to observe out-of-order effects. Well, almost no way: especially on PowerPC systems, I/O devices might observe effects out of order, but that is outside the scope of this question, which is about what processors observe.
If NUM_X, NUM_Y, and NUM_Z have certain small values, then the entire struct can fit into a single cache line. Since each individual processor carefully arranges for its effects on memory to happen in sequential-consistency order, and since cache lines are transferred among processors as atomic units, once again there is no way to observe out-of-order effects.
If NUM_X, NUM_Y, and NUM_Z have certain non-small values, a single struct may be split across multiple cache lines. If that happens, because the code doesn't ensure that writes to different cache lines are ordered, multiple processors may disagree on the order in which writes happen to them.

Part B:

Explain how to use memory-barrier instructions to resolve the assertion failures described above.

Insert a barrier at "Code location 'B'" and "Code location 'F'".

The barrier at "B" is necessary to delay the store into ready (which is used to synchronize between threads) until the store into val (the thing we are trying to synchronize about) has been committed to the memory system. It is not necessarily true that all other caches have the new value of val (some of them don't care about the new value), but it is true that the "sending" CPU has done what is necessary to revoke stale copies from other caches.

The barrier at "F" is necessary to delay the launch of the fetch of val until a fetch of ready has returned a non-zero value. Without that barrier, the processor could issue the fetch of ready, look ahead at the instruction stream and compute that it must stall the issue of the compare instruction because the fetch isn't done, look further ahead at the instruction stream and see a fetch of val, and launch the fetch of val. This would be "legal" because the processor has no reason to assume that two fetches from different memory locations depend on each other. They do depend on each other, but only because ready is being used as a synchronizer. Because we decided to use ready as a synchronizer we must insert a barrier to make that work.

Warning: If your answer discussed how a call to barrier() was necessary to "push" a value out, or to "pull" a value in, please go back and review the material. "Pushing" and "pulling" of specific values happen as a result of the cache-coherence protocol: the processors send each other messages about the contents of particular cache lines when they launch fetch or store operations. The purpose of fence/barrier operations is to delay certain fetches or stores with respect to other fetches or stores.

For further reading...

Here is last semester's homework page.

[Last modified Saturday December 12, 2015]