Carnegie Mellon

Computer Science Department |
 |
 |
 |
 |
 |
 |
 |
| |
|
|
15-410 HW2 solutions (Fall, 2015)
Question 1 - Public Key Practicum
If you did this problem correctly,
your hw2 directory
should contain a $USER.message.decrypted file.
If not, check to see if there is a
$USER.ERROR file.
If not, a popular problem is that you used file names
other than the ones specified in the assignment,
the grading script wasn't able to guess what you meant,
and the grader hasn't yet had the time to manually
intervene.
By the way, your encrypted messages to us were
read and, generally, appreciated.
The most shocking message about pizza was sent by
Hasan Al-Jawaheri.
The most shocking message about the grading script was
sent by
Aliaa Essameldin.
Of course we're not going to reveal the messages--they're
secret!
Question 2 - "Ready?"
Observations:
When produce() and consume() are run by
different threads on a single-processor machine, the code always
"works" (the assertion never trips).
If produce() and consume() are run by
different threads on a multi-processor machine,
and NUM_X, NUM_Y, and NUM_Z have
certain values,
again the code always "works" (the assertion never trips).
However, for certain other values of
NUM_X, NUM_Y, and NUM_Z,
the assertion trips "once in a while".
Part A:
Briefly explain the facts observed above (all three of them).
While essentially all modern processors execute instructions
out of order,
they carefully ensure that the results of out-of-order execution
are managed (via stalls and/or state roll-back) so that
each individual processor runs in a "locally-sequentially-consistent"
fashion.
If a machine has only one processor then there is no way to
observe out-of-order effects.
Well, almost no way:
especially on PowerPC systems, I/O devices might observe
effects out of order, but that is outside the scope of this
question, which is about what processors observe.
If
NUM_X, NUM_Y, and NUM_Z have
certain small values,
then the entire struct can fit into a single cache line.
Since each individual processor carefully arranges for its
effects on memory to happen in sequential-consistency order,
and since
cache lines are transferred among processors as atomic units,
once again there is no way to observe out-of-order effects.
If
NUM_X, NUM_Y, and NUM_Z have
certain non-small values,
a single struct may be split across multiple cache lines.
If that happens, because the code doesn't ensure that writes
to different cache lines are ordered,
multiple processors may disagree on the order in which writes happen
to them.
Part B:
Explain how to use memory-barrier instructions to resolve the
assertion failures described above.
Insert a barrier at "Code location 'B'" and "Code location 'F'".
The barrier at "B" is necessary to delay the store into
ready (which is used to synchronize between threads)
until the store into val (the thing we are trying to
synchronize about)
has been committed to the memory system.
It is not necessarily true that all other caches have the new value
of val (some of them don't care about the new value),
but it is true that the "sending" CPU has done what is
necessary to revoke stale copies from other caches.
The barrier at "F" is necessary to delay the launch of the fetch
of val until a fetch of ready has returned
a non-zero value.
Without that barrier, the processor could issue the fetch of
ready, look ahead at the instruction stream and
compute that it must stall the issue of the compare instruction
because the fetch isn't done,
look further ahead at the instruction stream and see a fetch
of val,
and launch the fetch of val.
This would be "legal" because the processor has no reason
to assume that two fetches from different memory locations
depend on each other.
They do depend on each other, but only because
ready is being used as a synchronizer.
Because we decided to use ready as a synchronizer
we must insert a barrier to make that work.
Warning:
If your answer discussed how a call to barrier()
was necessary to "push" a value out, or to "pull" a value
in, please go back and review the material.
"Pushing" and "pulling" of specific values happen as a result
of the cache-coherence protocol:
the processors send each other messages
about the contents of particular cache lines when they
launch fetch or store operations.
The purpose of fence/barrier operations is to delay
certain fetches or stores with respect to other fetches or stores.
For further reading...
Here is last semester's homework page.
|