Carnegie Mellon Computer Science Department
15-410 HW2 solutions (Spring, 2014)


Question 1 - Public Key Practicum

If you did this problem correctly, your hw2 directory should contain a $USER.message.decrypted file. If it doesn't, check whether there is a $USER.ERROR file. If there is neither, a common cause is that you used file names other than the ones specified in the assignment, the grading script wasn't able to guess what you meant, and the grader hasn't yet had time to intervene manually.

By the way, your encrypted messages to us were read and, generally, appreciated.

The best quote about "The Tao of Programming" was provided by Eddy Yeo. The best 2-D ASCII art was provided by Rokhini Prabhu. The best 3-D ASCII art was provided by Michael L Probber. Of course we're not going to reveal the messages--they're secret!


Question 2 - "Good Fences Make Good Neighbors"

Background

This question was about memory-fence operators. These operators do their thing in a particular context: code is written by a human, compiled by a compiler, run by a processor; and data flow through a cache hierarchy. Those things are all related to this question, but this question was about controlling the ordering of events that take place in different cache lines while holding all of those other factors constant. For example:

  1. Some students were concerned that compilers sometimes reorder code. That is true! But compilers can't arbitrarily reorder code, or else nobody could ever write a working program. In addition to volatile, an important feature of the C language is sequence points, which you should know about.

  2. Some students were concerned about a CPU executing instructions out of order. This does happen! But CPUs can't execute instructions arbitrarily out of order, or else nobody could ever write a working program. CPUs are generally required to take whatever actions are necessary to control the visibility of the effects of out-of-order execution so that code running on that CPU can't tell when things run out of order. This is done through a mixture of blocking the launch of an instruction until all necessary inputs are available and sometimes launching an instruction with guessed inputs, which must be coupled with invisibly cancelling all effects of a speculative excursion when a guess turns out to have been wrong. In other words, your processor may, millions of times per second, run instructions out of order, perhaps based on made-up data, but the only effects that remain visible afterward are effects based on true values, in order.

  3. Some students were concerned that a single cache line (such as the one holding now_serving) might be duplicated at various places in a multi-cache system, and that multiple CPUs might be reading and writing different parts of that cache line. This does happen! But multiple copies of a cache line can't be arbitrarily modified, or else nobody could ever write a working program. Multi-processor systems use a variant of the MESI protocol to bounce each cache line around the system as necessary, so that, if two CPUs write into two different words of one cache line, one of the writes will happen first, and the other write will happen second, and everybody will agree which happened first.

So what issue is this question concerned about? The issue is this: while a CPU will make sure that code run by the single thread it is executing will always see sensible orders, and while the values in a single cache line will always make sense in terms of the order in which operations on that cache line take place, there is no global agreement on the order in which effects take place inside different cache lines. If a processor has two load operations to perform, it may hand both of them off to the cache system, which may satisfy either one first. If a processor has two store operations to perform, it may hand both of them off to the cache system, which may complete either one first. Why is this ok? Because hardware architects assume that multiple CPUs are usually running code for different programs, and that most code run by multi-threaded programs accesses independent data, so it is acceptable to impose a little overhead when multiple threads in a single program occasionally need ordering guarantees for shared data. The way this shows up in this problem is that, usually, things protected by a lock are not in the same cache line as the lock itself.
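That last point can be made concrete with a layout sketch. The struct below is hypothetical (the field names echo the example program, but the real handout's layout may differ), and it assumes 64-byte cache lines: by over-aligning each field, the lock word and the data it protects are forced onto separate lines, so coherence on the lock's line says nothing about ordering on the data's line.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical layout illustrating the point above: the lock word and the
 * data it protects land in different cache lines, assuming 64-byte lines.
 */
#define CACHE_LINE 64

struct queue {
    alignas(CACHE_LINE) int lock_word;      /* e.g., now_serving lives here */
    alignas(CACHE_LINE) int head_of_queue;  /* protected data: a separate line */
};

/* Distance in bytes between the lock's line and the data's line. */
size_t line_distance(void)
{
    return offsetof(struct queue, head_of_queue)
         - offsetof(struct queue, lock_word);
}
```

With this layout, `line_distance()` returns 64, confirming that operations on the lock and operations on the data are handled as two independent cache lines.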

Part A

For each of the five "Program Points" (#1 through #5) that have been labeled with comments in these two routines: Is an smp_mb() ("full barrier") operation required for this "Program Point"?

Recall that cache coherence already ensures that accesses to the same cache block will appear to all processors in some (hypothetically) serializable order. The issue with memory consistency models (and memory fences/barriers) is ordering of accesses to distinct variables (in separate cache lines).
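For reference while reading the answers below, here is a sketch of the two routines with the five Program Points marked. This is a hypothetical reconstruction built from the names that appear in the solution text (get_ticket_number, spin_until_acquired, mutex->now_serving), not the actual handout code, and smp_mb() is modeled with a C11 sequentially-consistent fence.

```c
#include <stdatomic.h>

/* Hypothetical reconstruction of the ticket mutex; smp_mb() is modeled
 * as a C11 sequentially-consistent fence. */
typedef struct {
    atomic_int next_ticket;   /* ticket dispenser */
    atomic_int now_serving;   /* ticket currently allowed in */
} mutex_t;

static void smp_mb(void) { atomic_thread_fence(memory_order_seq_cst); }

static int get_ticket_number(mutex_t *mutex)
{
    /* A sequentially-consistent read-modify-write: this is the
     * "implicit fence" mentioned in the answer to Point #1. */
    return atomic_fetch_add(&mutex->next_ticket, 1);
}

static void spin_until_acquired(mutex_t *mutex, int target)
{
    /* Program Point #2 (loop test): repeated reads of one cache line
     * are kept sane by coherence, so no barrier is needed here. */
    while (atomic_load_explicit(&mutex->now_serving,
                                memory_order_relaxed) != target)
        ;
}

void mutex_lock(mutex_t *mutex)
{
    int target = get_ticket_number(mutex);
    /* Program Point #1: no barrier needed */
    spin_until_acquired(mutex, target);
    /* Program Point #3: barrier required (see below) */
    smp_mb();
}

void mutex_unlock(mutex_t *mutex)
{
    /* Program Point #4: barrier required (see below) */
    smp_mb();
    atomic_fetch_add(&mutex->now_serving, 1);
    /* Program Point #5: no barrier needed */
}
```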

  • Program Point #1: No, an smp_mb() operation is not required here. Among other reasons, there is already an implicit fence in get_ticket_number(), and nothing important has happened since then.

  • Program Point #2: No, an smp_mb() operation is not required here. The only real memory access in the loop test is the read of mutex->now_serving (since target is passed in as a procedure argument and would be stored in a register), and cache coherence will take care of serializability for these repeated accesses as the loop iterates. (For example, a processor should only observe monotonically increasing values, due to cache coherence.)

  • Program Point #3: Yes, an smp_mb() is required here. The key issue is not what happens simply within mutex_lock(), but the causal relationship between exiting mutex_lock() and performing the loads that are protected by this mutex (within the critical section that follows it in the application code). For example, we cannot allow the load of do_push in the example program to begin before the load of mutex->now_serving in spin_until_acquired() has been observed to be equal to target. In general, you cannot begin to perform loads after a lock acquisition until the acquisition has fully completed. Hence an smp_mb() is required.

  • Program Point #4: Yes, an smp_mb() is required here. Because the mutex is protecting the memory accesses within the critical section, we cannot allow the update of mutex->now_serving to begin until all stores within the critical section have completed. For example, the store to head_of_queue in the example program code must be completed before the update of mutex->now_serving in mutex_unlock() becomes visible. In general, you cannot begin to perform a mutex unlock before the stores prior to the unlock operation have completed. Hence an smp_mb() is required.

  • Program Point #5: No, an smp_mb() operation is not required here. A mutex_unlock() gives away permission to access data: it does not gain permission (for the releasing thread) to access data. Hence an smp_mb() is not needed at this program point.
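The answers above can be exercised under contention. The sketch below is hypothetical (names follow the solution text; smp_mb() is again modeled as a C11 sequentially-consistent fence): two threads use the ticket lock to protect a plain, non-atomic counter, which is safe only because of the barriers at Program Points #3 and #4.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical demo of the Part A answers: a ticket lock with full
 * fences at Program Points #3 and #4 protecting a non-atomic counter. */
typedef struct {
    atomic_int next_ticket;
    atomic_int now_serving;
} ticket_mutex_t;

static void full_fence(void) { atomic_thread_fence(memory_order_seq_cst); }

static void ticket_lock(ticket_mutex_t *m)
{
    int target = atomic_fetch_add(&m->next_ticket, 1);
    while (atomic_load_explicit(&m->now_serving,       /* Point #2 */
                                memory_order_relaxed) != target)
        ;
    full_fence();                                      /* Point #3: required */
}

static void ticket_unlock(ticket_mutex_t *m)
{
    full_fence();                                      /* Point #4: required */
    atomic_fetch_add(&m->now_serving, 1);              /* Point #5 follows */
}

static ticket_mutex_t mu;
static long counter;            /* protected by mu; not itself atomic */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        ticket_lock(&mu);
        counter++;              /* safe only because of the fences */
        ticket_unlock(&mu);
    }
    return NULL;
}

/* Run two workers to completion and return the final count. */
long run_two_workers(void)
{
    pthread_t a, b;
    counter = 0;
    atomic_store(&mu.next_ticket, 0);
    atomic_store(&mu.now_serving, 0);
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

If the fences at Points #3 and #4 were removed, increments of counter could overlap with ownership transfers of the lock and updates would be lost.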

Part B

For each of the five "Program Points" from Part A where you said that an smp_mb() operation was necessary, could you have used either an smp_rmb() or an smp_wmb() operation (but not both together) instead?

The only two program points that require smp_mb() operations are Program Point #3 and Program Point #4; hence we will consider only those program points.

  • Program Point #3: It is okay to replace the smp_mb() operation with an smp_rmb() operation, since the important issue is that reads within the critical section (e.g., the read of do_push) cannot begin until after the mutex_lock() operation has successfully gained access to the lock, which does not happen until after the read of mutex->now_serving in spin_until_acquired() has been observed to be equal to target.

  • Program Point #4: It is okay to replace the smp_mb() operation with an smp_wmb() operation, since the important issue is that writes within the critical section (e.g., the write to head_of_queue in the example program code, etc.) must complete before the mutex_unlock() gives away permission to other threads to access the data protected by the lock, and this will occur when the update of mutex->now_serving in mutex_unlock() becomes visible.

Note: some students might have an argument according to which one or both of the smp_mb() operations cannot be replaced. For example, if some code in the critical section does a "blind write", meaning a store operation of a value that is not computed from a value loaded from memory (e.g., queue_xxx = 3;), then it is important to make sure that store operation doesn't begin before the lock is fully acquired; an smp_rmb() at Program Point #3 orders only loads and would not hold back such a store, so an smp_mb() will be necessary. If your solution explicitly considers this unusual case, it will be graded accordingly.
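The weaker barriers of Part B correspond to acquire/release ordering in the C11 model, which can be sketched in miniature. The example below is hypothetical (the names echo the solution text, and the pattern is reduced to a single flag standing in for mutex->now_serving): a release fence at the Point #4 position holds back critical-section stores, and an acquire fence at the Point #3 position holds back critical-section loads.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical miniature of the Part B barriers using C11 analogs:
 * a release fence (smp_wmb() analog) before publishing now_serving,
 * and an acquire fence (smp_rmb() analog) after observing it. */
static atomic_int now_serving;   /* stands in for mutex->now_serving */
static int head_of_queue;        /* data protected by the lock */

static void *locked_writer(void *arg)
{
    (void)arg;
    head_of_queue = 42;          /* store inside the critical section */
    /* Point #4 position: smp_wmb() analog -- the store above must be
     * visible before the flag update becomes visible. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&now_serving, 1, memory_order_relaxed);
    return NULL;
}

static int locked_reader(void)
{
    /* Spin as in spin_until_acquired(). */
    while (atomic_load_explicit(&now_serving, memory_order_relaxed) != 1)
        ;
    /* Point #3 position: smp_rmb() analog -- loads below must not begin
     * until the flag has been observed. */
    atomic_thread_fence(memory_order_acquire);
    return head_of_queue;        /* load inside the critical section */
}

/* Run the writer in another thread and read the published value. */
int demo(void)
{
    pthread_t t;
    atomic_store(&now_serving, 0);
    head_of_queue = 0;
    pthread_create(&t, NULL, locked_writer, NULL);
    int v = locked_reader();
    pthread_join(t, NULL);
    return v;
}
```

The release fence before the relaxed flag store pairs with the acquire fence after the relaxed flag load, so the reader is guaranteed to see head_of_queue == 42 once it observes the flag.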

For further reading...

Here is last semester's homework page.


[Last modified Sunday December 07, 2014]