Carnegie Mellon
SCS logo
Computer Science Department
home
syllabus
staff
schedule
lecture
projects
homeworks
 
 

15-410 HW2 solutions (Spring 2017)


Question 1 - Public Key Practicum

If you did this problem correctly, your hw2 directory should contain a $USER.message.decrypted file. If not, check to see if there is a $USER.ERROR file. If not, a popular problem is that you used file names other than the ones specified in the assignment, the grading script wasn't able to guess what you meant, and the grader hasn't yet had the time to manually intervene.

By the way, your encrypted messages to us were read and, generally, appreciated.

The best Calvin and Hobbes ASCII art was provided by Marie Bremner. The best Borges quote was provided by Patricio Chilano Mateo. Once I was privileged to attend a talk given by Borges, and I appreciate being reminded of that. Of course we're not going to reveal the messages--they're secret!


Question 2 - Fencing

Part A

For the code shown above, assume that the data_memory_barrier() at "Point A" is removed. Show, using a tabular execution trace, what could go wrong, or argue convincingly that removing that fence couldn't make anything go wrong.

One problem is possible speculation of the first load in the critical section. The basic idea is that ARM processors are allowed to issue loads earlier than program order before a conditional branch is resolved. This can have the effect of moving code which is after the loop in program order so that it runs while the loop is still running. When the processor takes an action speculatively, it is required to cancel the effect if the speculation is wrong. However, when the loop exits, the final speculation will have turned out to be correct. In some sense speculation rewrites the loop so it looks like this:

x:
  speculatively { temp1 = global_tidsum; }
  if (mp->now_serving == my_ticket) {
    commit speculative load value into temp1;
  } else /* mp->now_serving != my_ticket */
    ignore speculative load and restore former value of temp1;
    goto x;
  }

In terms of instructions that get committed (have permanent results), the outcome looks like this:

  if (mp->now_serving == my_ticket) // nope
  if (mp->now_serving == my_ticket) // nope
  if (mp->now_serving == my_ticket) // nope
  if (mp->now_serving == my_ticket) // nope
  temp1 = global_tidsum;
  if (mp->now_serving == my_ticket) // yes it was!

If only one core is accessing the three cache lines holding mp‑>now_serving, my_ticket, and global_tidsum, the fact that the core fetched the value of global_tidsum "too early" is a performance bonus without a downside—good job, crazy speculative ARM core!

Or, if multiple cores are updating and accessing those three cache lines but it is not necessary for the cores to agree on the temporal ordering of the writes to the various cache lines, the speculative load is still a harmless performance bonus.

The only time the performance optimization is harmful is if multiple threads are updating and accessing the cache lines and the program will compute the wrong answer if the threads on those cores don't have a global agreement on the order in which the cache lines were written.

But this code is designed so that one thread writes global_tidsum strictly before writing mp‑>now_serving and the other thread must observe the new value of mp‑>now_serving strictly before it observes the value of global_tidsum. If the fence at "Point A" is removed, the observer thread may sample the two cache lines out of order, resulting in a trace somewhat like this:

Thread 12 Thread 13
... ...
temp1 = global_tidsum; // 0 if (mp‑>now_serving == 3) // no...
  goto x;
temp1 += 12; temp1 = global_tidsum; // 0
global_tidsum = 12;  
temp2 = 2;  
++temp2;  
mp‑>now_serving = 3  
  if (mp‑>now_serving == 3) // YES!
  temp1 += 13;
  global_tidsum = 13; // should be 25!
... ...

The fence at "Point A" prohibits the second core from launching the temp1=global_tidsum fetch until the fetch from mp‑>now_serving is done, i.e., the fence forces the second core to observe the two cache lines in the right order.

Part B

For the code shown above, assume that the data_memory_barrier() at "Point B" is removed. Show, using a tabular execution trace, what could go wrong, or argue convincingly that removing that fence couldn't make anything go wrong.

Removing the fence at "Point B" has no negative effect. What that fence does is "force" a core to delay the global_tidsum=temp1 store operation until the temp=global_tidsum fetch has completed. But the core is already forced, by the data-dependency analysis which ensures its program will observe local effects in program order, to delay the add instruction until the fetch has completed, and to delay the store until the add has completed. So the fence at "B" slows the code down but doesn't accomplish anything useful.

For further reading...

Here is last semester's homework page.


[Last modified Saturday May 06, 2017]