15-410 Homework 2

This homework assignment is due Friday, December 5, at 23:59:59. As we intend to make solutions available on the web site immediately thereafter, please turn your solutions in on time. Late days are not available for this assignment.

Homework must be submitted (online) in either PostScript or PDF format (not: Microsoft Word, Word Perfect, Apple Works, LaTeX, XyWrite, WordStar, etc.). A plain text file (.text or .txt) is also acceptable, though it must conform to Unix expectations, meaning lines of no more than 120 characters separated by newline characters (note that this is not the Windows convention, and MacOS has two different conventions). Except as otherwise directed (in the crypto question), turn in your answers as .../$USER/hw2/$USER.pdf, .../$USER/hw2/$USER.ps, or .../$USER/hw2/$USER.text. If you use another filename, there is some risk that your solutions will not be credited to you. Please try not to submit your homework into your book-report directory or the other way around.

As usual, you may discuss this assignment with others, but you must then go off by yourself to write up the solution.

Question 1 - Public Key Practicum

This question is not hard, but it does take a bit of time to do it right. Please don't leave this question to the very last minute, and think carefully about what the various steps accomplish.

As you go through the steps of working on this question, try to think carefully about what each step is accomplishing in terms of underlying cryptography primitives.

Follow the directions in gpg.html to generate a PGP key ring, containing public and private keys for digital signature and encryption purposes. Do not turn the key ring in to your hw2 directory! Instead, follow the directions on how to export the public key information from the key ring into a file, hw2/$USER.asc. Then create a secret message for the course staff, in hw2/$USER.secret.asc.

Question 2 - Hardware Transactional Memory

As was discussed in class, some processors have a feature called hardware transactional memory (HTM). You are doubtless familiwar with atomic instructions, such as XCHG and LOCK XADD. Each atomic instruction carries out a small list of simple operations (such as memory fetch, add, memory store) in such a way that instructions on other processors don't interfere. But hardware transactions allow a programmer to select operation sequences that weren't built into the processor at the factory and have those operation sequences carried out without interference (though some restrictions apply).

For example, consider this (potentially silly) specification:

/**
 * Atomically adds 1 to an integer value in memory, but
 * fails without effect if the resulting value would be
 * too large.
 *
 * @param p pointer to the integer value in memory
 * @param limit value that must not be exceeded
 *
 * @return 0 on success, -1 if the limit would have been exceeded
 */
extern int limited_atomic_incr(uint32_t *p, uint32_t limit)

And consider an implementation:

limited_atomic_incr:
    pushl %ebp
    movl %esp, %ebp
    movl 8(%ebp), %ecx   # p
    movl 12(%ebp), %edx  # limit

    xbegin uh_oh         # abort for any reason jumps to uh_oh
    incl (%ecx)
    cmpl (%ecx),%edx
    jbe ok_then          # ok if result is below or equal
    xabort $1
   
ok_then:
    xend 
    xorl %eax,%eax       # if we reach this instruction, the commit worked
    popl %ebp
    ret
   
uh_oh:
    # the incl inside the transaction has been rolled back
    movl $-1, %eax
    popl %ebp
    ret

The transaction does a fetch, an increment, and a store, and then does a limit check; if the limit check fails, then the store is rolled back. If the limit check succeeds, the transaction is committed, so the new value for *p is solidified for the current processor and also will be eventually sent to main memory and/or other processors. Compared to an XADD or XCHG, the amount of non-interruptable work is similar, but this approach allows many different short sequences to be run atomically, rather than just a fixed set chosen at the factory.

At a high level, the way this is implemented is something like:

When a transaction starts, a backup copy is made of all registers.
As a transaction modifies memory, the modified cache lines are "temporarily" not shared with other processors: if another processor requests a cache line we have in "modified" or "exclusive" mode, we delay answering until the transaction is over, which "shouldn't be too long".
If a transaction successfully commits, nothing needs to be done about register values or cache lines.
If a transaction aborts, all register values are reset to the backup copy, and all cache lines that are marked "modified" are switched to "invalid". This means that other processors will never see the cancelled updates. In addition, the processor that was running the transaction that aborted won't have any value for the relevant cache lines, so future references to those values will result in cache misses and fetches. Restoring the registers is fast, and instructing the cache to switch all "modified" cache lines to "invalid" is also very fast (all cache lines can do that in parallel).
The XABORT instruction is one way for a transaction to abort, but there are many others. Any exception causes a transaction to abort, but also any other "surprise", including interrupts or traps. And sometimes a transaction can abort if it changes "too many" values in memory (the processor designers refuse to say exactly how many that is, and they have their reasons, but changing "a few" memory locations is "generally fine").

The reason why register values and memory values are handled so differently is that the processor has a fixed small number of registers, so it is feasible to make a full copy, but caches are much larger than the register set, so making a complete copy of the cache would be impractical.

Part A

Unfortunately, the above description contains a bug. What could go wrong if some cache lines were "modified" before the XBEGIN instruction? Which somewhat-painful work would the XBEGIN instruction need to accomplish to avoid this problem?

Part B

Imagine your boss tells you that in addition to the two-bit state that encodes the four coherence states for each cache line (M, E, S, I), you are authorized to add one status bit to each cache line, called T (for "transaction"). Briefly explain how the T bit would be set, checked, and cleared to improve the performance of hardware transactions. For example, how would the process of modifying a cache line interact with the T bit? How would aborting a transaction interact with the T bit? How would successfully committing a transaction interact with the T bit? Note that because of how caches work it is possible for all cache lines to simultaneously set/clear/check their individual T bits.

Part C

In lecture we discussed how transactions can be implemented using "write-ahead logging" including change records and commit records, both with transaction i.d. numbers. Why doesn't the above description of hardware transactional memory have any "change records" or "commit records"?

Helpful Hint

By the way, if you think you are having AFS permission problems, try running the program located at
% /afs/cs.cmu.edu/academic/class/15410-s26/pub/access_hw2

[Last modified Wednesday December 03, 2025]