Fall 2006 15-712 Second Midterm Exam

Closed book. 80 minutes. Answers should be written in separate blue books. Read the questions carefully and ANSWER ALL PARTS OF EACH QUESTION. Carefully mark your name on each blue book and number the blue books if you use more than one. I will grade the first 8 answers I find in the blue books submitted. All questions have the same weight.

************* Answer 8 questions ************

1. Optimistic methods for concurrency control (Kung81) employ much less locking than more popular concurrency control methods (such as two-phase locking). Yet both achieve serializability.
(a) What is serializability?
(b) How do both Kung's optimistic methods and two-phase locking achieve serializability?
(c) What benefit does Kung81 seek, relative to two-phase locking?
(d) Under what circumstances are Kung's optimistic methods not a good idea?
----
(a) Serializability means that some serial ordering of the transactions exists that would produce the same results as their actual concurrent execution.
(b) Kung achieves it by running each transaction on a private copy of its data, and rolling the transaction back if there was real sharing between it and another transaction. Two-phase locking blocks on a lock until the entire transaction can run without any form of sharing, including false sharing.
(c) Kung tries to get parallelism whenever possible. His optimistic methods avoid the blocking that two-phase locking incurs on falsely shared variables.
(d) Under high contention, or with very long transactions, the cost of rollbacks may make Kung's optimism not worthwhile. Also, a long-running transaction may cause aborts of many short-running transactions.
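To make the validation in answer (b) concrete, here is a minimal sketch of Kung-style backward validation in Python. The Transaction class and validate() helper are illustrative, not Kung81's actual code, and the sketch checks only the write-set/read-set intersection, which suffices when write-back is done serially inside the critical section:

    class Transaction:
        def __init__(self, tid):
            self.tid = tid          # transaction number, fixed at validation time
            self.read_set = set()   # object ids read during the read phase
            self.write_set = set()  # object ids written (to private copies)

    def validate(txn, recently_committed):
        """Backward validation: txn may commit only if no transaction that
        committed after txn began has a write set intersecting txn's read
        set; such an intersection is the 'real sharing' that forces a
        rollback."""
        for other in recently_committed:
            if other.write_set & txn.read_set:
                return False        # conflict: abort txn and retry it
        return True                 # safe to enter the write-back phase
----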
2. A simple implementation of Kung81's methods might fork each transaction into its own thread, use the thread ID provided by the underlying OS for Kung81's ordering, and implement its validation and writeback phase as an IPC invocation on an atomic commit_and_exit_thread().
(a) Explain TWO weaknesses Kung would see in this approach.
Kung would select the serially equivalent ordering much later.
(b) How can late selection of ordering hurt long-running transactions?
For more aggressive parallel validation, Kung sketches a scheme in which a transaction repeatedly samples the global transaction number in order to test for conflicting read and write sets outside of the critical section.
(c) Why is it safe to do these tests outside the critical section?
(d) What is left to parallelize after this resampling of the global transaction number is exhausted?
----
(a) Long transactions may be ordered before short transactions that start after them. The short transactions will be unable to finish until the long one has. In the case of real sharing, this system may end up retrying short transactions multiple times in favor of a single long one. Kung would also object to one transaction at a time executing the entire commit processing, including the testing of read and write sets against all other transactions and the write-back of data associated with the write set of successfully committed transactions. He would want to overlap the disk writing of multiple transactions, and even the read/write set testing, wherever possible.
(b) A long-running transaction ordered late in a serially equivalent run has been reading and writing for a long time, while short transactions came much later and left earlier, so the long transaction is at risk of retrying because of each short transaction that came in and committed while the long transaction was running and possibly sharing data.
(c) The concurrent validation comparisons are only done against transactions that actually committed before the transaction in question, so their read and write sets will not change. After validating against all the transactions that committed before this transaction started its commit processing, it can resample the transaction ID of the most recently committed transaction to find more transactions that committed while it was validating, and validate against these too.
(d) Kung's most aggressive concurrent validation involves comparing against other transactions that are also validating at the same time, not just those that have committed.
----
3. The simplest B-tree access methods support atomic transformations of the B-tree. Unfortunately, in a system with many concurrent transactions, this simplest method has limitations.
(a) Give a limitation experienced by read-only database transactions.
(b) Give a limitation experienced by read-write database transactions.
Lehman81 offers more complex B-link-tree access methods to overcome these limitations.
(c) How effective are B-link-trees at overcoming the limitations of atomic B-tree transformations?
(d) What is the key property of the implementation of B-link-trees that enables this improvement?
(e) What does Lehman81 sacrifice for this improvement?
----
Atomic B-tree transformations mean that the whole tree cannot change with respect to a transaction while it is running. Simple access methods use two-phase locking, and lock everything they look at until their commit.
(a) No read-only transaction will even start while a write transaction is in the tree, because the root might change in a write transaction.
(b) No read-write transaction will start while there are any read-only transactions or another read-write transaction in the tree, because it might change the root, and these others do not want the root changed while they are in the tree.
(c) B-link-trees allow read-only transactions to run concurrently with any transaction, and allow read-write transactions to run concurrently except when they are looking at the same node (as many as three adjacent nodes are locked relative to other read-write transactions in some cases).
(d) The improvement is provided by an additional link from each node to the next node in a breadth-first walk of that level of the tree, and careful coding so that the additional link is correct even as changes are being made in the nodes around it.
(e) Because the correct node might not be found until multiple additional links are traversed, the depth properties of a B-tree are not guaranteed. Lehman's method sacrifices simplicity, B*-tree balance properties, and the guarantee of log(n) tree depth.
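To make the key property in (d) concrete, here is a minimal sketch of the B-link-tree read path in Python. The Node fields and the child-indexing convention are illustrative, not Lehman81's actual structures; the point is that the read path takes no locks:

    import bisect

    class Node:
        def __init__(self, keys, children=None, values=None,
                     high_key=None, right_link=None):
            self.keys = keys              # sorted keys (separators in interior nodes)
            self.children = children      # child list for interior nodes; None at leaves
            self.values = values          # leaf payloads, parallel to keys
            self.high_key = high_key      # upper bound on keys reachable in this node
            self.right_link = right_link  # next node at the same level, left to right

    def search(node, key):
        while True:
            # If a concurrent split moved our key range rightward, the high key
            # reveals it and the right link reaches the new sibling. A reader may
            # chase several links, which is why log(n) depth is no longer promised.
            while node.high_key is not None and key > node.high_key:
                node = node.right_link
            if node.children is None:                      # reached a leaf
                i = bisect.bisect_left(node.keys, key)
                if i < len(node.keys) and node.keys[i] == key:
                    return node.values[i]
                return None                                # key absent
            node = node.children[bisect.bisect_right(node.keys, key)]
----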
4. Consider a user-level transaction processing system in which each ongoing transaction is consuming a significant amount of the transaction processing system's virtual memory. If such a computing system begins to experience unacceptable memory pressure (more than negligible paging of the transaction processing system's virtual memory), it may signal the transaction processing system for assistance.
(a, b, c) Give THREE different actions that the transaction processing system could take to (almost) immediately reduce the memory pressure in the machine. For each action, explain how that action frees up memory and how it depends on and interacts with the log.
----
(a) It could abort some running transactions, writing an ABORT record to the log and freeing up the buffers they hold for locks, dirty data, and temporary state. UNDO records may have to be processed from the log if the database overwrites in place and dirty uncommitted data was pushed out to disk.
(b) It could push some modified uncommitted data to the disks, after adding UNDO records to the log if the system is an overwrite database, then free up the buffers that held the dirty data. Data pushed this way will not have to be written again on a successful commit, but will have to be reversed on an abort.
(c) It could push some modified committed data to the disks, then free up the buffers that held the dirty data. This data is protected by REDO records in the log in case the machine fails and restarts, because these writes were logically done when the transaction committed.
----
5. Haskin88 describes a distributed database system with a broader agenda. Specifically, Haskin88 sets out to offer transactions as a simple-to-use, nearly transparent failure recovery service for all processes in a distributed system.
(a) Explain how this might benefit non-database applications.
But non-database applications differ from database applications, so a variety of specializations are offered.
(b, c, d) Describe THREE such specializations and how they are beneficial.
----
(a) Haskin88 offers non-database distributed applications an automated recovery method for distributed systems that are exposed to partial failures. By basing it on IPC, the recovery tools can be responsible for identifying which modules are involved in a transaction. All the application has to do is have a local commit/abort structure.
Several specific commit specializations were mentioned in the paper and in lecture. Any three of these answers the question:
- One-phase commit services: simple volatile services which get only one message and then end.
- Truncated second phases based on vote: a participant that votes commit read-only has nothing to recover and no interest in the second phase (see the sketch after this list).
- Transaction graph cycles: the first invocation votes for real; the rest vote commit read-only.
- Transactions continuing after commit: messages that aren't stateful or part of the last transaction are allowed after the commit starts and cannot cause an abort.
- Coordinator migration: rotate the graph so that the most appropriate node becomes coordinator.
- Coordinator replication: allows interposed mirrored processing for a replicated coordinator.
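As one concrete example from this list, here is a minimal sketch of a two-phase-commit coordinator with the read-only truncation, in Python. The Participant interface (vote() and finish()) is a hypothetical stand-in for the IPC messages in the paper:

    def coordinate(participants):
        # Phase 1: collect votes ("commit", "abort", or "read-only").
        votes = {p: p.vote() for p in participants}
        outcome = "abort" if "abort" in votes.values() else "commit"
        # Truncated phase 2: read-only voters have nothing to recover and no
        # interest in the outcome, so they are dropped from the second phase.
        for p, vote in votes.items():
            if vote != "read-only":
                p.finish(outcome)
        return outcome
----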
6. Consider a set of computers communicating through a single shared ethernet cable (tapped or daisy-chained), with today's fast processors and very smart NICs. If cryptographic methods are NOT used but the number of faulty nodes is assumed to be at most 20% of the nodes on the shared ethernet, I assert that Lamport82's key result does not apply and consensus may not be possible.
(a) Explain one way that my assertion does not contradict the correctness of Lamport82's key result.
(b) Describe one reorganization of the communications network to overcome this assertion (it need not be particularly practical).
Now return to the original network and suppose each ethernet NIC contains its own private key and a table of IP addresses on the shared ethernet cable and the corresponding public keys, and it encrypts the ethernet payload of each packet using its own private key.
(c) Explain how this solves or does not solve the problems I assert defeat Lamport82's key result in this environment.
----
(a1) Shared ethernet cables can be driven from any and all of the attached NICs. With Byzantine failures, any one failed NIC can corrupt any or all packets on the ethernet, making communication unreliable and giving unbounded latency between any two nodes. This defeats Lamport82's result (consensus is possible with at most f faulty nodes out of n provided n >= 3f + 1), which assumes messages between correct nodes are delivered.
(a2) With a single shared ethernet and no cryptographic methods, any one faulty node can send messages as any other node and the receiver cannot tell the difference, so the single faulty node can make it seem that all nodes are faulty. Lamport82 assumes point-to-point, reliable messages whose sender can be identified.
(b) The simplest reorganization of the network is a NIC and link in each node connected directly to each other node. This is not practical for any but the smallest networks. If network switches are considered "trusted", then a switched ethernet network has this property, provided the switch tags each packet with the incoming link ID (which is not normally done).
(c) This does not fix the problem in (a1) and has no hope of doing so. It does not by itself fix the problem in (a2), but it does have a hope of doing so: encrypting the payload with your own private key signs the message, provided that the other side knows which public key to try (the source's address is in the header in the clear) and the payload contains enough predictable, redundant data for a decrypted message to be recognized as distinct from garbage.
----
7. Castro99 provides a practical Byzantine algorithm for achieving agreement on replication in an "asynchronous environment like the Internet". Unless Fischer85 is wrong, and it is not, asynchronous must mean something different to Castro, because Fischer85 proves that even one faulty principal makes consensus impossible in an asynchronous environment.
(a) What does asynchronous mean in Castro99?
(b) Why does this distinction provide a valuable system property?
In fact, in its "views" algorithm, Castro99 sidesteps Fischer's asynchrony problem in much the same way as Lamport82.
(c) Explain how Castro uses Lamport's sidestep and why it ensures liveness.
----
(a) By asynchronous, Castro99 means that each phase does not have to wait for Lamport's timeout; each phase can continue as soon as enough nodes respond.
(b) This is very valuable because in the normal case it allows Castro to run quite fast, since there are enough non-faulty nodes responding to end most phases in typical message latencies, while Lamport waits a full timeout for every message exchange.
(c) In each view a different node is the leader, in a strict round-robin order. If the leading node in one view is not making progress (fast enough), other nodes may time out and suspect the leader of being faulty. If enough nodes suspect the leader of being faulty, the view is changed so that the next node is the leader. This ensures that the amount of time a faulty leader can block progress, threatening liveness, is bounded by timeouts, just like Lamport's sidestep, and the strict round robin ensures that only a bounded fraction of the views can be led by faulty nodes.
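A minimal sketch, in Python, of the view-change discipline described in (c). The Replica class and its methods are illustrative, and the sketch collapses Castro99's separate VIEW-CHANGE and NEW-VIEW messages into a single suspicion count:

    class Replica:
        def __init__(self, my_id, n_replicas):
            self.my_id = my_id
            self.n = n_replicas
            self.f = (n_replicas - 1) // 3   # faults tolerated: n >= 3f + 1
            self.view = 0
            self.suspects = set()            # replicas suspecting the current leader

        def leader(self, view):
            # Strict round robin: only a bounded fraction of views get faulty leaders.
            return view % self.n

        def on_timeout(self):
            # No progress under the current leader within the timeout: record our
            # own suspicion (broadcasting it to peers is elided in this sketch).
            self.suspects.add(self.my_id)
            self.maybe_change_view()

        def on_suspicion(self, sender):
            # Another replica timed out on the same leader.
            self.suspects.add(sender)
            self.maybe_change_view()

        def maybe_change_view(self):
            # Enough distinct suspicions bound how long a faulty leader can stall
            # progress; move to the next view, and hence the next leader.
            if len(self.suspects) >= 2 * self.f + 1:
                self.view += 1
                self.suspects = set()
----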
8. The introduction of processes (yes, a thread, its address space, open files, etc.) provided fault isolation benefits to computing systems. Hive (Chapin95) proposes to partition the hardware and software of a multiprocessor into "cells" to provide a similar benefit of fault isolation in multiprocessor servers.
(a) Explain the fault containment benefit Hive hopes to achieve with cells.
(b) Give an example of hardware support Hive proposes for fault containment and explain how this hardware provides benefit.
Unfortunately, Hive cannot always fully achieve this benefit.
(c) How are the operating system's core functions partitioned for fault containment, and what penalty is caused by this partitioning?
----
(a) Fault containment is achieved when a task's exposure to being stopped by faults is proportional to the fraction of the machine's resources it is using. By decomposing a large machine into cells, Hive intends to isolate the immediate effects of a fault to the cell it occurs in and the tasks that are using that cell. If a task does not use that cell, it should be unaffected by the fault, and thereby exhibit fault containment.
(b) Hive allows tasks on one cell to access the memory of another cell, but worries that once faulty, a task's writes could damage any memory it has access to. So Hive has a hardware check in each memory system to be sure that a remote write operation is being done by an authorized cell. This way only tasks which have anticipated sharing memory are exposed to wild writes, and tasks not intending to share memory will not be affected by the wild writes of a failed cell.
(c) The operating system is usually a task spread over all processors in a large machine, but Hive cannot allow this, or any failed processor could wild-write all memory and affect all other tasks. So in Hive each cell runs a standalone version of the OS, and the OS never uses remote shared memory -- all OS communication is through messages, which can be explicitly sanity-checked on reception. Some logically global OS functions, like global resource management, are pushed up to user-level global tasks, which may fail when one cell fails but restart quickly. Of course, this IPC-instead-of-shared-memory strategy for the OS makes it take more cycles and time to do its work, so it is less efficient.
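A minimal sketch, in Python, of the messages-instead-of-shared-memory discipline in (c): the receiving cell sanity-checks every inter-cell request before acting on it. The message layout and the checks are hypothetical, not Hive's actual RPC format:

    VALID_OPS = {"alloc_page", "free_page", "migrate_task"}

    def is_sane(msg, known_cells, max_pages):
        # A wild write cannot reach us through a message channel; a corrupt
        # message can, so every field is checked against local invariants.
        return (msg.get("op") in VALID_OPS
                and msg.get("src_cell") in known_cells
                and isinstance(msg.get("page_count"), int)
                and 0 < msg["page_count"] <= max_pages)

    def handle(msg, known_cells, max_pages=4096):
        if not is_sane(msg, known_cells, max_pages):
            return "rejected"      # drop the request; the fault stays contained
        return "accepted"          # safe to dispatch to the local kernel
----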
9. Anderson94 says "This is a failure of certification. Staff don't have the knowledge. Product certification should be tied to the skill of the user."
(a) Explain the problem that concerns him.
(b) Which of Saltzer75's design principles does this refer back to?
In class, I asserted that AFS permissions had a similar problem.
(c) In what way can permissions definitions exhibit this problem?
----
(a) Anderson94 is commenting on the use of sophisticated cryptographic tools in a system managed by people who do not understand these tools well. His point is that the integration and maintenance of the sophisticated tools are points of exposure -- copies of the keys, access to the inside of the ATM, freely reissuing keys on request -- so a tool is only as strong as the people managing it. That means a product should only be certified as having specific levels of security when the end users are certified as having specific levels of skill.
(b) This goes to Saltzer75's "psychological acceptability" of the human interface of a protection mechanism.
(c) Permissions are a set of allowed and disallowed actions that can be granted to any user with respect to some object being acted on. The issue is whether the user assigning the permissions can correctly anticipate the set of allowed and disallowed actions, and whether these actions can accomplish the restrictions and enablements the user hopes to achieve. If the allowed and disallowed actions are too complicated, too low-level, or too dissimilar to the user's goals, it is likely that users will assign permissions incorrectly. Then even a perfect protection system will fail, because the wrong protections are being implemented.
----
10. In "The Protection of Information in Computer Systems", Saltzer75, the authors describe the following attack: "...a masquerader could exactly record the enciphered bits in one communication, and then intercept a later communication and play them back verbatim. (This technique is sometimes called spoofing.) Although the spoofer may learn nothing by this technique, he might succeed in thoroughly confusing the user or the computer system. The general countermeasure for spoofing is to include in each enciphered message something that is unique, yet predictable, such as ___________." There are three common implementation techniques for the blank in this quote.
(a) Describe TWO such techniques.
(b) Explain how each counters "spoofing".
(c) Contrast the costs or limitations of each.
----
(a1) "Just-issued challenge nonce" -- a message should contain a value that the receiver recently created to be used uniquely in this message.
(b1) This counters spoofing in that the receiver constructs the challenge nonces so that it never reuses the values, so no old message could be replayed and confuse it.
(a2) "Synchronized-clock timestamp" -- a message should contain the current clock value at a resolution fine enough to never be used twice, provided that the receiver can be comfortable determining that the stamp is the current time.
(b2) If the receiver's current time is exactly the message timestamp, and clock values don't repeat, then a timestamped message cannot be replayed. Since perfect synchronization is unlikely, given message skew and clock drift, a receiver has to accept stamps within a window around its clock's value, and maintain a copy of all other messages it has accepted in this window so it can detect replays within the duration of the window.
(a3) "Session-specific sequence number" -- inside a session with a sequence number, a challenge nonce can be computed by following a rule such as "the last challenge's value + 1", so that only the first sequence number in a series needs to be negotiated.
(b3) Provided sequence numbers never wrap in a session, each message will have a new, unique, yet predictable value for the sequence number, and no old message can be replayed successfully.
(c) Challenge nonces have only a small amount of state and none per message or per session, but they often require an extra round-trip message for the receiver to give the sender the challenge nonce. Sequence numbers derive the next challenge nonce from a prior one using a pre-agreed algorithm, but must track state for each session using this algorithm, where there is at least one session per machine pair, and often one session per logical service pair. Timestamps also avoid the extra round trip to issue a challenge nonce, but instead have to remember every message within the clock-skew-plus-message-skew window around the current timestamp, and compare incoming messages to each. This increase in state may be more or less than the state needed for sequence numbers, depending on the skew sizes and the number of sessions likely to be active.
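A minimal sketch, in Python, of the timestamp-window replay check described in (b2). The window size and message representation are illustrative:

    import time

    WINDOW = 5.0                  # seconds of tolerated clock + message skew
    seen = {}                     # message id -> timestamp, kept within the window

    def accept(msg_id, stamp, now=None):
        now = time.time() if now is None else now
        # Forget remembered messages that have aged out of the window.
        for old_id, old_stamp in list(seen.items()):
            if now - old_stamp > WINDOW:
                del seen[old_id]
        if abs(now - stamp) > WINDOW:
            return False          # too far from our clock: reject outright
        if msg_id in seen:
            return False          # already accepted once: this is a replay
        seen[msg_id] = stamp
        return True
----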
11. Consider the following protocol allowing principals A and B, who already have private channels open to a key generating server S (with keys Kas and Kbs, respectively). Encryption of data d1 and d2 under key Kxy is represented as {d1, d2}_Kxy.
1. A -> B: A, Na
2. B -> S: B, {A, Na, Nb}_Kbs
3. S -> A: {B, Kab, Na, Nb}_Kas, {A, Kab}_Kbs
4. A -> B: {A, Kab}_Kbs, {Nb}_Kab
(a) List at least 8 assumptions that a BAN analysis of this protocol would begin with.
(b) What would be the goal of such a BAN analysis?
There is at least one flaw in this protocol.
(c) Describe such a flaw and explain why it is a flaw.
----
(a) A believes Na is fresh. B believes Nb is fresh. A believes Kas is a shared key between A and S. S believes Kas is a shared key between A and S. B believes Kbs is a shared key between B and S. S believes Kbs is a shared key between B and S. A believes S controls keys between A and B. B believes S controls keys between A and B. S believes Kab is a shared key between A and B.
(b) The goal is for A and B to believe that Kab is a shared key between A and B. That is, this is a session key generation protocol.
(c1) The flaw is that {A, Kab}_Kbs, as received by B, contains nothing that B believes is fresh. It could have been captured in a prior exchange with the server S after which the key Kab was compromised. So BAN logic is not able to get past "S said Kab" to "S believes Kab" and from there to "B believes Kab". This would be easily fixed if S put Nb into {A, Nb, Kab}_Kbs in message 3. What is interesting about this example is that the flaw may be primarily BAN's, because B does believe Nb is fresh, and Nb is not revealed in plaintext as it traverses from B to S to A to B, so an attacker trying to revert the new key to a compromised old key cannot have seen the value of Nb and will not be able to form {Nb}_Kab. Only S, A, and B should be able to form this value, and A and S believe Kab to be a good shared key, so B should be willing to use Kab speculatively to discover Nb and derive the freshness of the message. Once this is done, BAN would allow us to associate that freshness with the rest of the message, and overcome the problem with the freshness of the first part of the message.
----
12. The Terra platform for trusted computing in Garfinkel03 describes a procedure called attestation that it offers for distributed system trust protocols.
(a) What is attestation?
(b) What properties does it seek to provide to distributed systems such as trusted gaming?
(c) Give at least one limitation of the Terra realization of attestation.
----
(a) Attestation is a process whereby a remote host identifies itself to a peer by disclosing the contents of its hardware and software stack in an unforgeable way.
(b) Attestation provides an integrity check of the software that was invoked for execution on a remote system. This allows nodes in a distributed system to trust that correct software was loaded on their peers.
(c) Attestation is limited in that you must trust the hardware, and it cannot provide a strong guarantee of what software is actually running, only what software was loaded. Load-time guarantees are vulnerable to time-of-check-to-time-of-use (TOCTTOU) vulnerabilities, where software is exploited after it is checked but before it is used.
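A minimal sketch, in Python, of the load-time measurement chain that underlies answers (a) and (c): each component is hashed into a running chain before control transfers to it, so the final value attests to what was loaded, and in what order, but not to what the software did afterward. The component list and the use of SHA-256 are illustrative, not Terra's actual mechanism:

    import hashlib

    def extend(chain, component_bytes):
        # chain' = H(chain || H(component)): the order of loads matters, and
        # no later software can un-extend an earlier measurement.
        h = hashlib.sha256(component_bytes).digest()
        return hashlib.sha256(chain + h).digest()

    chain = b"\x00" * 32                          # reset at power-on
    for component in [b"firmware", b"bootloader", b"kernel", b"game"]:
        chain = extend(chain, component)

    # 'chain' now attests to what was loaded -- but says nothing about what
    # the software did after it started running (the TOCTTOU gap in (c)).
    print(chain.hex())
----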