RECITATION NOTES            15-451 Algorithms            09/19/07

- hand back hwk
- problem from last year's final exam
- additional topic chosen by TA
=======================================================================

1. hand back hwk.

2. Here is a problem from last year's final exam, related to the recent
material.  [reworded a bit for these recitation notes]

Suppose we want to design a binary search tree over {1,...,n} to minimize
the worst-case time to look up any element.  Then clearly a balanced tree
is best.  But what if we are allowed to use a randomized construction
(equivalently, a probability distribution over search trees)?  Perhaps we
can do better in terms of minimizing the worst-case *expected* time of any
lookup.

In this problem, we consider the case of n=3.  It turns out that over
{1,2,3}, there are only three interesting trees to consider:
  (a) the balanced tree with 2 at the root,
  (b) the zig-zag tree with 1 at the root, then 3, then 2, and
  (c) the zag-zig tree with 3 at the root, then 1, then 2.
[it's not totally obvious that you don't need to use the other trees like
1-2-3 or 3-2-1, but you don't]

a) Fill in the entries for a 3x3 matrix game corresponding to this problem.
   We have one row for each tree, one column for each possible element to
   be looked up, and the cells in the matrix should indicate the cost to
   perform the given access in the given tree (assume the cost of an access
   is the depth of the element in the tree, with the root having depth 1).

   Ans:
                   lookup 1   lookup 2   lookup 3
      balanced         2          1          2
      zig-zag          1          3          2
      zag-zig          2          3          1

b) Before getting to the main question, what are tight *deterministic*
   upper and lower bounds?

   Ans: 2.  The balanced tree has depth at most 2, and no tree on three
   elements has depth 1.  From the point of view of the matrix, we are
   saying that there is some row in which all entries are at most 2 (upper
   bound of 2), and for every row there is some entry that is at least 2
   (lower bound of 2).

c) Now for the main question: what are tight *randomized* upper and lower
   bounds?
   In particular, the value of any randomized strategy for the row player
   gives us an upper bound, and the value of any randomized strategy for
   the column player gives us a lower bound.  The best possible upper bound
   is the (value of the) minimax optimal strategy for the row player, and
   the best possible lower bound is the (value of the) minimax optimal
   strategy for the column player.  These will turn out to match.

   Let's solve for the row player, and you can use the hint that it is
   possible to argue by symmetry that the probability on the zag-zig tree
   should equal the probability on the zig-zag tree.

   Let p = probability on zig-zag = probability on zag-zig.  So the
   balanced tree has probability 1-2p.  We want to minimize the worst case
   (maximum) of:

      cost of looking up 1:  2(1-2p) + p + 2p
      cost of looking up 2:  (1-2p) + 3p + 3p
      cost of looking up 3:  2(1-2p) + 2p + p

   The first and last of these are identical, so we can simplify this to:
   we want to minimize the maximum of 2 - p and 1 + 4p.

   Notice that as functions of p, one increases with p and one decreases
   with p (2 - p is larger at p=0, and 1 + 4p is larger at p=1/2, which is
   the largest p allowed since 1-2p must be nonnegative).  So, the maximum
   will be smallest when both are equal.  I.e., 2 - p = 1 + 4p, so p = 1/5.

   So, the minimax optimal strategy is to use the balanced tree with
   probability 3/5, and each other tree with probability 1/5.  The *value*
   of this strategy is 9/5.  So, it's a little better than just using the
   balanced tree.

   Actually, our argument shows that this strategy is optimal, which
   implies that 9/5 is also a lower bound (otherwise it wouldn't be
   optimal!), but if we want we can also explicitly give a minimax-optimal
   strategy for the column player.  In other words, what distribution on
   inputs is the worst one for any tree algorithm?  In this case, it turns
   out we can assume the probabilities on 1 and 3 are equal.  So, let's
   call that probability q, and have the probability on 2 be 1-2q.
   We now want to maximize the minimum of:

      average cost on balanced tree:  4q + (1-2q)
      average cost on zig-zag tree:   3q + 3(1-2q)
      average cost on zag-zig tree:   3q + 3(1-2q)

   So, we want to maximize the minimum of 1 + 2q and 3 - 3q.  Again, the
   minimum is maximized when both are equal, so 5q = 2, q = 2/5.

   So, the worst distribution for the inputs is 2/5 prob on 1, 2/5 prob on
   3, and 1/5 prob on 2.  Against this distribution every tree has expected
   cost 9/5, which matches the upper bound from the row player's strategy.

========================================

3. [insert any additional material here. E.g., could go over mini]
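
As a check on problem 2, the sketch below rebuilds the cost matrix from the
three tree shapes and confirms the deterministic bound of 2 and the randomized
value of 9/5 for both players.  It is only a sketch: the helper names (depths,
worst_case_cost, best_case_cost), the insertion orders used to describe each
tree, and the choice of Python's fractions module are assumptions made for
illustration and are not part of the original problem.  It does not search for
the optimal strategies; it only evaluates the ones derived in part (c).

```python
from fractions import Fraction

def depths(insert_order):
    """Insert keys into an empty BST in the given order.

    Returns {key: depth}, with the root at depth 1 (the access cost used above).
    """
    root, children, depth = None, {}, {}
    for key in insert_order:
        d = 1
        if root is None:
            root = key
        else:
            node = root
            while True:
                side = 0 if key < node else 1  # 0 = left child, 1 = right child
                d += 1
                if children[node][side] is None:
                    children[node][side] = key
                    break
                node = children[node][side]
        children[key] = [None, None]
        depth[key] = d
    return depth

# Hypothetical encoding: each tree is described by an insertion order that produces it.
TREES = {"balanced": [2, 1, 3], "zig-zag": [1, 3, 2], "zag-zig": [3, 1, 2]}
KEYS = (1, 2, 3)

# Rows are trees, columns are the element being looked up.
M = [[depths(order)[k] for k in KEYS] for order in TREES.values()]

# Deterministic bound: the best single row, judged by its worst entry.
det_bound = min(max(row) for row in M)

def worst_case_cost(row_probs):
    """Expected cost of the worst lookup against a distribution over trees."""
    return max(sum(p * M[i][j] for i, p in enumerate(row_probs)) for j in range(3))

def best_case_cost(col_probs):
    """Expected cost of the best tree against a distribution over lookups."""
    return min(sum(q * M[i][j] for j, q in enumerate(col_probs)) for i in range(3))

# Strategies derived in part (c): p = 1/5 for the row player, q = 2/5 for the column player.
row_value = worst_case_cost([Fraction(3, 5), Fraction(1, 5), Fraction(1, 5)])
col_value = best_case_cost([Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)])

print(M, det_bound, row_value, col_value)
# prints: [[2, 1, 2], [1, 3, 2], [2, 3, 1]] 2 9/5 9/5
```

Since the row player can guarantee expected cost at most 9/5 and the column
player can force expected cost at least 9/5, neither strategy can be improved,
which is the matching upper and lower bound claimed in part (c).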