8 Apr 1996

Outline

  Access frequency
  Alpha-beta search
  Graphs

Access frequency

In talking about trees we've had the implicit assumption that things tend to be accessed with equal frequency. When this is the case, we want the search tree to be as shallow as possible, so balanced trees look very good.

Frequently, though, this isn't the case. Consider, for instance, the words in English text. It happens that some words occur much more frequently than others. If I have a tree of words, I may not hope for balance, but just hope that words like ``the'' or ``a'' fall close to the root, while a word like ``sesquicentennial'' can be as far down the tree as it likes.

As it turns out, this happens more frequently than you might think. There are several well-known examples of this sort of skewed access frequency.

Because of these skewed access frequency patterns, splay trees look like a very good idea. They guarantee good amortized performance even for a uniform access distribution, and for these more skewed access distributions they tend to do better, since more frequently accessed items tend to end up closer to the root.
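
To make ``move frequently accessed items toward the root'' concrete, here's a sketch of the splay operation (in Python; the code and names are mine, not from the original notes). Every search rotates the accessed key up to the root with zig, zig-zig, and zig-zag rotation steps, so hot keys drift toward the top:

    class Node:
        """Plain BST node; no balance information is stored."""
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def rotate_right(x):          # make x's left child the new subtree root
        y = x.left
        x.left = y.right
        y.right = x
        return y

    def rotate_left(x):           # make x's right child the new subtree root
        y = x.right
        x.right = y.left
        y.left = x
        return y

    def splay(root, key):
        """Rotate the node with `key` (or the last node on its search
        path) up to the root, using zig, zig-zig, and zig-zag steps."""
        if root is None or root.key == key:
            return root
        if key < root.key:
            if root.left is None:
                return root
            if key < root.left.key:                      # zig-zig
                root.left.left = splay(root.left.left, key)
                root = rotate_right(root)
            elif key > root.left.key:                    # zig-zag
                root.left.right = splay(root.left.right, key)
                if root.left.right is not None:
                    root.left = rotate_left(root.left)
            return root if root.left is None else rotate_right(root)
        else:
            if root.right is None:
                return root
            if key > root.right.key:                     # zig-zig
                root.right.right = splay(root.right.right, key)
                root = rotate_left(root)
            elif key < root.right.key:                   # zig-zag
                root.right.left = splay(root.right.left, key)
                if root.right.left is not None:
                    root.right = rotate_right(root.right)
            return root if root.right is None else rotate_left(root)

    def search(root, key):
        """Every search splays, so a frequently accessed key such as
        ``the'' migrates toward the root over time."""
        root = splay(root, key)
        return root, (root is not None and root.key == key)

Insertion and deletion splay in the same way; the restructuring is driven entirely by the accesses, with no balance information stored in the nodes.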

You may think that reorganizing the tree after each search is excessive. In fact, I did my undergraduate research on this very topic. I tested all sorts of variations on the splay tree and found that their performance was only marginally better than the plain splay tree's.

Alpha-beta search

On the last homework we did a game-tree search. We had wins, losses, and ties.

You probably had a loop that tried each possible move and recursively checked to see what the result was. If you were clever, you had a small speedup: if you find a winning move, you don't need to check the rest; you can return right then.

In more complicated games like chess, you can't search all the way down. Instead, you have an evaluation function that returns a ``goodness'' number; you search to some depth and then return the ``best'' move. Say, for example, the function returns ``goodness to player 1'': then think of the first player as the ``maximizer'' and the second player as the ``minimizer''.

                   o            MAX
                 /   \ 
               o       o        MIN
              / \     / \
             o   o   o   o      MAX
            / \ / \ / \ / \
           3  5 6 2 0 4 9 20
In this example, the left branch for MAX is worth 5, and the right branch is worth 4, so it will take the left one for a result of 5. This is called minimax search. The number of levels of search is called the ply; the tree above, for instance, illustrates a three-ply search.
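
Here's a small sketch of the recursion (in Python; the nested-list encoding of the game tree is just an illustration, not from the notes):

    def minimax(node, maximizing):
        """A node is either a number (a leaf evaluation, i.e. ``goodness
        to player 1'') or a list of child nodes."""
        if not isinstance(node, list):                 # leaf: its evaluation
            return node
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # The three-ply tree drawn above:
    tree = [[[3, 5], [6, 2]], [[0, 4], [9, 20]]]
    print(minimax(tree, True))                         # prints 5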

A similar kind of trick can be used to speed this up, sort of like ``stop when you get to a win'' but a little trickier. It is called alpha-beta search.

Idea: say you have a tree and have figured out the following so far.

                 o        MAX    (currently we know this is worth at least 40)
               /   \
             40     o     MIN    (we know this node is worth at most 30)
                   / \
                 30   ??  MAX
Do we need to search the ``??'' subtree? No. The MIN node is already worth at most 30, which is less than the 40 MAX already has, so no matter what we find there it won't affect the outcome.

The same thing can happen in the other direction:

                  o     MIN    (currently we know this is worth at most 40)
                /   \
              40     o  MAX    (we know this node is worth at least 50)
                    / \
                   50  ??      <-- again, no need to search
Idea of alpha-beta pruning: keep track of the current maximum (called alpha, or α) and the current minimum (called beta, or β). For example, MIN passes its ``best for MIN so far'' (β) down to MAX, and if MAX ever finds that its current best is ≥ β, it knows it can return immediately.

This can be useful even in a win/lose/tie setting. For example,

                   o          X player
                 /   \
               TIE     o      O player
                      / \
                    TIE  ??   <--don't need to evaluate.
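
Putting the whole thing together, here's a sketch of alpha-beta search (in Python, on the same nested-list game trees as the minimax sketch above; again the code is mine, not from the notes). Here alpha is the best value MAX is already guaranteed, and beta is the best (lowest) value MIN is already guaranteed:

    def alphabeta(node, alpha, beta, maximizing):
        """Minimax with alpha-beta pruning.  A node is either a number
        (a leaf evaluation) or a list of child nodes."""
        if not isinstance(node, list):
            return node
        if maximizing:
            best = float('-inf')
            for child in node:
                best = max(best, alphabeta(child, alpha, beta, False))
                alpha = max(alpha, best)
                if best >= beta:     # the MIN above will never let us get here,
                    break            # so prune the remaining children
            return best
        else:
            best = float('inf')
            for child in node:
                best = min(best, alphabeta(child, alpha, beta, True))
                beta = min(beta, best)
                if best <= alpha:    # the MAX above will never let us get here
                    break
            return best

    tree = [[[3, 5], [6, 2]], [[0, 4], [9, 20]]]
    print(alphabeta(tree, float('-inf'), float('inf'), True))   # prints 5

On the example tree it still returns 5, but it never looks at the leaf 2 or at the [9, 20] subtree: those evaluations can't change the answer, which is exactly the ``??'' situation in the pictures above.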

Graphs

Basic terminology:

  V: the vertex set
  E: the edge set (a subset of V×V)
  graph: a tuple (V, E)
  undirected graph: a graph in which edges are unordered, so (u, v) is in E exactly when (v, u) is (in a directed graph, (u, v) may be in E while (v, u) is not)
  n: the number of vertices (|V|)
  m: the number of edges (|E|)
  degree of a vertex v: the number of edges incident to v (for directed graphs, we can talk about the in-degree and the out-degree)
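
For concreteness, here's how those quantities look in one common representation, an adjacency list (a Python sketch; the little four-vertex directed graph is made up for illustration):

    # A directed graph stored as an adjacency list:
    # each vertex maps to the list of its out-neighbors.
    G = {
        'a': ['b', 'c'],
        'b': ['c'],
        'c': ['a'],
        'd': [],
    }

    n = len(G)                                    # |V| = 4
    m = sum(len(nbrs) for nbrs in G.values())     # |E| = 4

    out_degree = {v: len(G[v]) for v in G}
    in_degree = {v: 0 for v in G}
    for v in G:
        for w in G[v]:
            in_degree[w] += 1

    print(n, m)                                   # 4 4
    print(out_degree['a'], in_degree['c'])        # 2 2

For an undirected graph you would list each edge in both endpoints' lists (and count each edge once, not twice).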

Think about the following:

A planar graph is one that you can draw on paper without any edges crossing each other. It turns out that m = O(n) for all planar graphs. This is not obvious, since some vertices might have a really high degree; what we're saying is that, on average, the degrees must be O(1). Here's why.

First, it turns out that any planar graph can be drawn on paper using straight lines (this is called a ``straight-line embedding''). Let's assume this.

Draw a picture of a straight-line planar graph. For example,

                 o---o------o
                 |\  |\      \ 
                 | \ | o---o--o
                 |  \|/      /
                 o---o------o
Now, let's try and figure out why there can't be very many edges. Here's one way to prove it:
  1. First, let's triangulate the graph. This means that we add edges until all the interior regions become triangles. This only adds edges, so if the triangulated graph has O(n) edges, then so does the original. Say we now have k triangles.
  2. Remember from geometry that in a triangle, the sum of the angles is 180 degrees. So, if we add up all of the interior angles we get
    sum of interior angles = 180k degrees
  3. Now, let's look at the vertices. Each vertex contributes AT MOST 360 degrees to the sum of the interior angles. (Interior vertices contribute 360 degrees, and exterior vertices contribute less.) So,
    k = (sum of interior angles)/180 ≤ 360n/180 = 2n
  4. Each triangle has 3 edges, so
    m ≤ 3k ≤ 6n = O(n).