8 Apr 1996

Outline

  Access frequency
  Alpha-beta search
  Graphs

Access frequency

In talking about trees we've had the implicit assumption that things tend to be accessed with equal frequency. When this is the case, we want the search tree to be as shallow as possible, so balanced trees look very good.

Frequently, though, this isn't the case. Consider, for instance, the words in English text. It happens that some words occur much more frequently than others. If I have a tree of words, I may not hope for balance, but just hope that words like ``the'' or ``a'' fall close to the root, while a word like ``sesquicentennial'' can be as far down the tree as it likes.

As it turns out, this happens more frequently than you might think. There are several well-known examples of this sort of skewed access frequency.

Because of these skewed access frequency patterns, splay trees look like a very good idea. They guarantee good amortized performance even for a uniform access distribution, and for these more skewed access distributions they tend to do better, since more frequently accessed items tend to end up closer to the root.
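
To make ``move frequently accessed items toward the root'' concrete, here's a sketch of the splay operation (in Python; the code and names are mine, not from the original notes). Every search rotates the accessed key up to the root with zig, zig-zig, and zig-zag rotation steps, so hot keys drift toward the top:

    class Node:
        """Plain BST node; no balance information is stored."""
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def rotate_right(x):          # make x's left child the new subtree root
        y = x.left
        x.left = y.right
        y.right = x
        return y

    def rotate_left(x):           # make x's right child the new subtree root
        y = x.right
        x.right = y.left
        y.left = x
        return y

    def splay(root, key):
        """Rotate the node with `key` (or the last node on its search
        path) up to the root, using zig, zig-zig, and zig-zag steps."""
        if root is None or root.key == key:
            return root
        if key < root.key:
            if root.left is None:
                return root
            if key < root.left.key:                      # zig-zig
                root.left.left = splay(root.left.left, key)
                root = rotate_right(root)
            elif key > root.left.key:                    # zig-zag
                root.left.right = splay(root.left.right, key)
                if root.left.right is not None:
                    root.left = rotate_left(root.left)
            return root if root.left is None else rotate_right(root)
        else:
            if root.right is None:
                return root
            if key > root.right.key:                     # zig-zig
                root.right.right = splay(root.right.right, key)
                root = rotate_left(root)
            elif key < root.right.key:                   # zig-zag
                root.right.left = splay(root.right.left, key)
                if root.right.left is not None:
                    root.right = rotate_right(root.right)
            return root if root.right is None else rotate_left(root)

    def search(root, key):
        """Every search splays, so a frequently accessed key such as
        ``the'' migrates toward the root over time."""
        root = splay(root, key)
        return root, (root is not None and root.key == key)

Insertion and deletion splay in the same way; the restructuring is driven entirely by the accesses, with no balance information stored in the nodes.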

You may think that reorganizing the tree after each search is excessive. In fact, I did my undergraduate research on this very topic. I tested all sorts of variations on the splay tree and found that their performance was only marginally better than the plain splay tree's.

Alpha-beta search

On the last homework we did a game-tree search. We had wins, losses, and ties.

You probably had a loop that tried each possible move and recursively checked to see what the result was. If you were clever, you had a small speedup: if you find a winning move, you don't need to check the rest; you can return right then.

In more complicated games like chess, you can't search all the way down. Instead, you have an evaluation function that returns a ``goodness'' number; you search to some depth and then return the ``best'' move. Say, for example, the function returns ``goodness to player 1'': then think of the first player as the ``maximizer'' and the second player as the ``minimizer''.

                   o            MAX
                 /   \ 
               o       o        MIN
              / \     / \
             o   o   o   o      MAX
            / \ / \ / \ / \
           3  5 6 2 0 4 9 20
In this example, the left branch for MAX is worth 5, and the right branch is worth 4, so it will take the left one for a result of 5. This is called minimax search. The number of levels of search is called the ply; the tree above, for instance, illustrates a three-ply search.
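
Here's a small sketch of the recursion (in Python; the nested-list encoding of the game tree is just an illustration, not from the notes):

    def minimax(node, maximizing):
        """A node is either a number (a leaf evaluation, i.e. ``goodness
        to player 1'') or a list of child nodes."""
        if not isinstance(node, list):                 # leaf: its evaluation
            return node
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # The three-ply tree drawn above:
    tree = [[[3, 5], [6, 2]], [[0, 4], [9, 20]]]
    print(minimax(tree, True))                         # prints 5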

A similar kind of trick can be used to speed this up, sort of like ``stop when you get to a win'' but a little trickier. It is called alpha-beta search.

Idea: say you have a tree and have figured out the following so far.

                 o        MAX    (currently we know this is worth at least 40)
               /   \
             40     o     MIN    (we know this node is worth at most 30)
                   / \
                 30   ??  MAX
Do we need to search the ``??'' subtree? No. The MIN node is already worth at most 30, which is less than the 40 MAX already has, so no matter what we find there it won't affect the outcome.

The same thing can happen in the other direction:

                  o     MIN    (currently we know this is worth at most 40)
                /   \
              40     o  MAX    (we know this node is worth at least 50)
                    / \
                   50  ??      <-- again, no need to search
Idea of alpha-beta pruning: keep track of the current maximum (called alpha, or α) and the current minimum (called beta, or β). For example, MIN passes its ``best for MIN so far'' (β) down to MAX, and if MAX ever finds that its current best is ≥ β, it knows it can return immediately.

This can be useful even in a win/lose/tie setting. For example,

                   o          X player
                 /   \
               TIE     o      O player
                      / \
                    TIE  ??   <--don't need to evaluate.
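
Putting the whole thing together, here's a sketch of alpha-beta search (in Python, on the same nested-list game trees as the minimax sketch above; again the code is mine, not from the notes). Here alpha is the best value MAX is already guaranteed, and beta is the best (lowest) value MIN is already guaranteed:

    def alphabeta(node, alpha, beta, maximizing):
        """Minimax with alpha-beta pruning.  A node is either a number
        (a leaf evaluation) or a list of child nodes."""
        if not isinstance(node, list):
            return node
        if maximizing:
            best = float('-inf')
            for child in node:
                best = max(best, alphabeta(child, alpha, beta, False))
                alpha = max(alpha, best)
                if best >= beta:     # the MIN above will never let us get here,
                    break            # so prune the remaining children
            return best
        else:
            best = float('inf')
            for child in node:
                best = min(best, alphabeta(child, alpha, beta, True))
                beta = min(beta, best)
                if best <= alpha:    # the MAX above will never let us get here
                    break
            return best

    tree = [[[3, 5], [6, 2]], [[0, 4], [9, 20]]]
    print(alphabeta(tree, float('-inf'), float('inf'), True))   # prints 5

On the example tree it still returns 5, but it never looks at the leaf 2 or at the [9, 20] subtree: those evaluations can't change the answer, which is exactly the ``??'' situation in the pictures above.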

Graphs

Basic terminology:

  V: the vertex set
  E: the edge set (a subset of V×V)
  graph: a tuple (V, E)
  undirected graph: a graph in which edges are unordered, so (u, v) is in E exactly when (v, u) is (in a directed graph, (u, v) may be in E while (v, u) is not)
  n: the number of vertices (|V|)
  m: the number of edges (|E|)
  degree of a vertex v: the number of edges incident to v (for directed graphs, we can talk about the in-degree and the out-degree)
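
For concreteness, here's how those quantities look in one common representation, an adjacency list (a Python sketch; the little four-vertex directed graph is made up for illustration):

    # A directed graph stored as an adjacency list:
    # each vertex maps to the list of its out-neighbors.
    G = {
        'a': ['b', 'c'],
        'b': ['c'],
        'c': ['a'],
        'd': [],
    }

    n = len(G)                                    # |V| = 4
    m = sum(len(nbrs) for nbrs in G.values())     # |E| = 4

    out_degree = {v: len(G[v]) for v in G}
    in_degree = {v: 0 for v in G}
    for v in G:
        for w in G[v]:
            in_degree[w] += 1

    print(n, m)                                   # 4 4
    print(out_degree['a'], in_degree['c'])        # 2 2

For an undirected graph you would list each edge in both endpoints' lists (and count each edge once, not twice).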

Think about the following:

A planar graph is one that you can draw on paper without any edges crossing each other. It turns out that m = O(n) for all planar graphs. This is not obvious, since some vertices might have a really high degree; what we're saying is that, on average, the degrees must be O(1). Here's why.

First, it turns out that any planar graph can be drawn on paper using straight lines (this is called a ``straight-line embedding''). Let's assume this.

Draw a picture of a straight-line planar graph. For example,

                 o---o------o
                 |\  |\      \ 
                 | \ | o---o--o
                 |  \|/      /
                 o---o------o
Now, let's try and figure out why there can't be very many edges. Here's one way to prove it:
  1. First, let's triangulate the graph. This means that we add edges until all the interior regions become triangles. This only adds edges, so if the triangulated graph has O(n) edges, then so does the original. Say we now have k triangles.
  2. Remember from geometry that in a triangle, the sum of the angles is 180 degrees. So, if we add up all of the interior angles we get
    sum of interior angles = 180k degrees
  3. Now, let's look at the vertices. Each vertex contributes AT MOST 360 degrees to the sum of the interior angles. (Interior vertices contribute 360 degrees, and exterior vertices contribute less.) So,
    k = (sum of interior angles)/180 ≤ 360n/180 = 2n
  4. Each triangle has 3 edges, so
    m ≤ 3k ≤ 6n = O(n).