Today we'll review for the quiz tomorrow. The quiz will be thirty minutes long, and you'll be permitted an open book and a sheet of notes.
Other topics that are fair game for the quiz: mergesort, quicksort, and recurrences.
Radix sort is a good method for sorting strings. For n strings of length L each, it takes O(n*L) time.
Example (all students in class last year with first names beginning with "E"):
eb3z eg27 ek2u ek35 ec3h ep2u ey27 ec2q ef2g ef2q ek00 el28 es58
If we do most-significant-first (MSF) radix sort, then we put the ids into buckets based on the most significant character, then recursively sort each bucket that has more than one element, starting at the next-most-significant character. Finally, we concatenate the buckets. We start with one bucket (every name begins with "e"). After the second scatter (on the second character), we have:
eb3z
ec3h ec2q
ef2g ef2q
eg27
ek2u ek35 ek00
el28
ep2u
es58
ey27
And now we recursively sort the second, third, and fifth buckets.
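The MSF procedure above can be sketched in code. This is a rough Python sketch, not the official handout code; the function and variable names are mine. Note Python compares characters by ASCII value, so digits sort before letters, matching the convention used below.

```python
# Minimal MSF radix sort sketch for equal-length strings.
def msf_radix_sort(strings, pos=0):
    # A bucket of size <= 1 (or with no characters left) is already sorted.
    if len(strings) <= 1 or pos >= len(strings[0]):
        return strings
    buckets = {}
    for s in strings:                  # scatter on the character at `pos`
        buckets.setdefault(s[pos], []).append(s)
    result = []
    for ch in sorted(buckets):         # visit buckets in character order
        result.extend(msf_radix_sort(buckets[ch], pos + 1))  # recurse
    return result

ids = ["eb3z", "eg27", "ek2u", "ek35", "ec3h", "ep2u", "ey27",
       "ec2q", "ef2g", "ef2q", "ek00", "el28", "es58"]
print(msf_radix_sort(ids))
```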
For least-significant-first (LSF) radix sort, we sort stably (remember what stable sorting means?) on the rightmost character:
ek00 ek35 eg27 ey27 el28 es58 ef2g ec3h ec2q ef2q ek2u ep2u eb3z
(Let's say that digits are less than letters.) Then we move to the left. After step 2:
ek00 eg27 ey27 el28 ef2g ec2q ef2q ek2u ep2u ek35 ec3h eb3z es58
And after step 3:
eb3z ec2q ec3h ef2g ef2q eg27 ek00 ek2u ek35 el28 ep2u es58 ey27
(Trace ef2g and ef2q. Notice they get into the correct order in the 1st step, and then STAY in the correct order, since each pass is STABLE.)
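The LSF passes above can be sketched like this; again a rough Python sketch with my own names. Python's sorted() is stable, which is exactly what the correctness argument relies on (and ASCII order puts digits before letters, matching the convention above).

```python
# Minimal LSF radix sort sketch for equal-length strings.
def lsf_radix_sort(strings):
    for pos in range(len(strings[0]) - 1, -1, -1):  # rightmost character first
        # Each pass is a stable sort on one character position.
        strings = sorted(strings, key=lambda s: s[pos])
    return strings

ids = ["eb3z", "eg27", "ek2u", "ek35", "ec3h", "ep2u", "ey27",
       "ec2q", "ef2g", "ef2q", "ek00", "el28", "es58"]
print(lsf_radix_sort(ids))
```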
What are the advantages of LSF? There is no recursion, just one big ``for'' loop. Note: "radix sort" usually refers to LSF.
What are the advantages of MSF? Sometimes you get lucky and the buckets shrink to size 1 before you've read the entire string (just as strcmp sometimes gets lucky and doesn't have to read all the way down both strings to tell that they differ).
We can do an example if people want. Try this out: Start with the array
2 1 4 3 6 5 8 7 9
Now turn this into a heap, and do the first couple of steps of the second phase of heapsort. (You can try the graphical program in: /afs/andrew/scs/cs/15-211/handouts/heap_demo.dec)
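If you want to check your answer, here is a small Python sketch of the exercise (max-heap stored in an array, children of i at 2i+1 and 2i+2; the helper name is mine):

```python
# Sift a[i] down until the max-heap property holds in a[0:n].
def sift_down(a, i, n):
    while 2 * i + 1 < n:
        child = 2 * i + 1
        if child + 1 < n and a[child + 1] > a[child]:
            child += 1                      # pick the larger child
        if a[i] >= a[child]:
            break
        a[i], a[child] = a[child], a[i]
        i = child

a = [2, 1, 4, 3, 6, 5, 8, 7, 9]
n = len(a)
for i in range(n // 2 - 1, -1, -1):         # phase 1: build the heap
    sift_down(a, i, n)
print(a)                                    # -> [9, 7, 8, 3, 6, 5, 4, 1, 2]

for end in range(n - 1, n - 3, -1):         # first two steps of phase 2
    a[0], a[end] = a[end], a[0]             # move the max to the end
    sift_down(a, 0, end)                    # restore the heap in a[0:end]
    print(a)
```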
Quicksort and heapsort are both good ideas if, say, you wanted a really fast way to find the top 100 numbers in some huge array: with a heap, for instance, you can build the heap and then just do 100 delete-maxes.
Next topic: hashing with linear probing. To insert a key, we compute i = h(key). If A[i] is empty, store the key there. Otherwise set i = (i+1) % M and repeat.
To search: apply the hash function to get an index i, then examine A[i], A[i+1], etc., with wraparound, until you either find the key or reach an empty spot.
Here's an example when M is 7 and h is the following:
x A B C D E F
h(x) 4 4 5 0 6 6
Say we insert A through E in order. Then the hash table would read:
0 1 2 3 4 5 6
D E     A B C
Search for B: start at h(B) and go up until find it or empty spot.
Or to search for F, we look at 6,0,1,2. Since 2 is empty, we return
``not found.''
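The insert and search procedures, run on this exact example, can be sketched as follows; a rough Python sketch with the lecture's h values hard-coded (a real table would compute h from the key):

```python
M = 7
h = {"A": 4, "B": 4, "C": 5, "D": 0, "E": 6, "F": 6}
table = [None] * M

def insert(key):
    i = h[key]
    while table[i] is not None:   # linear probing: step by 1, wrap around
        i = (i + 1) % M           # (sketch: assumes the table isn't full)
    table[i] = key

def search(key):
    i = h[key]
    while table[i] is not None:
        if table[i] == key:
            return i
        i = (i + 1) % M           # keep scanning until an empty slot
    return None                   # hit an empty slot: not found

for k in "ABCDE":
    insert(k)
print(table)        # -> ['D', 'E', None, None, 'A', 'B', 'C']
print(search("F"))  # probes 6, 0, 1, 2; slot 2 is empty -> None
```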
Idea: have a 2nd hash function h2, where h2(key) tells us what the increment should be. Say h(key) = i and h2(key) = j. Look in i, i+j, i+2j, i+3j, ... (mod M).
The function h2 is usually really simple, like the value of the first letter of the string. But make sure it's never zero. (Why?) It should also be relatively prime to M. (Again, why?)
Say we have the example before, but h2 is as follows:
x A B C D E F
h2(x) 1 2 3 4 5 6
Now we insert A through E in order to get:
0 1 2 3 4 5 6
D   E   A C B
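Changing the sketch above to double hashing is one line: step by h2(key) instead of by 1. Again a rough Python sketch with the lecture's h and h2 values hard-coded:

```python
M = 7
h  = {"A": 4, "B": 4, "C": 5, "D": 0, "E": 6, "F": 6}
h2 = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}
table = [None] * M

def insert(key):
    i, j = h[key], h2[key]
    while table[i] is not None:   # probe i, i+j, i+2j, ... (mod M)
        i = (i + j) % M
    table[i] = key

for k in "ABCDE":
    insert(k)
print(table)  # -> ['D', None, 'E', None, 'A', 'C', 'B']
```

Note how 'E' probes 6, then (6+5) % 7 = 4, then (4+5) % 7 = 2, where it lands.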
The business of being relatively prime to M is this: if the increment is relatively prime to M, you can show that the probe sequence will eventually try every spot. E.g., in the case of 'E', we first tried 6, 4, 2. If 2 had been filled, we would have gone on to try 0, 5, 3, 1. If the increment is not relatively prime to M, then the insertion will skip at least half of the table's positions.
Ensuring relative primality in general takes some care, but if we intelligently select a prime number as the table size, it becomes easy: every increment from 1 to M-1 is automatically relatively prime to a prime M.
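A quick check of this point (my own demo, not from the lecture): count how many distinct slots the probe sequence i, i+j, i+2j, ... (mod M) can reach. It covers all M slots exactly when gcd(j, M) = 1, which holds for every j in 1..M-1 when M is prime.

```python
from math import gcd

def slots_probed(i, j, M):
    # Distinct slots reachable by the probe sequence i + k*j (mod M).
    return len({(i + k * j) % M for k in range(M)})

print(slots_probed(6, 5, 7))   # prime M = 7: all 7 slots (the 'E' case)
print(slots_probed(0, 2, 6))   # M = 6, j = 2: only 3 of the 6 slots
assert all(gcd(j, 7) == 1 for j in range(1, 7))   # prime M: every j works
```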