- experiments.txt
- mean_bucket.rb
- largest_bucket.rb
- my_hash_function.rb (challenge)

Place these files in a `lab7` folder. Before leaving lab,
zip up the `lab7` folder
and hand the zip file in.

This lab uses facilities provided by RubyLabs.
If you were not present for Lab 4 (with the `time` operation)
or Lab 6 (with the `Canvas`), you will need to setup RubyLabs before beginning this lab.

- Review ASCII encoding and string indexing in Ruby 1.8.
- Hash Functions

The correct operation of a hash table relies on a hash function that maps keys (in our case, these are strings) to non-negative integers that indicate which bucket the key belongs to. The hash table will work as long as the hash function maps each key to a non-negative integer less than the size of the hash table, but it will only perform well if it distributes keys relatively uniformly across buckets.

Consider the following hash functions:

Perhaps, the simplest possible hash function is one that just ignores the

`key`and always returns 0:def h0(key,ht_size) return 0 end

Another hash function obtains the ASCII code of the first character in the key, divides that code by the size of the hash table, and returns the remainder of that division. This can be coded in Ruby as:

def h1(key,ht_size) return key[0] % ht_size end

Another hash function totals up the ASCII codes of all of the characters in the key, divides that total by the size of the hash table, and returns the remainder of that division:

def h2(key,ht_size) x = 0 for i in 0..key.length-1 do ascii_code = key[i] x = x + ascii_code end return x % ht_size end

This scheme examines all the letters in the string, but is not sensitive to order. So the strings "act", "cat", and "tac" will all hash to the same value.It is also possible to build a hash function that is position sensitive, based on the idea of treating each character of the string as a digit in a number. The most familiar way of representing a number is to use a radix-10 system with the Arabic numerals: "0", "1", "2", "3", "4", "5" "6", "7", "8", "9". So, for example, "52473" means \[((((((5*10+2)*10)+4)*10)+7)*10)+3.\] One can, however, also represent numbers using other systems with different sets of symbols. For example, one could represent numbers using the letters "a" - "z" for the numbers 0 through 25 to obtain the Radix-26 system described in your textbook. Characters in the computer are often stored using 7-bit ASCII codes (with values 0-127). So we can treat a string of ASCII characters as a Radix-128 number.

To make this into a Ruby hash function, one need only convert the string into a Ruby integer and then take the remainder of division by the size of the hash table:

def h3(key,ht_size) x = 0 for i in 0..key.length-1 do ascii_code = key[i] x = 128 * x + ascii_code end return x % ht_size end

Hash Functions

For each of the hash functions,

`h0`,`h1`,`h2`, and`h3`. What is the result of hashing the following strings for a table of size 100,000.- "Hash table"
- "Table hash"
- "Towers of Hanoi"

Would you get any collisions (when a key is placed in a bucket that already has a key) if you were to insert these keys into hash tables using each of these hash functions?

Record the results in

`experiments.txt`.Hash Table Statistics

Our hash tables are implemented as Ruby arrays where each element is a bucket. That bucket is an array that holds the entries that hashed to that bucket's index. In order to evaluate the effectiveness of different hash functions it is useful to be able to gather some statistics about the hash table after we've inserted a bunch of entries.

Define a Ruby function

`largest_bucket(ht)`in`largest_bucket.rb`that returns the maximum number of entries contained within any single bucket of the hash table`ht`. You can follow this algorithm:- Set
*max_so_far*to 0. - For each element,
*bucket*, in the hash table, do the following:- If the length of
*bucket*is greater than*max_so_far*, then:- Set
*max_so_far*to the length of*bucket*

- Set

- If the length of
- Return
*max_so_far*.

Example:

>> largest_bucket([["a","b"],[],["w","x","y","z"],["c"]]) => 4

- Set
Define a Ruby function

`mean_bucket(ht)`in`mean_bucket.rb`that returns the arithmetic mean of the number of entries in non-empty buckets of the hash table`ht`. You can follow this algorithm:- Set
*nonempty_buckets*to 0. - Set
*entries*to 0. - For each element,
*bucket*, in the hash table, do the following:- If the length of
*bucket*is greater than 0.- Add one to
*nonempty_buckets*. - Add the length of
*bucket*to*entries*.

- Add one to

- If the length of
- Return \(\frac{\textit{entries}}{\textit{nonempty_buckets}}\).

Example:

>> mean_bucket([["a","b"],[],["w","x","y","z"],["c"]]) => 2.33333333333333

- Set

Hash Table Experiments

Download and save the code from hash_table.rb into your

`lab7`folder. Look at the code.This code implements hash table creation (

`new_hash_table`), insertion (`ht_insert!`), search (`ht_search?`), and the four hash functions (`h0`,`h1`,`h2`,`h3`). It also includes a "main" function called`run(hash_fun)`that sets up a hash table, inserts a couple hundred thousand words, and searches for a couple hundred times. It also measure the elapsed time for these operations, and reports some statistics about the buckets (using your functions`largest_bucket`and`mean_bucket`functions).Load your functions and

`hash_table.rb`into irb, and call`run("h0")`,`run("h1")`,`run("h2")`, and`run("h3")`, which will perform the timing and statistical measurements for hash tables with the respective hash function. In interpreting the results, take note of the fact that run() inserts hundreds of thousands of words but only searches for 100.Record the output of these runs in

`experiments.txt`. How well do the different hash functions perform? What correlation do you see between how evenly the hash function distributes keys across buckets and the performance of the hash functions?Challenge: Devise a Better Hash Function

Design your own hash function. Modify the

`run`function in order to experiment on your hash function. How does your function compare with the four described above? Try to beat`h0, h1, h2,`and`h3`.Save your hash function in

`my_hash_function.rb`. Include the results of your experimental evaluation of your hash function in`experiments.txt`