Review Questions

[Answers are provided at the end]

1. Think of hashing 10000, 5-letter words into a table of size 10000 using the map H defined as follows. H(a0a1a2a3a4) = Σ ai (i=0,1….4) . That is, we compute the hash code by adding the letters (a=0, b=1,c=2.etc). If we use H, what would be the key distribution like?

2. What is the complexity of finding order information, such as max, min or range from a hash table?

3. A hash table needs to be resized if load factor of a table exceeds 0.7. What are the important things to do when resizing a hash table?

4. What is the appropriate probing table size if the number of items in
the hash table is 10? Assume 0.7 load factor.

5. Explain how deletion is performed in both probing and separate
chaining hash tables.

6. Given the input {4371, 1323, 6173, 4199, 4344, 9679, 1989}, a fixed
table size of 10, and a hash function H(X) = X mod 10, show the
resulting

a. Linear probing hash table

b. Quadratic probing hash table

c. Separate chaining hash table

7. What are some of the criteria’s that need to be considered when choosing a “good” hash function? Explain why it makes sense to use a power of 2 as the polynomial base b, when using the hash function H(S) = b^n-1 S₀ + b^n-2 S₁ + ….. + S_n-1

Answers:

[1] Assume we are dealing with alpha characters, any letter in the string varies in value from 0 to 25. Hence the sum of 5 letters in the string varies from 0 to 125. Considering we have 10,000 spaces in the table, this particular hash function, maps its keys to only about 1/8^th of the table. Therefore this is not a good hash function.

[2] A hash table stores no order information. Therefore in order to find, max or min, one needs to look at every element in the table. The complexity is O(n)

[3] Assume that we have a hash table of size n. When the number of occupied places in the table becomes more than 70%, we double the size of the table to 2n. But an important part of the process is to rehash each of the elements with the new table size 2n (since we need to take % 2n). Therefore elements in the new table may map to different locations than the original table. The cost of rehashing is O(n)

[4] Typically a 70% full hash table provides good performance in solving collisions when doing linear or quadratic probing. Therefore a good hash table size may be calculated by

10/0.7 = 15 (approximately)

[5] The best way to perform deletions in a hash table that is built with probing is to do “lazy deletion”. That is we mark entries as deleted, but do not remove them from the table. This allows us to still search for elements without rehashing the whole table. One drawback of this approach though is that, if there are many deletes, we leave most of the table unused, hence may need to double the size of the array frequently. However, in most practical applications deletions are rare (at least not as much as search and updates) and hence hash table is still a very good data structure.

[6] Linear Probing

Quadratic Probing

Separate Chaining

[7] The criteria’s important in choosing a good hash function are, easy to compute (low cost), and avoid collisions. It is also important to see the application and decide what a good function can be. For example, in internet security application, we may choose a function that depends on some encrypted form of the password and user ID.

The answer to second part is, when using powers of 2 as a base, it becomes easier to compute the product since we only need to shift the bits to the left by 1 (this is equivalent to multiplying by 2) subject to value not over flowing.