1. Consider hashing 10,000 five-letter words into a table of size 10,000 using the hash function H(a0a1a2a3a4) = a0 + a1 + a2 + a3 + a4. That is, we compute the hash code by adding the letter values (a = 0, b = 1, c = 2, etc.). If we use H, what will the key distribution look like?
2. What is the complexity of finding order information, such as the max, min, or range, in a hash table?
3. A hash table needs to be resized if its load factor exceeds 0.7. What are the important things to do when resizing a hash table?
4. What is an appropriate probing table size if the number of items in the hash table is 10? Assume a 0.7 load factor.
5. Explain how deletion is performed in both probing and separate chaining hash tables.
6. Given the input {4371, 1323, 6173, 4199, 4344, 9679, 1989}, a fixed table size of 10, and the hash function H(X) = X mod 10, show the resulting
a. Linear probing hash table
b. Quadratic probing hash table
c. Separate chaining hash table
7. What are some of the criteria that need to be considered when choosing a "good" hash function? Explain why it makes sense to use a power of 2 as the polynomial base b when using the hash function H(S) = b^(n-1)·S0 + b^(n-2)·S1 + … + S(n-1).
Answers:
[1] Assume we are dealing with alphabetic characters, so any letter in the string has a value from 0 to 25. Hence the sum of the 5 letters ranges from 0 to 125. Since the table has 10,000 slots, this hash function maps every key into the first 126 slots, only about 1.3% of the table. Therefore this is not a good hash function.
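This can be checked with a short sketch (the function name is my own):

```python
# Sketch: measure how much of a 10,000-slot table the letter-sum hash
# can ever reach for 5-letter words.
def letter_sum_hash(word):
    # a=0, b=1, ..., z=25; the hash code is just the sum of letter values
    return sum(ord(c) - ord('a') for c in word.lower())

# The largest possible code is 'zzzzz' -> 5 * 25 = 125, so only slots
# 0..125 of the 10,000-slot table are ever used.
max_code = letter_sum_hash('zzzzz')
fraction_used = (max_code + 1) / 10000
print(max_code, fraction_used)   # 125 0.0126
```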
[2] A hash table stores no order information. Therefore, to find the max or min, one needs to look at every element in the table. The complexity is O(n).
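A tiny illustration, using a Python set to stand in for the key set of a hash table:

```python
# A hash-based container keeps no order, so min/max must scan all n
# elements -- an O(n) operation.
keys = {4371, 1323, 6173, 4199}
smallest = min(keys)   # walks every element
largest = max(keys)    # walks every element
print(smallest, largest)   # 1323 6173
```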
[3] Assume we have a hash table of size n. When more than 70% of the slots are occupied, we double the size of the table to 2n. An important part of the process is to rehash every element for the new table size 2n (since hash values are now taken mod 2n), so elements may map to different locations than in the original table. The cost of rehashing is O(n).
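The resize-and-rehash step can be sketched as follows (a hypothetical helper for integer keys with linear probing; the names are my own, not from the answer):

```python
# Rehash every occupied slot into a table of double the size: each key
# must be re-inserted using the NEW modulus, so positions can change.
def rehash(table, old_size):
    new_size = 2 * old_size
    new_table = [None] * new_size
    for item in table:
        if item is None:
            continue
        # recompute the slot with the new modulus; linear-probe on collision
        slot = item % new_size
        while new_table[slot] is not None:
            slot = (slot + 1) % new_size
        new_table[slot] = item
    return new_table
```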
[4] A hash table that is at most 70% full typically gives good performance when resolving collisions with linear or quadratic probing. Therefore a good table size can be calculated as 10 / 0.7 ≈ 15.
[5] In a hash table built with probing, the best way to perform deletions is "lazy deletion": we mark entries as deleted but do not remove them from the table. This allows us to keep searching for elements without rehashing the whole table, since probe sequences are not broken by empty holes. One drawback of this approach is that if there are many deletions, much of the table is occupied by dead entries, so we may need to resize the array frequently. However, in most practical applications deletions are rare (at least compared to searches and updates), and the hash table remains a very good data structure. In a separate chaining hash table, deletion is straightforward: hash the key to its bucket and remove the element from that bucket's linked list.
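A minimal lazy-deletion sketch for a linear-probing table (my own illustration, not code from the answer):

```python
# DELETED markers keep probe chains intact so that a later search can
# still find elements that were inserted past the deleted slot.
EMPTY, DELETED = object(), object()

class ProbingTable:
    def __init__(self, size=10):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Follow the probe chain; only a truly EMPTY slot stops the search.
        i = key % len(self.slots)
        while self.slots[i] is not EMPTY:
            if self.slots[i] == key:          # DELETED never equals a key
                return i
            i = (i + 1) % len(self.slots)
        return None

    def insert(self, key):
        i = key % len(self.slots)
        while self.slots[i] not in (EMPTY, DELETED):
            i = (i + 1) % len(self.slots)
        self.slots[i] = key                   # may reuse a DELETED slot

    def delete(self, key):
        i = self._probe(key)
        if i is not None:
            self.slots[i] = DELETED           # mark, don't empty the slot

    def contains(self, key):
        return self._probe(key) is not None
```

Deleting 1 below leaves a DELETED marker in slot 1, so the search for 11 (which also hashed to slot 1 and was pushed to slot 2) still succeeds.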
[6] Linear Probing
The home slots under H(X) = X mod 10 are 4371→1, 1323→3, 6173→3, 4199→9, 4344→4, 9679→9, 1989→9. Inserting in the given order and stepping forward one slot on each collision (wrapping around):
0: 9679
1: 4371
2: 1989
3: 1323
4: 6173
5: 4344
6: (empty)
7: (empty)
8: (empty)
9: 4199
Quadratic Probing
Collisions are resolved with offsets +1², +2², +3², … from the home slot (mod 10):
0: 9679
1: 4371
2: (empty)
3: 1323
4: 6173
5: 4344
6: (empty)
7: (empty)
8: 1989
9: 4199
Separate Chaining
Each colliding key is appended to the end of its bucket's list:
0: (empty)
1: 4371
2: (empty)
3: 1323 → 6173
4: 4344
5: (empty)
6: (empty)
7: (empty)
8: (empty)
9: 4199 → 9679 → 1989
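A short sketch that rebuilds the three tables of [6] (the helper names are my own; insertion is assumed to follow the given input order, with colliding keys appended at the tail of each chain):

```python
# Build the linear-probing, quadratic-probing, and separate-chaining
# tables for the question's input with H(X) = X mod 10.
data = [4371, 1323, 6173, 4199, 4344, 9679, 1989]
SIZE = 10

def linear(items):
    t = [None] * SIZE
    for x in items:
        i = x % SIZE
        while t[i] is not None:        # step forward by 1, wrapping
            i = (i + 1) % SIZE
        t[i] = x
    return t

def quadratic(items):
    t = [None] * SIZE
    for x in items:
        h, k = x % SIZE, 0
        while t[(h + k * k) % SIZE] is not None:   # offsets 1, 4, 9, ...
            k += 1
        t[(h + k * k) % SIZE] = x
    return t

def chaining(items):
    t = [[] for _ in range(SIZE)]
    for x in items:
        t[x % SIZE].append(x)          # append at the tail of the chain
    return t
```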
[7] The important criteria in choosing a good hash function are that it is easy to compute (low cost) and that it avoids collisions by spreading keys evenly. It is also important to consider the application when deciding what a good function is. For example, in an internet security application, we may choose a function that depends on some encrypted form of the password and user ID.
The answer to the second part is that when the base b is a power of 2, say b = 2^k, each multiplication by b can be computed by shifting the bits left by k positions (a left shift by 1 is equivalent to multiplying by 2), which is much cheaper than a general multiplication, subject to the value not overflowing.
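A hedged sketch of this trick (the base 32 = 2^5 and the modulus are my own choices, and the modular reduction is one common way to keep the value from overflowing):

```python
# Polynomial hash with base b = 2**5: each multiply-by-b is a 5-bit left
# shift; reducing mod the table size keeps intermediate values small.
def poly_hash(s, table_size=10007, shift=5):
    h = 0
    for ch in s:
        # h = h * 2**shift + ord(ch), kept bounded with a modulus
        h = ((h << shift) + ord(ch)) % table_size
    return h
```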