[2-2] When should I use a hash table instead of an association list?

Both association lists (alists) and hash tables may be used to
represent tabular data. Hash tables have an O(1) running time and
alists an O(n) running time, so hash tables are ultimately more
efficient than alists. However, if the alists are small, they can be
more efficient than hash tables, which have a large initial overhead.

Alists can sometimes be more efficient if the keys are sorted
according to frequency, with the most heavily accessed keys appearing
at the front of the list. But one doesn't always know this kind of
information, and even then the frequency distribution may be flat.

In Allegro CL 4.1 [SPARC; R1], the rule of thumb is that for less than
24 elements, linear search using alists beats hashing.  In Lucid CL
4.0.1 HP 9000/700, the break-even point is at 10 elements. The
break-even points vary in other lisps from as low as 4 elements to as
high as 100 elements. So if you're using alists in your code, using 
hash tables instead may speed up your program. 

A potential problem may occur, however, when the keys of an EQ or EQL
hash table are Lisp objects such as conses or arrays (or other objects
that are identified by their addresses). In most implementations, such
tables must be re-hashed after garbage collection. If your application
causes frequent GCs, this can adversely affect the performance of hash
table lookup. Since EQL-hashing and =-hashing of fixnums generally
don't require rehashing after GC, one way of avoiding this problem is
to include a unique identifier in each key object and hash on that
instead. Another solution is to use an EQUAL hash table if the keys
are conses or an EQUALP hash table if the keys are arrays or other
(non-circular!) structures.
Go Back Up

Go To Previous

Go To Next