Newsgroups: comp.compilers,comp.theory,comp.lang.misc,comp.lang.dylan
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!spdcc!iecc!compilers-sender
From: haahr@netcom.com (Paul Haahr)
Subject: Re: Looking for high-quality hashing function for dynamic linker
Message-ID: <95-08-056@comp.compilers>
Keywords: linker, design
Sender: compilers-sender@chico.iecc.com
Organization: NETCOM On-line services
References: <95-08-021@comp.compilers>
Date: Thu, 3 Aug 1995 08:45:33 GMT
Approved: compilers@chico.iecc.com
Lines: 47
Xref: glinda.oz.cs.cmu.edu comp.compilers:9227 comp.theory:13273 comp.lang.misc:22534 comp.lang.dylan:4997

Eric Benson <eb@kaleida.com> wrote:
> Is it feasible to compute a signature consisting of a few (say, less than
> ten) 32 bit words and use this for comparison instead of the entire set of
> names?

Yes.  Or, rather, if you trust your network and its checksums, it's
probably reasonable to trust checksums for this purpose.

> A false positive comparison would probably result in a severe
> malfunction so it should be extremely unlikely, say one-in-a-billion or
> less.  I can spend a big chunk of CPU time in the compilation environment
> to compute this signature if necessary.  I have a 29-bit CRC hash function
> already available.  Is this, or some multi-word variant of it, good enough
> for my purposes?

I would suspect that two 32-bit CRCs using different polynomials would
not be 2^32 times stronger than one 32-bit checksum, so multiple CRCs
may not buy you as much as you would hope.

> [A good 32 bit hash function should give you a 1 in 4 billion chance of a
> false match. -John]

Yes.  But, to be utterly paranoid, you might want to pick one of the
cryptographic checksums, such as MD5 or snefru, rather than CRC.  For
some definition of better, they may be better.

> If so, does anyone have a larger-than-29-bit CRC
> implementation?  Does anyone have any references to relevant research?

My favorite reference on the topic is Gerard J. Holzmann, _Design and
Validation of Computer Protocols_, Prentice-Hall 1991.  He gives a
general, fast implementation of a table-driven CRC by Don Mitchell.

A while ago, I wrapped up that CRC as a Unix program, suitable for using
as a replacement to sum.  It's at:

  ftp://ftp.webcom.com/pub/haahr/src/crc.c

It should support polynomials up to the word length of your machine.
The code for generating the table is separate from the CRC code, so you
just have to pick a good polynomial.  See Gerard's book for details on
how to do that.

Paul
-- 
Send compilers articles to compilers@iecc.com,
meta-mail to compilers-request@iecc.com.
