Newsgroups: comp.lang.c++,comp.lang.c,comp.ai,comp.ai.fuzzy
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!news.alpha.net!uwm.edu!math.ohio-state.edu!howland.reston.ans.net!news.starnet.net!wupost!news.utdallas.edu!corpgate!bcarh189.bnr.ca!nott!cunews!Carleton.CA!Brian_Sullivan
From: Brian_Sullivan@Carleton.CA (Brian Sullivan)
Subject: Re: name matching algorithms
Message-ID: <Brian_Sullivan.97.2F797275@Carleton.CA>
Sender: news@cunews.carleton.ca (News Administrator)
Organization: Carleton.CA
X-Newsreader: Trumpet for Windows [Version 1.0 Rev A]
References:  <3la79m$d39@interport.net>
Date: Wed, 29 Mar 1995 14:45:09 GMT
Lines: 51
Xref: glinda.oz.cs.cmu.edu comp.lang.c++:120320 comp.lang.c:133389 comp.ai:28607 comp.ai.fuzzy:4336

In article <3la79m$d39@interport.net> raiff@interport.net (Jonathan Raiff) writes:
>From: raiff@interport.net (Jonathan Raiff)
>Subject: name matching algorithms
>Date: 28 Mar 1995 18:49:10 -0500

>Hi.  I need to implement a set of routines that will allow me to compare two
>names (or more generically, alphanumeric strings) for "equality".  These names
>are based on hand entered data, so there will be an assortment of differences 
>between two names that are really the same.

>For instance, "Jonathan Raiff" and "Jonathan Riaff" and "Jonathan Raif" when
>compared using strcmp() would not find these strings identical.  I am looking
>for some routines that will allow me to selectively set the tolerance for
>"sameness" and then test for it.

>If anyone has done some work in this area, or better yet has code they are 
>willing to share, please e-mail me.  Thanks...


>Jonathan Raiff

I used to do this all the time when I worked for a junk mailer.

If it's names you are matching, use the Soundex code: map the names to 
their Soundex values and compare those.
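
For reference, a minimal sketch of the common American Soundex variant in C
(first letter kept, three digits, zero padded). The function names soundex()
and soundex_digit() are illustrative, and the code assumes an ASCII-style
character set where 'A'..'Z' are contiguous. Other Soundex variants differ in
how they treat H and W.

```c
#include <ctype.h>
#include <stddef.h>

/* Soundex digit for an uppercase letter; '0' means "not coded"
   (vowels, H, W, Y). Assumes 'A'..'Z' are contiguous, as in ASCII. */
static char soundex_digit(char c)
{
    /*                         ABCDEFGHIJKLMNOPQRSTUVWXYZ */
    static const char map[] = "01230120022455012623010202";
    return (c >= 'A' && c <= 'Z') ? map[c - 'A'] : '0';
}

/* Write the 4-character Soundex code of name into out (5 bytes).
   Characters other than A-Z/a-z are skipped. If name contains no
   letters, out becomes the empty string. */
void soundex(const char *name, char out[5])
{
    size_t n = 0;      /* number of characters written so far */
    char prev = '0';   /* digit of the previous significant letter */

    for (; *name != '\0'; ++name) {
        char c = (char)toupper((unsigned char)*name);
        char d;

        if (c < 'A' || c > 'Z')
            continue;
        d = soundex_digit(c);

        if (n == 0) {
            out[n++] = c;              /* the first letter is kept as-is */
            prev = d;
        } else {
            /* Emit a digit only if it differs from the previous one. */
            if (d != '0' && d != prev && n < 4)
                out[n++] = d;
            /* H and W do not separate equal digits; vowels do. */
            if (c != 'H' && c != 'W')
                prev = d;
        }
    }

    if (n == 0) {
        out[0] = '\0';
        return;
    }
    while (n < 4)
        out[n++] = '0';                /* pad short codes with zeros */
    out[4] = '\0';
}
```

With this sketch "Raiff", "Riaff" and "Raif" all map to the same code, so a
plain strcmp() on the codes treats them as equal.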

A few extra things. Names can be entered as:
    
     first initial last
     last, first initial
     first last
     last first

    Break the name down into words.
    If three words and one is a 1-char word with a trailing '.',
    or three words and one is 1 char:
           it is an initial, throw it away.
    If three words and there is a trailing ',' on the first:
           throw away word three.
    If three words and two are 1 char:
           the second 1-char word is the initial, throw it away.
    If still three words:
           throw away the middle word.
    Make soundex(word1word2) and compare for a match.
    Make soundex(word2word1) and compare for a match.
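
The steps above could be sketched in C roughly as follows. This is one
reading of the rules, not a definitive implementation: name_keys(),
core_len(), drop_word() and MAX_WORD_LEN are made-up names, "one is 1 char"
is taken to mean "exactly one", and words longer than 31 characters are not
handled. The function only builds the two concatenations (word1word2 and
word2word1); each would then be run through a Soundex routine and compared.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORD_LEN 32   /* 31 characters plus the terminating NUL */

/* Length of a word, not counting one trailing '.' or ','. */
static size_t core_len(const char *w)
{
    size_t n = strlen(w);
    if (n > 0 && (w[n - 1] == '.' || w[n - 1] == ','))
        --n;
    return n;
}

/* Remove words[idx] from a list of three words, shifting the rest down. */
static void drop_word(char words[3][MAX_WORD_LEN], int idx)
{
    for (int i = idx; i < 2; ++i)
        strcpy(words[i], words[i + 1]);
}

/* Reduce a 2- or 3-word name to two words, then write word1word2 to key_a
   and word2word1 to key_b. key_size must be at least 2 * MAX_WORD_LEN.
   Returns 0 on success, -1 if the name does not have 2 or 3 words. */
int name_keys(const char *name, char *key_a, char *key_b, size_t key_size)
{
    char w[3][MAX_WORD_LEN];
    char extra[2];
    int initials = 0, last_initial = -1;

    if (key_size < 2 * MAX_WORD_LEN)
        return -1;
    /* The field widths (31) must stay in step with MAX_WORD_LEN. */
    int count = sscanf(name, "%31s %31s %31s %1s", w[0], w[1], w[2], extra);
    if (count < 2 || count > 3)
        return -1;

    if (count == 3) {
        for (int i = 0; i < 3; ++i)
            if (core_len(w[i]) == 1) { ++initials; last_initial = i; }

        if (initials == 1)
            drop_word(w, last_initial);       /* lone initial */
        else if (w[0][strlen(w[0]) - 1] == ',')
            drop_word(w, 2);                  /* "last, first extra" */
        else if (initials == 2)
            drop_word(w, last_initial);       /* second of two initials */
        else
            drop_word(w, 1);                  /* otherwise the middle word */
    }

    /* Strip trailing punctuation from the two remaining words. */
    w[0][core_len(w[0])] = '\0';
    w[1][core_len(w[1])] = '\0';
    snprintf(key_a, key_size, "%s%s", w[0], w[1]);
    snprintf(key_b, key_size, "%s%s", w[1], w[0]);
    return 0;
}
```

Two records are candidates for a match when the Soundex of either key of one
equals the Soundex of either key of the other, which covers "first last"
against "last, first".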

If you are matching on the entire address, use the postal/ZIP code plus the Soundex code.
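
One way to combine the two into a single comparison key is sketched below in
C. The function name address_key(), the ':' separator and the choice to
uppercase the postal code and drop spaces and hyphens are all assumptions
made for illustration; the Soundex code is taken as an already computed
string.

```c
#include <ctype.h>
#include <stddef.h>

/* Build "<POSTALCODE>:<soundex>" in out. The postal code is uppercased and
   anything that is not a letter or digit (spaces, hyphens) is dropped.
   Returns 0 on success, -1 if out is too small. */
int address_key(const char *postal, const char *sdx,
                char *out, size_t out_size)
{
    size_t n = 0;

    for (; *postal != '\0'; ++postal) {
        unsigned char c = (unsigned char)*postal;
        if (!isalnum(c))
            continue;
        if (n + 1 >= out_size)     /* always leave room for the NUL */
            return -1;
        out[n++] = (char)toupper(c);
    }

    if (n + 1 >= out_size)
        return -1;
    out[n++] = ':';                /* hypothetical separator */

    for (; *sdx != '\0'; ++sdx) {
        if (n + 1 >= out_size)
            return -1;
        out[n++] = *sdx;
    }
    out[n] = '\0';
    return 0;
}
```

Note that a 5-digit ZIP and the same ZIP with a +4 extension produce different
keys here, so the extension would have to be trimmed first if both forms
appear in the data.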

I have a sample of a simple Soundex routine in BASIC if you want it.
      
