Newsgroups: comp.ai.nat-lang,comp.databases.theory,comp.databases
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news3.near.net!noc.near.net!paperboy.wellfleet.com!news-feed-1.peachnet.edu!gatech!swrinde!emory!slammer!sd!dvick
From: dvick@lanier.com (Don Vick)
Subject: Re: Fuzzy search of nat-lang sentences.
References: <JPJ.95Mar23112518@gamma.hut.fi>
Message-ID: <D5y7v5.7C0@lanier.com>
Sender: Don Vick <dvick@lanier.com>
Organization: Lanier Worldwide, Tucker, GA
Date: Fri, 24 Mar 1995 14:21:03 GMT
Lines: 32
Xref: glinda.oz.cs.cmu.edu comp.ai.nat-lang:3098 comp.databases.theory:3956 comp.databases:44278

In article <JPJ.95Mar23112518@gamma.hut.fi>,
Jukka-Pekka Juntunen <jpj@snakemail.hut.fi> wrote:
>
>What kind of indexing methods are available for fuzzy search of nat-lang text
>segments ? 
>
>Suppose I have a huge database of nat-lang sentences. Now, I want to find 
>_all_ the sentences that are close to the one searched: few words may be missing,
>word order may differ, some spelling errors etc.
>

If you are looking for sentences that are SEMANTICALLY close, there are 
techniques in AI that can help.  I'm not up on current research, but there 
was work in the 1980s on scanning newspaper articles and extracting 
relevant information.  That would have required recognizing that two very 
different word sequences can have the same meaning.  
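
If you only need SURFACE closeness (missing words, different word order, 
spelling errors), one common approach is an inverted index over character 
n-grams.  Here is a minimal sketch in Python, not tied to any particular 
database engine.  The names (trigrams, TrigramIndex), the padding scheme 
and the 0.5 threshold are my own illustrative choices, not taken from any 
existing system.

```python
from collections import defaultdict


def trigrams(sentence):
    """Return the set of character trigrams of each word in the sentence.

    Words are lowercased and padded with spaces so that short words and
    word boundaries still produce trigrams. Using a set per sentence
    makes the result independent of word order.
    """
    grams = set()
    for word in sentence.lower().split():
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


class TrigramIndex:
    """Hypothetical in-memory inverted index: trigram -> sentence ids."""

    def __init__(self):
        self.sentences = []               # sentence id -> original text
        self.grams = []                   # sentence id -> its trigram set
        self.postings = defaultdict(set)  # trigram -> set of sentence ids

    def add(self, sentence):
        sid = len(self.sentences)
        g = trigrams(sentence)
        self.sentences.append(sentence)
        self.grams.append(g)
        for gram in g:
            self.postings[gram].add(sid)

    def search(self, query, threshold=0.5):
        """Return (score, sentence) pairs with Dice score >= threshold."""
        q = trigrams(query)
        # Count shared trigrams only for sentences that share at least one.
        shared = defaultdict(int)
        for gram in q:
            for sid in self.postings.get(gram, ()):
                shared[sid] += 1
        hits = []
        for sid, n in shared.items():
            # Dice coefficient: 2 * |A & B| / (|A| + |B|)
            score = 2.0 * n / (len(q) + len(self.grams[sid]))
            if score >= threshold:
                hits.append((score, self.sentences[sid]))
        return sorted(hits, reverse=True)


index = TrigramIndex()
index.add("the quick brown fox jumps over the lazy dog")
index.add("relational databases store tuples in tables")

# Reordered words plus one misspelling ("quik") should still match.
for score, text in index.search("lazy dog jumps over the quik brown fox"):
    print(round(score, 2), text)
```

The postings table is just a two-column relation (trigram, sentence id), 
so the same idea could presumably be stored in an ordinary relational 
table and queried with a join and a GROUP BY, though I have not tried 
that at scale.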

>
>Are there any references available ? Any database engines to do the trick ?
>Is this possible in the relation database model ?
>

I have just started looking at the Coral deductive database system from the 
University of Wisconsin (USA), available from ftp.cs.wisc.edu .  It provides 
a language based on first-order predicate logic (similar to Prolog) along 
with relational database features.  It might be a suitable tool if you are 
interested in building something.  

Don
--------------------------------------------------------
Donald E. Vick  (dvick@lanier.com, dvick@crl.com)
Voice: (404) 493-2194    Fax: (404) 493-2399
