Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!news2.near.net!MathWorks.Com!europa.eng.gtefsd.com!howland.reston.ans.net!gatech!rutgers!ucla-cs!oahu.cs.ucla.edu!jperry
From: jperry@oahu.cs.ucla.edu (John Perry)
Subject: Re: Books on Intro. natural language proces
Message-ID: <1994Sep22.202518.1804@cs.ucla.edu>
Sender: usenet@cs.ucla.edu (Mr Usenet)
Nntp-Posting-Host: oahu.cs.ucla.edu
Organization: UCLA, Computer Science Department
Date: Thu, 22 Sep 94 20:25:18 GMT
Lines: 51

ted@crl.nmsu.edu (Ted Dunning) writes:
>
>it should be remembered that the most successful natural language
>processing programs around today are information retrieval systems.

OK, I'll agree with that.

>
>these generally do *no* parsing.

BZZZZZZT.  Wrong answer.  Thank you for playing.  Very few of these
systems are based exclusively on the distribution of occurrence of
individual words.  Any analysis of the interaction of words, whether
it be mere n-gram co-occurrence analysis or some form of semantic
relation analysis, is a form of parsing.  The absence of syntax trees
does not necessarily mean the absence of parsing.
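
To make the n-gram point concrete, here is a minimal sketch of bigram
co-occurrence counting.  The function name and the toy sentence are
made up for illustration, and whitespace splitting stands in for a real
tokenizer.

```python
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs (bigrams) in lowercased, whitespace-split text."""
    words = text.lower().split()
    # zip pairs each word with its successor, so word order is preserved.
    return Counter(zip(words, words[1:]))

counts = bigram_counts("the dog bit the man and the man bit the dog")
# ("dog", "bit") is counted, while ("bit", "dog") never occurs adjacently.
# A pure bag-of-words model cannot record that difference.
print(counts[("dog", "bit")], counts[("bit", "dog")])
```

Even this much looks at how words interact rather than at each word in
isolation, which is the sense of "parsing" I mean above.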

>
>they often do some fairly sophisticated statistical weighing of
>evidence, however.

Quite true.  The main reason I see for this is computational complexity.
You can very quickly and efficiently store the information needed to
use statistical techniques, while deep understanding techniques (a la
Dyer, Schank) are generally intractable for large corpora.  The better
systems find ways of sneaking in some portion of the deep understanding
without giving up too much in speed or size.  (For example, consider
Wendy Lehnert's CIRCUS system.)
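
As a rough illustration of how little needs to be stored for the
statistical approach, here is a sketch of an inverted index with tf-idf
scoring.  tf-idf is my choice of a common weighting scheme, not
something named in this thread, and the function names and toy
documents are assumptions for the example.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index

def score(query, index, n_docs):
    """Sum tf * idf over the query terms for each matching document."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        # Terms that appear in fewer documents get a larger weight.
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return dict(scores)

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock prices fell sharply",
]
index = build_index(docs)
ranked = score("cat mat", index, len(docs))
print(sorted(ranked.items(), key=lambda kv: kv[1], reverse=True))
```

The whole structure is a table of counts, which is why it scales to
large collections where deeper analysis does not.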

>
>template matching, statistical models and other non-grammatical
>techniques are pretty darned important, i would have thought.

These techniques can be quite effective, but the fact is that they
are grammatical in nature.  Consider where the statistics of transitional
probabilities between grammatical categories come from, or where the
templates being matched come from, and you'll find them rooted in
linguistics.
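
Here is a small sketch of what I mean by transitional probabilities
between grammatical categories.  The tag names and the two hand-tagged
sentences are invented for illustration, and plain relative frequency
is assumed as the estimator.

```python
from collections import Counter, defaultdict

def transition_probs(tagged_sentences):
    """Estimate P(next_tag | tag) by relative frequency over tag bigrams."""
    pair_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        tags = [tag for _word, tag in sentence]
        for prev, nxt in zip(tags, tags[1:]):
            pair_counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for prev, following in pair_counts.items()
    }

# Toy corpus; the category labels are exactly the linguistic input.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("big", "ADJ"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
probs = transition_probs(corpus)
print(probs["DET"])
```

The arithmetic is trivial, but it cannot start until somebody has
decided what the categories are and tagged the text with them.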

I think Professor Covington was correctly pointing out that without
some surface-level parsing into some form of representation, the text
is all just a lot of letters, so in the long run you need these things
to do any form of NLP.

					John

-- 

<a href="ftp://ftp.netcom.com/pub/jrp/home.html">My Home Page</a>
<a href="ftp://ftp.netcom.com/pub/jrp/bizarro.html">Page-O Bizarro</a>
<a href="ftp://ftp.netcom.com/pub/jrp/libmat.html">Liberal Materialists Page</a>
