Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!rochester!cornell!chrisb
From: chrisb@cs.cornell.edu (Chris Buckley)
Subject: Re: Books on Intro. natural language proces
Message-ID: <1994Sep25.174751.28787@cs.cornell.edu>
Organization: Cornell Univ. CS Dept, Ithaca NY 14853
References: <35k3j5$43h@redwood.cs.scarolina.edu> <TED.94Sep23003320@kyklopon.crl.nmsu.edu> <ASHWIN.94Sep23102450@pravda.cc.gatech.edu> <TED.94Sep23121916@kyklopon.crl.nmsu.edu> <3608t2$eo9@news.cais.com>
Date: Sun, 25 Sep 1994 17:47:51 GMT
Lines: 59

crawford@cais.cais.com (Randolph Crawford) writes:

>In article <TED.94Sep23121916@kyklopon.crl.nmsu.edu>, Ted Dunning <ted@crl.nmsu.edu> wrote:
>>In article <ASHWIN.94Sep23102450@pravda.cc.gatech.edu>, ashwin@cc.gatech.edu (Ashwin Ram) writes:
>>
>>   This does not mean that fields such as information retrieval are
>>   unimportant or that methods such as statistical matching are
>>   useless.  All this means is that those particular methods, while
>>   resulting in useful technology, tell us very little about
>>   *intelligence* in general or *natural language understanding* in
>>   particular.
>>
>>that the best method currently known (after 35 years of work) to
>>automatically classify documents based on their meaning is *not* based
>>on syntax, knowledge representation or symbol pushing is a very
>>significant experimental result.
>
>I think this whole debate revolves around the mistaken equating of
>NLU with NLP.  Document classification is *not* NLU.

I agree with you 100%.  But whose mistake is it?  Ted has made no
claims that statistical IR is a good approach for solving the NLU
problem.  It obviously isn't.  He *has* claimed that it's the most
important NLP success.  I would agree.

It's Ted's respondents from the AI community who seem to be claiming
that it doesn't deserve attention in an NLP book because it can't
solve the NLU problem.  This is quite perplexing to me.  One would
expect a field to build on its successes, or at least try to
understand them.

I would think almost any practical advance in NLU would improve IR
performance.  IR is mostly an attempt to match the meaning of a query
against the meaning of a document.  Any advance in how meanings can be
represented should translate into improved retrieval.  As Ted said,
after 35 years traditional NLP/NLU has bought us no improvement in IR,
and that raises fundamental questions that I would have thought would
matter to anyone working in NLP or taking a serious look at it.
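
To make that concrete, here's the kind of matching I mean, as a toy
sketch in Python.  The flat term-frequency weights and the names are
stand-ins for illustration; they are not SMART's actual weighting
formulas.

    # Toy illustration: score documents against a query by comparing
    # unordered, weighted sets of terms (cosine similarity).  The
    # weighting here is plain term frequency; real systems use
    # tf*idf-style weights.
    import math
    from collections import Counter

    def weights(text):
        return Counter(text.lower().split())

    def cosine(q, d):
        dot = sum(q[t] * d.get(t, 0) for t in q)
        norm = math.sqrt(sum(w * w for w in q.values())) * \
               math.sqrt(sum(w * w for w in d.values()))
        return dot / norm if norm else 0.0

    query = weights("natural language retrieval")
    docs = ["statistical retrieval of natural language documents",
            "syntactic parsing of English sentences"]
    for doc in docs:
        print(round(cosine(query, weights(doc)), 3), doc)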

As of right now, I would claim the most useful representation of the
meaning of an NL text is an unordered, weighted set of words.  It's
very difficult to improve on.  The SMART IR system has been around for
over 30 years, and during most of that time people have been trying to
augment this set of weighted terms with syntactic and semantic
information.  None of the augmentations has helped significantly, and
anywhere they have helped (e.g., using POS-tagged noun phrases), a
pure statistical approach (e.g., treat any two adjacent non-stopwords
as a "phrase") works better.
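
For concreteness, here's a toy sketch of that statistical "phrase"
idea in Python.  The stopword list and the unit weights are stand-ins
for illustration; they are not what SMART actually uses.

    # Toy illustration of the "statistical phrase" idea: in addition
    # to single terms, treat any two adjacent non-stopwords as a
    # phrase and add it to the unordered, weighted set of terms.
    from collections import Counter

    STOPWORDS = {"the", "of", "a", "an", "and", "in", "on", "to", "is"}

    def terms_and_phrases(text):
        words = [w for w in text.lower().split() if w not in STOPWORDS]
        phrases = [" ".join(pair) for pair in zip(words, words[1:])]
        return Counter(words) + Counter(phrases)

    print(terms_and_phrases("retrieval of natural language documents"))
    # -> counts for 'retrieval', 'natural', 'language', 'documents',
    #    plus phrases like 'retrieval natural', 'natural language', ...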

I don't expect this situation to last.  In 5 years I expect parsing
and NLU techniques will improve IR performance noticeably.  But I said
that 5 years ago also ... and people were saying that in the '60s.
Perhaps it's time to consider a paradigm shift?

                                        ChrisB
-- 
Chris Buckley                   Dept of Computer Science
chrisb@cs.cornell.edu           Upson Hall, Cornell University
(609)  275-4691                 Ithaca, NY   14852
