Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!news2.near.net!MathWorks.Com!europa.eng.gtefsd.com!howland.reston.ans.net!gatech!ncar!uchinews!iitmax!sanders
From: sanders@iitmax.iit.edu (Greg Sanders)
Subject: Statistical vs. NLU approaches to IR
Message-ID: <1994Sep27.214521.30117@iitmax.iit.edu>
Organization: Illinois Institute of Technology / Academic Computing Center
References: <1994Sep25.174751.28787@cs.cornell.edu> <1994Sep25.230427.29778@iitmax.iit.edu> <1994Sep27.020633.16498@cs.cornell.edu>
Date: Tue, 27 Sep 94 21:45:21 GMT
Lines: 65

In article <1994Sep27.020633.16498@cs.cornell.edu> chrisb@cs.cornell.edu (Chris Buckley) writes:
>sanders@iitmax.iit.edu (Greg Sanders) writes:
>>The most useful for IR, but not for understanding the document (which is
>>what the phrase "representation of the meaning" denotes).
>
>It most certainly is "a representation of the meaning".  I entirely
>agree it is not a complete representation of the meaning; but I don't
>know anybody in NLU who considers a complete representation of the meaning
>to be feasible (since that requires a complete representation of the
>shared world knowledge of the writer/reader).

Well, my own reaction is that statistical analysis of the document simply
isn't understanding in any real sense.  We appear to disagree here, and
I think I do understand what you are saying.

>>  Let's recast the problem.  Suppose we don't just
>>want to retrieve the document.  Suppose the system must be able to 
>>answer questions about it and to generate a paragraph saying *why* 
>>you will think the document is of interest.    [ . . . ]
> [ . . . ]
>   1. The system can certainly correctly answer a lot of questions about
>the document.  Eg. about your response here : "Did the message
>discuss NLU?".  "Was your message about apples?"

I think at a minimum it should know that it was the apples that were
eaten :-) not to mention why.

Your list of questions is obviously valid--they are important questions,
and there are indeed a lot of questions you could answer using statistical
techniques.  There are far more that you cannot answer that way and,
IMHO, never will.

>   2. A possible paragraph: "Your article was about NLU and IR because
>'NLU' was the most highly weighted term in the article and 'IR' was
>the fifth highest weighted term".

This is *not* a statement of why the document is of interest.
It is simply a statement of why it was retrieved.  What makes your
paragraph reasonable is that the system knows nothing about me except
what I asked.  What I really have in mind is that the system should
be able to say, "The article supports the use of NLU techniques for
IR because they are more extensible.  It offers examples of tasks that
NLU techniques should perform better than statistical approaches do."
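
To make the distinction concrete, here is a minimal sketch of how a
paragraph like the one in point 2 can be produced from term weights
alone.  It assumes plain tf-idf weighting with a smoothed idf; the
function names and the weighting formula are hypothetical choices for
illustration, not the scheme used in the system Chris describes.

```python
import math
from collections import Counter


def term_weights(doc_tokens, corpus):
    """Return tf-idf weights for one document.

    corpus is a list of token lists (an assumed toy representation).
    The smoothed idf used here is one common variant, not a standard
    that any particular IR system is known to use.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for term, count in tf.items():
        # Document frequency: how many documents contain the term.
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[term] = count * idf
    return weights


def ranked_terms(doc_tokens, corpus):
    """Terms ordered from highest to lowest weight (ties alphabetical)."""
    weights = term_weights(doc_tokens, corpus)
    return sorted(weights, key=lambda t: (-weights[t], t))


def retrieval_explanation(doc_tokens, corpus, query_terms):
    """Say why the document was retrieved, using term ranks only."""
    ranked = ranked_terms(doc_tokens, corpus)
    parts = [
        f"'{q}' was ranked {ranked.index(q) + 1} by weight"
        for q in query_terms
        if q in ranked
    ]
    if not parts:
        return "No query term occurs in the document."
    return "Retrieved because " + " and ".join(parts) + "."
```

Everything this produces is derived from term counts, so it can report
which terms ranked where, but nothing in it represents what the article
argues or why a particular reader would care.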
 
>You can do a lot with purely statistical approaches.  [ . . . ]
>(See our "Automatic Analysis, Theme Generation, and Summarization of 
> Machine-Readable Texts" by Salton,Allan,Buckley,Singhal, in Science, 
> June 3rd,1994.)

Yes.  It is an interesting article.  I was impressed.
 
>Can you give me an example of a system operating on general text that
>you consider more of an NLU success?

I think the message-understanding conferences provide various
examples.  Intelligent tutoring systems (my field) provide another.
If you want to carry on a dialogue, you are basically constrained
to talk about areas where you have understood the other participant.
Statistical approaches have some impressive successes, and the reasons
why deserve careful study, but you can also see what sorts of
tasks they will *never* perform.  We probably want a hybrid approach.
The question of whether or not statistical analysis of text constitutes
"understanding" seems to be an interesting philosophical divide.

-- Greg

