Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!pipex!uunet!bcstec!bronte!snake!rwojcik
From: rwojcik@atc.boeing.com (Richard Wojcik)
Subject: Re: best parser???
Message-ID: <1994Nov30.221736.10961@grace.rt.cs.boeing.com>
Sender: usenet@grace.rt.cs.boeing.com (For news)
Reply-To: rwojcik@atc.boeing.com
Organization: Research & Technology
References: <MAGERMAN.94Nov22110438@snoopy.bbn.com>
Date: Wed, 30 Nov 1994 22:17:36 GMT
Lines: 90

In article <MAGERMAN.94Nov22110438@snoopy.bbn.com>, magerman@bbn.com (David Magerman) writes:
>I don't think we disagree too much here, but we are probably coming at
>this problem from slightly different directions.  Your goal (if I read
>you correctly) is to solve a natural language processing problem
>(information extraction), whereas mine is to solve a basic research
>problem (natural language parsing).  There is, of course, the naive
>hope in the back of my mind that solving my problem will impact on
>your ability to solve your problem.  But, as you state in your reply,
>full parsing hasn't proven itself too useful to date.  I'll keep
>trying, though :-).

I think that you misinterpreted me a little bit.  First of all, I consider "information
extraction" to be any kind of information extraction, not just what we 
conventionally mean by "natural language understanding."  One of our goals
is to judge compliance with a writing standard.  Another is to fill in message 
templates.   You can't really use the same parser/grammar to do both, although
you can get by with a certain amount of overlap.  The point is that you can't
really evaluate a parser system until you know what it is that the system is
going to do with the parses.  I know that what I am saying is controversial, and
I suspect that we disagree on this point.  But I am willing to defend it.  (BTW, I
want to make it clear that I am not just talking about parsers.  Evaluating parsers
is not quite the same as evaluating grammars that the parsers apply to token
strings.)

> [deleted material about other types of evaluations]
>The third measure, the one I use in my work (for better or worse), is
>the exact-match criterion: a parse tree is correct if and only if it
>matches the "correct" parse exactly, both in structure and in labels.
>This criterion is perhaps overly stringent, since there can be more
>than 1 "correct" parse for a sentence.  But it certainly could be
>argued that a high score on this measure suggests a good parser.  It
>is a bit harder to argue what a low score on this measure means, but I
>would contend that if the task is to parse sentences according to a
>prespecified scheme, then this is a reasonable comparison metric for
>parsers.

I want to be careful not to confuse parsers with grammars, although there
is a sense in which parsers are hard-wired for types of grammars and conventions.
We are really talking about the trees that are the output of parsing events.
The problem I have with the "exact match" criterion is that it finesses the
information-use problem.  The same tokens might receive a different analysis
in a different context.  By insisting on one analysis, you are insisting on one
context.   Therefore, your evaluation criterion can never really get away
from an implicit concept of information use.  Your parser/grammar can
only be judged optimal for that one context (and/or domain)--whether it 
be WSJ reports of terrorist events or aircraft maintenance documentation or
some melange of 25 different text sources.
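To make the criterion itself concrete, here is a minimal sketch of
exact-match tree comparison as Magerman describes it--correct iff
structure and labels both match.  The tree representation (nested
label/children tuples) and the example sentence are my own illustrative
choices, not anything from a particular parser:

```python
def exact_match(a, b):
    """True iff two parse trees match exactly in structure and labels."""
    # Leaves (words) are plain strings; compare them directly.
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    # Interior nodes: the label must match, then every child in order.
    if a[0] != b[0] or len(a) != len(b):
        return False
    return all(exact_match(x, y) for x, y in zip(a[1:], b[1:]))

# The same token string under two analyses (the classic PP attachment):
gold = ("S",
        ("NP", "I"),
        ("VP", ("V", "saw"),
               ("NP", ("NP", "the", "man"),
                      ("PP", "with", "the", "telescope"))))
flat = ("S",
        ("NP", "I"),
        ("VP", ("V", "saw"),
               ("NP", "the", "man"),
               ("PP", "with", "the", "telescope")))

print(exact_match(gold, gold))  # True
print(exact_match(gold, flat))  # False: same words, different structure
```

Note that the metric gives the second analysis zero credit even though
most of its constituents agree with the first--which is exactly the
stringency Magerman concedes above.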

I am not totally against what you are doing.  I just don't think that it can be
done in a completely information-neutral way.   Ideally, we would have a 
parser/grammar that would operate efficiently in all domains and contexts.
That would require us to have a system that could generate all possible
parse trees for all contexts (I use "context" in a very broad way here) and 
strategies for assigning the proper parse tree in the given context.  I don't know
of any NLP system in existence that really tries to do this.  Most systems that
have broad coverage can only be applied to a single context.  If you want to
apply the system to another context, then you have to apply a different set
of "disambiguation" strategies and, perhaps, a considerable number of
other optimization strategies.  It might be an expensive or a cheap thing to
do, but optimization strategies are not part of your proposed evaluation 
technique.
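A toy sketch of the point about context-bound disambiguation (every name
and reading below is hypothetical, chosen to echo the maintenance-
documentation domain mentioned above): the very same token string gets a
different tree depending on which domain's strategy is applied, so
neither tree can be "the" correct one context-free.

```python
# Hypothetical: "check valve position" as a noun compound ("the position
# of the check valve") vs. a procedure step ("check the valve position").
AMBIGUOUS_PARSES = {
    ("check", "valve", "position"): {
        "maintenance_doc": ("NP", ("N", "check", "valve"),
                                  ("N", "position")),
        "procedure_step":  ("VP", ("V", "check"),
                                  ("NP", ("N", "valve"),
                                         ("N", "position"))),
    },
}

def parse_in_context(tokens, domain):
    """Return the analysis a domain-specific strategy would prefer."""
    return AMBIGUOUS_PARSES[tuple(tokens)][domain]

tokens = ["check", "valve", "position"]
print(parse_in_context(tokens, "maintenance_doc")[0])  # NP
print(parse_in_context(tokens, "procedure_step")[0])   # VP
```

An exact-match evaluation against a single gold tree necessarily rewards
one of these readings and penalizes the other, which is the sense in
which the criterion smuggles in a context.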

>...
>And I definitely agree that getting accurate parse trees is not a goal
>of language processing.  I'm not even convinced that it's a necessary
>part of the process.  However, I do believe this: if you have a good
>parser, i.e. one that gets better than, say, 80% of the sentences
>correct, then you'll have a better chance of solving NLP problems than
>if you didn't.  It's possible that you can get away without a good
>parser, but I think right now that's mostly true because of the low
>standards of success in the field.

I can see what you are getting at, but there may well be too many variables
to make the results easily interpretable.  There are serious questions in my
mind as to what constitutes an adequate (or ideal) corpus for this kind of
test.  How many authors created the text?  How many domains were represented?
How long were the sentences?  How errorful was the corpus?  As soon as you answer
questions like this, your results pertain to the type of applications that might
deal with that type of corpus.  They do not necessarily pertain to all the uses
to which you might put a grammar/parser.  In short, parse trees always imply
an information context.  You can't evaluate a set of parse tree outputs in
an information-neutral way.  You end up with results that look informative but
don't really tell us much.

---

Disclaimer:  Opinions expressed above are not those of my employer.

    Rick Wojcik   (rick.wojcik@boeing.com)   Seattle, WA

