Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!uhog.mit.edu!bloom-beacon.mit.edu!spool.mu.edu!howland.reston.ans.net!pipex!sunic!trane.uninett.no!nac.no!eunet.no!nuug!EU.net!sun4nl!freya.let.rug.nl!vannoord
From: vannoord@let.rug.nl (Gertjan van Noord)
Subject: Re: Complexity of Parsing?
Sender: news@let.rug.nl (News system at let.rug.nl)
Message-ID: <1994Nov13.131539.23090@let.rug.nl>
Date: Sun, 13 Nov 1994 13:15:39 GMT
References: <1994Nov12.160114.1159@seas.smu.edu>
Nntp-Posting-Host: tyr.let.rug.nl
Organization: Faculteit der Letteren, Rijksuniversiteit Groningen, NL
Lines: 75

In article <1994Nov12.160114.1159@seas.smu.edu> pedersen@seas.smu.edu (Ted Pedersen) writes:
>I've been doing some thinking about parsing sentences with unknown
>words. My first thought is that this is just an extreme case of
>parsing a sentence with an ambiguous word. That is, an unknown word
>is ambiguous in that it could be any of the known syntactic
>categories. 
>
>So it seems one course of action (not the best perhaps) would be to
>parse the sentence assuming that each of the possible categories is
>valid. If you have four categories of words (say nouns, verbs,
>determiners, and adjectives) you could parse the sentence four times,
>using each of these categories for the unknown word once. 

Huh? Of course not: you just assign all four categories to that one word
and parse the sentence once, just as you would for an ordinary lexical
ambiguity.
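Concretely, a quick sketch (the category names and the tiny lexicon are invented for illustration): at lexical lookup, an unknown word simply gets every open-class category, exactly like a maximally ambiguous known word.

```python
# Sketch: an unknown word is just a maximally ambiguous word.
# Category names and lexicon entries are invented for illustration.

OPEN_CLASSES = {"noun", "verb", "det", "adj"}   # assumed category inventory

LEXICON = {
    "the": {"det"},
    "dog": {"noun"},
    "barks": {"verb", "noun"},   # an ordinary lexical ambiguity
}

def categories(word):
    """Return the category set for a word; an unknown word gets
    all open classes, so the parser treats it like any ambiguous word."""
    return LEXICON.get(word, OPEN_CLASSES)
```

The parser then runs once over the sentence; the unknown word never triggers a re-parse.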

>
>Given all the above, my question relates to determining the
>complexity of such a parsing scheme. I believe that a chart parser can
>find the first valid structure of a sentence with unknown/ambiguous
>words in time n^3 (where n is the number of words in the sentence) but
>that to find every possible structure for that sentence becomes
>exponential. Is this true? In the exponential case does parsing reduce
>to SAT or 3SAT and belong to some class of NP problems (hard,
>complete, etc)? I'd be curious to see such a reduction if this is the
>case. 

What are we talking about here? If we are talking about context-free
grammars, then recognition is cubic, and so is parsing. Of course, if
you insist that the parser enumerate each parse tree, this becomes
exponential, simply because there may be an exponential number of
parse trees.
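To make the cubic claim concrete, here is a minimal CKY-style recognizer (my sketch, assuming a grammar in Chomsky normal form; the grammar and lexicon in the test are invented). Note that lexical ambiguity only adds symbols to chart cells; the three nested span loops, and hence the O(n^3) bound in sentence length, are untouched.

```python
# Minimal CKY recognizer sketch for a CNF grammar.
# lexical: word -> set of categories
# binary:  (B, C) -> set of A such that A -> B C
from itertools import product

def cky_recognize(words, lexical, binary, start="S"):
    n = len(words)
    # chart[i][j] = set of categories spanning words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical[w])       # ambiguity = bigger cell, same loops
    for span in range(2, n + 1):                # O(n) span lengths
        for i in range(n - span + 1):           # O(n) start positions
            j = i + span
            for k in range(i + 1, j):           # O(n) split points -> O(n^3) total
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((b, c), set())
    return start in chart[0][n]
```

Recognition (and building the packed chart) is cubic; only unpacking every individual tree from the chart can be exponential.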

If we are talking about, say, Definite Clause Grammars, then we are lost:
recognition is undecidable.

Anyway, treating unknown words the way you describe does not alter the
complexity one bit, given that lexical ambiguities do exist. 

>
>It also seems to me that the number of potential categories for an
>unknown/ambiguous word would impact the complexity. 
In practice it probably does, but not in a worst-case analysis.

> For instance it
>seems like it would be more complex to parse a sentence that has a
>word that could be one of four categories than it would be to parse a
>sentence with a word which could be one of two categories. 
>I don't recall seeing the impact of the number of syntactic categories
>(and grammar rules for that matter) on the complexity of parsing. Is
>there any? 
No, not in the sentence length: the number of categories and rules enters
only as a constant factor, never in the exponent. This is easily seen once
you take unary rules into account: even if each word were unambiguous,
such unary rules would effectively render them all ambiguous.
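A sketch of the unary-rule point (the rules below are invented for illustration): after computing the unary closure, a word with a single lexical category still contributes several categories to its chart cell, so the worst case is the same as with ambiguous words.

```python
# Invented unary rules A -> B, written as B -> {A, ...}:
UNARY = {
    "N": {"NP"},    # a bare noun can be an NP
    "NP": {"S"},    # an NP can stand alone as a (fragment) S
}

def unary_closure(cats):
    """All categories reachable from `cats` via unary rules."""
    closed = set(cats)
    agenda = list(cats)
    while agenda:
        b = agenda.pop()
        for a in UNARY.get(b, ()):
            if a not in closed:
                closed.add(a)
                agenda.append(a)
    return closed
```

Here an unambiguous noun already fills its cell with three categories, just as a three-ways ambiguous word would.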

>
>In addition, if we are using a chart parser with very nice time
>complexity (n^3) it seems like the space complexity would go to heck
>very quickly if we are dealing with unknown/ambiguous words. Any
>thoughts/literature on space complexity in parsing? 
Same as above: you get that in the worst case anyway.
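A sketch of why the chart itself stays well-behaved: for the maximally ambiguous grammar X -> X X | 'a', the chart has only O(n^2) cells, yet the number of trees it packs grows as the Catalan numbers. Counting them with the same chart-style recurrence takes linear space:

```python
# Count the parse trees of a^n under X -> X X | 'a'.
# The table c has n+1 entries (polynomial space), but the counts
# it holds grow as the Catalan numbers (exponential tree count).

def count_trees(n):
    c = [0] * (n + 1)
    c[1] = 1                      # a single word is one tree
    for span in range(2, n + 1):  # split each span at every point k
        c[span] = sum(c[k] * c[span - k] for k in range(1, span))
    return c[n]
```

So the space blow-up only hits you if you insist on materializing every tree instead of keeping them packed in the chart.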

>
>I've asked lots of questions I know. I'm pretty confused at the moment
>so any comments/pointers would be appreciated. 
>
>Regards,
>Ted
>---
>* Ted Pedersen                                  pedersen@seas.smu.edu * 
>* Department of Computer Science and Engineering,                     *
>* Southern Methodist University, Dallas, TX 75275      (214) 768-2126 *


