Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!cs.utexas.edu!convex!seas.smu.edu!pedersen
From: pedersen@seas.smu.edu (Ted Pedersen)
Subject: Complexity of Parsing?
Message-ID: <1994Nov12.160114.1159@seas.smu.edu>
Sender: news@seas.smu.edu (USENET News System)
Nntp-Posting-Host: rapid_f.seas.smu.edu
Organization: SMU - School of Engineering & Applied Science - Dallas
Date: Sat, 12 Nov 1994 16:01:14 GMT
Lines: 45

I've been doing some thinking about parsing sentences with unknown
words. My first thought is that this is just an extreme case of
parsing a sentence with an ambiguous word. That is, an unknown word
is ambiguous in that it could be any of the known syntactic
categories. 

So it seems one course of action (not the best perhaps) would be to
parse the sentence assuming that each of the possible categories is
valid. If you have four categories of words (say nouns, verbs,
determiners, and adjectives) you could parse the sentence four times,
using each of these categories for the unknown word once. 
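
To make the scheme concrete, here is a minimal sketch of what I have in
mind. The toy grammar, the lexicon, and the names recognize and
viable_categories are all made up for illustration, and the sketch
assumes a grammar in Chomsky normal form and a single unknown word. It
runs a CYK-style chart recognizer once per candidate category and
reports which categories yield a sentence.

```python
# Hypothetical toy grammar in Chomsky normal form (illustration only).
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("Det", "Nbar"): {"NP"},
    ("Adj", "N"): {"Nbar"},
    ("V", "NP"): {"VP"},
}
LEXICON = {"the": {"Det"}, "dog": {"N"}, "saw": {"V"}, "big": {"Adj"}}
CATEGORIES = ["N", "V", "Det", "Adj"]


def recognize(tags):
    """CYK recognition; tags is a list of sets of lexical categories."""
    n = len(tags)
    chart = {(i, i + 1): set(t) for i, t in enumerate(tags)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = set()
            for j in range(i + 1, k):  # every split point of the span
                for left in chart[(i, j)]:
                    for right in chart[(j, k)]:
                        cell |= BINARY.get((left, right), set())
            chart[(i, k)] = cell
    return "S" in chart[(0, n)]


def viable_categories(words):
    """Parse once per category, assuming one unknown word in the input."""
    viable = []
    for cat in CATEGORIES:
        tags = [LEXICON.get(w, {cat}) for w in words]
        if recognize(tags):
            viable.append(cat)
    return viable


print(viable_categories("the dog saw the blicket".split()))
```

With k categories this costs k chart passes for one unknown word, so the
repetition by itself only adds a constant factor of k.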

Given all the above, my question relates to determining the
complexity of such a parsing scheme. I believe that a chart parser can
find the first valid structure of a sentence with unknown/ambiguous
words in O(n^3) time (where n is the number of words in the sentence) but
that finding every possible structure for that sentence becomes
exponential. Is this true? In the exponential case, does parsing reduce
to SAT or 3SAT and belong to one of the NP classes (NP-hard,
NP-complete, etc.)? I'd be curious to see such a reduction if this is the
case. 
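
A small sketch of why I expect the blowup when listing every structure.
It assumes the textbook ambiguous grammar S -> S S | a (my choice for
illustration, as is the name count_parses) and counts the parse trees
of a string of n a's by filling a chart over spans. The counting itself
takes on the order of n^3 steps, but the counts are the Catalan numbers,
which grow exponentially, so printing each tree separately cannot be
polynomial for this grammar.

```python
from functools import lru_cache


def count_parses(n):
    """Number of parse trees of 'a' * n under S -> S S | a."""

    @lru_cache(maxsize=None)
    def count(i, k):
        # A span of length one is covered only by S -> a.
        if k - i == 1:
            return 1
        # Otherwise S -> S S, summed over every split point j.
        return sum(count(i, j) * count(j, k) for j in range(i + 1, k))

    return count(0, n)


for n in range(1, 11):
    print(n, count_parses(n))
```

This says nothing about the SAT question; it only separates counting
the structures from enumerating them.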

It also seems to me that the number of potential categories for an
unknown/ambiguous word would impact the complexity. For instance, it
seems like it would be more complex to parse a sentence that has a
word that could be one of four categories than it would be to parse a
sentence with a word that could be one of two categories. 
I don't recall seeing any discussion of how the number of syntactic
categories (and grammar rules, for that matter) affects the complexity
of parsing. Is there any such effect? 

In addition, if we are using a chart parser with very nice time
complexity (O(n^3)), it seems like the space complexity would go to heck
very quickly if we are dealing with unknown/ambiguous words. Any
thoughts/literature on space complexity in parsing? 
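
One way I could imagine checking the space worry empirically is
sketched below. It assumes the same kind of made-up toy grammar in
Chomsky normal form (the names build_chart and chart_size are
hypothetical), gives the unknown word all four categories at once in a
single chart, and counts the entries stored. For a recognizer of this
kind each of the n(n+1)/2 spans holds at most one entry per grammar
symbol, so the entry count is bounded by (number of symbols) * n(n+1)/2.
Storing back-pointers for every derivation would cost more, and the
sketch does not measure that.

```python
# Hypothetical toy grammar in Chomsky normal form (illustration only).
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("Det", "Nbar"): {"NP"},
    ("Adj", "N"): {"Nbar"},
    ("V", "NP"): {"VP"},
}
LEXICON = {"the": {"Det"}, "dog": {"N"}, "saw": {"V"}, "big": {"Adj"}}
CATEGORIES = ["N", "V", "Det", "Adj"]
SYMBOLS = {"S", "NP", "VP", "Nbar"} | set(CATEGORIES)


def build_chart(words):
    """Fill a CYK chart; an unknown word gets every category at once."""
    tags = [LEXICON.get(w, set(CATEGORIES)) for w in words]
    n = len(tags)
    chart = {(i, i + 1): set(t) for i, t in enumerate(tags)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = set()
            for j in range(i + 1, k):
                for left in chart[(i, j)]:
                    for right in chart[(j, k)]:
                        cell |= BINARY.get((left, right), set())
            chart[(i, k)] = cell
    return chart


def chart_size(chart):
    """Total number of (span, symbol) entries stored in the chart."""
    return sum(len(cell) for cell in chart.values())


words = "the dog saw the blicket".split()
chart = build_chart(words)
n = len(words)
print(chart_size(chart), "entries; bound", len(SYMBOLS) * n * (n + 1) // 2)
```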

I've asked lots of questions, I know. I'm pretty confused at the moment,
so any comments/pointers would be appreciated. 

Regards,
Ted
---
* Ted Pedersen                                  pedersen@seas.smu.edu * 
* Department of Computer Science and Engineering,                     *
* Southern Methodist University, Dallas, TX 75275      (214) 768-2126 *
