Newsgroups: comp.lang.prolog
From: pereira@alta.research.att.com (Fernando Pereira)
Subject: Re: Strings in DCG-style Chart Parsing
In-Reply-To: alech@ai.uga.edu's message of 29 Sep 1995 01:07:23 GMT
X-Nntp-Posting-Host: alta.research.att.com
Message-ID: <PEREIRA.95Sep30104755@alta.research.att.com>
Sender: usenet@research.att.com (netnews <9149-80593> 0112740)
Reply-To: pereira@research.att.com
Organization: AT&T Bell Laboratories
References: <441e71$dp7@hobbes.cc.uga.edu>
	<PEREIRA.95Sep24123722@alta.research.att.com>
	<44fgsb$ca3@hobbes.cc.uga.edu>
Date: Sat, 30 Sep 1995 14:47:55 GMT
Lines: 56

In article <44fgsb$ca3@hobbes.cc.uga.edu> alech@ai.uga.edu (Andrew Lech [MSAI]) writes:
   Fernando Pereira (pereira@alta.research.att.com) wrote:
     Serious DCG chart parsers use integers rather than lists to represent
     input positions. In addition to the redundancy (in space and time) you
     mention, the list representation does not allow easy
     indexing of chart items by input position, which, for instance, is
     needed to achieve O(n^3) parsing time for the DCG encoding of a
     CFG...
   Originally, I intended to circumvent any discussion of integer
   representations as I find them to be as unpleasant as using numbers for
   citations (e.g. [1] for (Chomsky 1957)).  Instead, I think of them as a
   final optimization of an otherwise fine parsing system.

A bit of history. Colmerauer's original logic grammar formalism,
metamorphosis grammars, depended on the list representation of string
positions, because it allowed symbols to be pushed back into the input
stream, with rules such as

	a, [b] --> c, d.

We coined the name "definite clause grammars" for the subset of
metamorphosis grammars lacking that possibility, which has the
property of not depending on the list representation. The nice thing
is that the notion of "position" is completely abstract in a DCG,
being encapsulated in the predicate connects/3. This allows us to use
DCGs to parse things other than strings, for instance word lattices
from speech recognition.
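
A minimal sketch of that abstraction, with a toy grammar of my own
choosing (the nonterminals and words are illustrative assumptions, not
taken from any particular system). The grammar is written out as the
definite clauses a DCG translator would produce, with every terminal
going through connects/3. Positions can then be list suffixes or
integer node labels of a small word lattice without touching the
grammar clauses:

```prolog
% Toy grammar, hand-translated to definite clauses. The clauses only
% thread positions through; they never inspect what a position is.
s(P0, P)  :- np(P0, P1), vp(P1, P).
np(P0, P) :- connects(the, P0, P1), n(P1, P).
n(P0, P)  :- connects(cat, P0, P).
n(P0, P)  :- connects(cap, P0, P).
vp(P0, P) :- connects(sleeps, P0, P).

% Representation 1: a position is the list of words still to be read.
connects(Word, [Word|Rest], Rest).

% Representation 2: a position is an integer node in a word lattice.
% Nodes 1 and 2 are joined by two competing hypotheses, cat and cap,
% which a single list of words cannot express.
connects(the,    0, 1).
connects(cat,    1, 2).
connects(cap,    1, 2).
connects(sleeps, 2, 3).
```

The two representations coexist in one program here only because a
list never unifies with an integer; in practice one would load one or
the other. Both `?- s([the,cat,sleeps], []).` and `?- s(0, 3).`
succeed, the latter once for each path through the lattice.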

Notice also that efficient parsing algorithms for, say, CFGs, must use
some form of position indexing to represent the input, to guarantee
constant time access to the parsing states starting at a certain
position or finishing at a certain position. Since the Prolog encoding
of DCGs straddles the grammar/implementation border, it must deal with
this issue for efficient parsing. It's not any more distasteful than
the use of integers to represent graph nodes in efficient graph
algorithms. 
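
A rough sketch of what position indexing can look like for a chart
kept in the Prolog database. The predicate names (edge_from/3,
edge_to/3, add_edge/3 and so on) are hypothetical, and the sketch
assumes a Prolog system that indexes dynamic clauses on their first
argument; how close the lookup gets to constant time depends on how
that system implements its indexing.

```prolog
:- dynamic edge_from/3, edge_to/3.

% add_edge(+Start, +End, +Cat): record a chart item once, filed under
% both its start position and its end position.
add_edge(Start, End, Cat) :-
    (   edge_from(Start, End, Cat)
    ->  true                              % already in the chart
    ;   assertz(edge_from(Start, End, Cat)),
        assertz(edge_to(End, Start, Cat))
    ).

% Items starting at a known position: the first argument of
% edge_from/3 is a bound integer, so first-argument indexing applies.
starting_at(Start, Cat, End) :- edge_from(Start, End, Cat).

% Items ending at a known position, via the mirrored table.
ending_at(End, Cat, Start) :- edge_to(End, Start, Cat).
```

For example, after `add_edge(0,1,det), add_edge(1,2,n),
add_edge(0,2,np)`, the query `findall(C-E, starting_at(0,C,E), L)`
gives `L = [det-1, np-2]`. With list positions the same tables would
be keyed on terms that all share one principal functor, so indexing
on that functor alone cannot tell one position from another.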

   However, I appreciate your insight into the influence of lists and (lack
   of) subterm sharing on polynomial parsability.  As Mark Johnson noted in an
   email message, I might as well use what Prolog is better designed to
   handle, so I guess I'm stuck with integers for all nontrivial applications.

But even a Prolog with efficient subterm sharing would not remove the
need for integer representations when parsing word lattices. Why
restrict oneself to strings?




-- 
Fernando Pereira
2B-441, AT&T Bell Laboratories
600 Mountain Ave, Murray Hill, NJ 07974-0636
pereira@research.att.com


