Newsgroups: comp.lang.prolog
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!gatech!howland.reston.ans.net!agate!news.ucdavis.edu!csus.edu!netcom.com!ludemann
From: ludemann@netcom.com (Peter Ludemann)
Subject: Re: Handling syntax errors with DCGs
Message-ID: <ludemannD8LoBD.9E7@netcom.com>
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
References: <3oc1mh$ui@ogre.cs.waikato.ac.nz> <PEREIRA.95May5103816@alta.research.att.com>
Date: Mon, 15 May 1995 03:28:25 GMT
Lines: 39
Sender: ludemann@netcom19.netcom.com

In article <PEREIRA.95May5103816@alta.research.att.com>,
Fernando Pereira <pereira@research.att.com> wrote:
>In article <3oc1mh$ui@ogre.cs.waikato.ac.nz> John Grundy <jgrundy@cs.waikato.ac.nz> writes:
>   I'm using DCGs in LPA MacProlog32 to parse a language held in a text
>   window. I grab the tokens using etoks/2 and then give this as a list
>   to phrase/3 to process using the DCG rules I've defined.
>
>   I'd like to be able to detect syntax errors and move the window
>   cursor to the (approximate!) position of the error, beep, etc.
>The DCG parser obtained through the canonical translation of DCGs to
>Prolog is a top-down backtrack parser. Such parsers are notoriously
>bad at error reporting. By transforming your DCG into another one ...

A quick and dirty solution that worked for me is:

- Have your lexical analyzer return tokens with positioning information
  (e.g., instead of returning 'foo', return token('foo',453,455), where
  453 and 455 are the start and end coordinates of the token).
- Add to your token matcher something which asserts how far you've gone;
  e.g., "r --> [x], s." is changed to something like:
     "r --> token(x), s."
   where: token(X) --> [token(X,_Start,End)],
                       {assertz(reached_here(End))}.
   (declare reached_here/1 dynamic so it can be queried even before any
   fact has been asserted).
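Putting the two steps together might look roughly like this. It's only a sketch, and it assumes SWI-Prolog (for code_type/2) rather than LPA's etoks/2. The lexer tokens/2 is hypothetical: it treats any run of non-blank characters as a token and takes End to be one past the token's last character. r//0 and s//0 are a toy grammar.

```prolog
% Sketch assuming SWI-Prolog. Input is a list of character codes; a token
% is any run of non-blank characters; End is one past its last character.
:- dynamic reached_here/1.

% tokens(+Codes, -Tokens): Tokens is a list of token(Name, Start, End).
tokens(Codes, Tokens) :- tokens(Codes, 0, Tokens).

tokens([], _, []).
tokens([C|Cs], Pos, Tokens) :-          % skip a blank, advancing the position
    code_type(C, space), !,
    Pos1 is Pos + 1,
    tokens(Cs, Pos1, Tokens).
tokens([C|Cs], Start, [token(Name, Start, End)|Tokens]) :-
    word([C|Cs], WordCodes, Rest),      % C is non-blank here, so Word is non-empty
    length(WordCodes, Len),
    End is Start + Len,
    atom_codes(Name, WordCodes),
    tokens(Rest, End, Tokens).

% word(+Codes, -Word, -Rest): Word is the longest non-blank prefix of Codes.
word([C|Cs], [C|W], Rest) :-
    \+ code_type(C, space), !,
    word(Cs, W, Rest).
word(Rest, [], Rest).

% token(X): match one positioned token named X and record where it ends.
token(X) --> [token(X, _Start, End)],
             { assertz(reached_here(End)) }.

% Toy grammar (hypothetical): originally r --> [x], s.  s --> [y].
r --> token(x), s.
s --> token(y).
```

For example, lexing "x z" gives [token(x,0,1), token(z,2,3)]. phrase(r, ...) then fails, but it leaves reached_here(1) behind.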

You probably want to hide this with a bit of term_expansion.
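One way to do the term_expansion part is sketched below. It assumes SWI-Prolog, where a user-defined term_expansion/2 sees each grammar rule before the built-in DCG translation does. The helpers wrap_terminals/2 and list_to_tokens/2 are hypothetical names. The prolog_load_context/2 guard limits the rewrite to rules loaded into module user, so library grammars aren't touched. Constructs such as \+ are passed through unchanged.

```prolog
% Sketch assuming SWI-Prolog: user:term_expansion/2 runs before DCG translation.
:- dynamic reached_here/1.
:- multifile user:term_expansion/2.

% The positioned-token matcher, defined before the expansion is in place.
token(X) --> [token(X, _Start, End)],
             { assertz(reached_here(End)) }.

% wrap_terminals(+Body, -NewBody): replace terminal lists [a,b,...] in a
% DCG body with token(a), token(b), ...; other constructs are left alone.
wrap_terminals(B, B) :- var(B), !.
wrap_terminals((A, B), (A1, B1)) :- !,
    wrap_terminals(A, A1), wrap_terminals(B, B1).
wrap_terminals((A ; B), (A1 ; B1)) :- !,
    wrap_terminals(A, A1), wrap_terminals(B, B1).
wrap_terminals((C -> T), (C1 -> T1)) :- !,
    wrap_terminals(C, C1), wrap_terminals(T, T1).
wrap_terminals({G}, {G}) :- !.
wrap_terminals([], []) :- !.
wrap_terminals([T|Ts], Body) :- !, list_to_tokens([T|Ts], Body).
wrap_terminals(NonTerminal, NonTerminal).

% list_to_tokens(+Terminals, -Body): [a,b] becomes (token(a), token(b)).
list_to_tokens([T], token(T)) :- !.
list_to_tokens([T|Ts], (token(T), Rest)) :- list_to_tokens(Ts, Rest).

% Only rewrite grammar rules loaded into module user.
user:term_expansion((Head --> Body), (Head --> NewBody)) :-
    prolog_load_context(module, user),
    wrap_terminals(Body, NewBody).

% Written with ordinary terminals; loaded as calls to token//1.
r --> [x], s.
s --> [y].
```

After loading, r//0 accepts [token(x,0,1), token(y,2,3)] but no longer accepts the bare list [x, y].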

You might want to pass extra parameters in your rules so that you can
give start and end coordinates for parsed phrases (token then takes the
coordinates as extra arguments, i.e., token(X,Start,End)):
    r(Start,End) --> token(x,Start,_), s(_,End).
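With the coordinates passed along, the matcher only needs to expose them as extra DCG arguments. Here's a sketch in which s//2 and the token names are placeholders:

```prolog
:- dynamic reached_here/1.

% token(X, Start, End): match a positioned token named X, bind its
% coordinates, and record how far the parse got.
token(X, Start, End) --> [token(X, Start, End)],
                         { assertz(reached_here(End)) }.

% r spans from the start of its first token to the end of its last one.
r(Start, End) --> token(x, Start, _), s(_, End).
s(Start, End) --> token(y, Start, End).
```

For example, phrase(r(S,E), [token(x,0,1), token(y,2,3)]) binds S = 0 and E = 3.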

Performance isn't spectacular (although not as bad as I had feared it
would be), but you can easily find the farthest point your parse reached
(the largest End recorded by reached_here/1) and produce reasonable
error messages.
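Turning the recorded positions into an error report could look like the sketch below. It assumes SWI-Prolog (max_list/2 from library(lists)), coordinates that start at 0, and an End that is one past the token's last character. check/2 and the toy grammar are hypothetical.

```prolog
:- use_module(library(lists)).
:- dynamic reached_here/1.

token(X) --> [token(X, _Start, End)],
             { assertz(reached_here(End)) }.

% Toy grammar (hypothetical).
r --> token(x), s.
s --> token(y).

% check(+Tokens, -Result): Result is ok; or error(Start, Tok) for the
% first token starting at or after the farthest point reached; or
% error(end_of_input) if the parse ran off the end of the tokens.
check(Tokens, Result) :-
    retractall(reached_here(_)),
    (   phrase(r, Tokens)
    ->  Result = ok
    ;   findall(E, reached_here(E), Ends),
        max_list([0|Ends], Far),          % 0 if no token matched at all
        (   member(token(Tok, Start, _), Tokens), Start >= Far
        ->  Result = error(Start, Tok)
        ;   Result = error(end_of_input)
        )
    ).
```

The Start in error(Start, Tok) is the position you would move the window cursor to. Asserted reached_here/1 facts survive backtracking, so the maximum covers every alternative the parser tried, not just the last one.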

I'd post code to do this, but unfortunately it's proprietary.


-- 
Peter Ludemann                      ludemann@netcom.com
