>From ok@quintus Wed Jul 27 03:16:56 1988
Relay-Version: version Notes 2.8  87/9/11; site otter.hple.hp.com
From: ok@quintus
Date: Wed, 27 Jul 1988 02:16:56 GMT
Date-Received: Wed, 27 Jul 1988 18:23:05 GMT
Subject: Grammar rule translator.
Message-ID: <196@quintus.UUCP>
Organization: Quintus Computer Systems, Inc.
Path: otter!hplabs!sri-unix!quintus!ok
Newsgroups: comp.lang.prolog
Sender: news@quintus.UUCP
Reply-To: ok@quintus ()
Lines: 273

In his recent posting about the Oxford Prolog Library, Jocelyn Paine
said that one of the components was a grammar rule translator.  The
description says 'The translator is essentially the same as that
published in "Programming in Prolog", by Clocksin & Mellish'.

The grammar rule translator which appeared in the 1st and 2nd editions
of Clocksin & Mellish had the following flaws:

-- there was a variable name spelling mistake in one of the clauses,
   which broke the translation of rules with pushback
   (fixed in the 3rd edition, I think);

-- the translator did not expand (If->Then;Else)s
   (still not handled in the 3rd edition);

-- the translator did not use the 'C'/3 expansion, but tried to do
   the terminal matching in the translator, which meant that code with
   cuts and ``meta-logical'' operations did not work as expected
   (still a problem in the 3rd edition);

-- the translator did nothing useful with variables appearing as
   non-terminals (in the 3rd edition it goes into a loop);

-- the definition of phrase/2 is wrong--it only accepts nonterminals
   but should accept grammar rule bodies (i.e. phrase((p,q), S) will
   call ','(p,q,S,[]) when it should call (p(S,S1),q(S1,[])))
   (still the case in the 3rd edition).

This is reasonable because Clocksin & Mellish do not purport to provide
a specification of the Prolog language, only instruction in how to use
it.  Their grammar rule translator is presented only as the answer to
an exercise.

The lack of a reliable specification means that some vendors have left
grammar rules out entirely, and others have botched the job.  Others
have added "features".  For example, ALS decided that calls to certain
built-in predicates should be quietly translated as themselves.  The trouble
with this is that this is done _quietly_ (so you never know when it has
happened without looking at the code in the data base) and that there is
no apparent pattern to the choice of which builtins are left alone and
which are changed (so that you, or at any rate I, can never recall which
are which). In the absence of an agreed standard, however, nobody can say
that this was wrong.

There is a group of people who purport to be developing a Prolog
standard at the taxpayer's expense.  You would expect that group to
have addressed the grammar rule translation issue, ideally by
distributing a draft specification in the form of usable Prolog code.
The BSI Prolog substandard committee, however, in their great wisdom,
have preferred to do other things.

There's only one thing to do, then, and that's to publish source code.
The code which follows in this message was rewritten from scratch for
publication.  It is meant to serve as a workable specification of
    phrase/3
    phrase/2
    'C'/3
and the translation from Lhs-->Rhs to Head:-Body form.  Error checking
and reporting has quite deliberately been left out, and a couple of cases
have been over-generalised to accommodate this.  I have supplied the
dcg rule translation as a separate predicate 'dcg rule'/2 to decouple
the question of what DCG rules are translated to from the question of
how DCG rule translation is integrated into the rest of a Prolog system;
I am not proposing that 'dcg rule'/2 should be directly available under
that name.   phrase/[2,3] and 'C'/3 _should_ be directly available.

There may be mistakes in the code that follows.  There may be things it
does that it shouldn't do.  There may be things it doesn't do that you
think it should.  Let's hear about them.  There are only 66 lines of
Prolog code; if we can't get something this size satisfactory, we'll
_deserve_ the BSI substandard.

%   File   : DCG.PL
%   Author : Richard A. O'Keefe
%   Updated: Tuesday July 26th, 1988.
%   Purpose: Definite Clause Grammar rule to Prolog clause translator.

/*  This file is written in the ISO 8859/1 character set.  The "Copyright"
    line contains after the right parenthesis a character which when
    transmitted was character 169, the international copyright symbol.

    Copyright (C)© 1988, Quintus Computer Systems, Inc.

    This file is distributed in the hope that it will be useful,
    but without any warrantee.  You are expressly warned that it is meant
    to serve as a model, not for direct use:  all error checking and
    reporting has been omitted, and mistakes are almost surely present.
    Permission is granted to anyone to distribute verbatim copies of
    this source code as received, in any medium, provided that the
    copyright notice, the nonwarrantee warning, and this permission
    notice are preserved.  Permission is granted to distribute modified
    versions of this source code, or of portions of it, under the above
    conditions, plus the conditions that all changed files carry
    prominent notices stating who last changed them and that the derived
    material is subject to this same permission notice.  Permission is
    granted to include this material in products for which money is
    charged, provided that the customer is given written notice that the
    code is (or is derived from) material provided by Quintus Computer
    Systems, Inc., and that the customer is given this source code on
    request.

    ----------------------------------------------------------------

    Now that we've got that (adapted from the GNU copyright notice)
    out of the way, here are the technical comments.

    The predicates are all named 'dcg <something>'/<some arity> in order
    to keep out of the way, with the exception of phrase/2 and phrase/3
    which bear their proper names.  Only phrase/[2,3] and 'dcg rule'/2
    are meant to be called directly, and 'dcg rule'/2 is meant to be called
    from expand_term/2.  You need to keep 'dcg body'/4 and its dependents
    around at run time so that variables as nonterminals in DCG rule bodies
    will work correctly.

    So that Quintus have _something_ left to sell, this code has been
    rewritten from scratch with no error checking or reporting code at
    all, and a couple of places accept general grammar rule bodies where
    they are really supposed to demand lists of terminals.  However, any
    rule which is valid according to the Quintus Prolog manual will be
    translated correctly, except that this code makes no attempt to handle
    module: prefixes.  (The change is trivial.) 

    Note that 'dcg rule'/2 and phrase/[2,3] are steadfast.
*/

%   dcg rule(+Grammar Rule, -Equivalent Clause)

'dcg rule'(-->(Head0,Body0), Clause) :-
    'dcg head'(Head0, Head, PushBack, S0, S),
    'dcg body'(Body0, Body1, S0, S),
    'dcg conj'(Body1, PushBack, Body),
    Clause = :-(Head,Body).
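
%   For illustration (a hand-traced sketch, not output from a real run;
%   greeting/0 and name/0 are made-up nonterminals and the variable
%   names are chosen for readability):
%   ?- 'dcg rule'((greeting --> [hello], name), Clause).
%   should give
%   Clause = (greeting(S0,S) :- 'C'(S0,hello,S1), name(S1,S)).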


%   dcg head(+Head0, -Head, -PushBack, -S0, -S)
%   recognises both
%   NonTerminal, [PushBackList] -->
%   and
%   NonTerminal -->
%   It returns the difference pair S0\S which the body is to parse.
%   To avoid error checking, it will accept an arbitrary body in place
%   of a pushback list, but it should demand a proper list.

'dcg head'((Head0,PushBack0), Head, PushBack, S0, S1) :- !,
    'dcg goal'(Head0, Head, S0, S),
    'dcg body'(PushBack0, PushBack, S, S1).
'dcg head'(Head0, Head, true, S0, S) :-
    'dcg goal'(Head0, Head, S0, S).
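
%   For illustration (a hand-traced sketch; a/0 and b/0 are made-up
%   nonterminals):
%   ?- 'dcg rule'((a, [x] --> b), Clause).
%   should give
%   Clause = (a(S0,S) :- b(S0,S1), 'C'(S,x,S1)).
%   That is, the body parses S0\S1, and x is then pushed back in front
%   of S1 to give S.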


%   dcg goal(+Goal0, -Goal, +S0, +S)
%   adds the arguments S0, S at the end of Goal0, giving Goal.
%   It should check that Goal0 is a callable term.

'dcg goal'(Goal0, Goal, S0, S) :-
    functor(Goal0, F, N),
    N1 is N+1,
    N2 is N+2,
    functor(Goal, F, N2),
    arg(N2, Goal, S),
    arg(N1, Goal, S0),
    'dcg args'(N, Goal0, Goal).
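
%   For illustration (a hand-traced sketch; np/1 is a made-up
%   nonterminal):
%   ?- 'dcg goal'(np(X), Goal, S0, S).
%   should give
%   Goal = np(X,S0,S).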


%   dcg args(+N, +Goal0, +Goal)
%   copies the first N arguments of Goal0 to Goal.

'dcg args'(N, Goal0, Goal) :-
    (   N =:= 0 -> true
    ;   arg(N, Goal0, Arg),
        arg(N, Goal,  Arg),
        M is N-1,
        'dcg args'(M, Goal0, Goal)
    ).


%   dcg body(+Body0, -Body, +S0, +S)
%   translates Body0 to Body, adding arguments as needed to parse S0\S.
%   It should complain about bodies (such as 2) which are not callable
%   terms, and about lists of terminals which are not proper lists.
%   To avoid error checking, [a|foo] is accepted as [a],foo, but it
%   really should complain.  ONLY the forms listed here should be treated;
%   other non-terminals which look like calls to built-ins could well be
%   commented on (no error reporting here) but should be expanded even
%   so.  Thus X=Y as a nonterminal is to be rewritten as =(X,Y,S0,S),
%   perhaps with a warning.  If you want the translation X=Y, use {X=Y}.

'dcg body'(Var, Body, S0, S) :- var(Var), !,
    Body = phrase(Var,S0,S).
'dcg body'((A0,B0), Body, S0, S) :- !,
    'dcg body'(A0, A, S0, S1),
    'dcg body'(B0, B, S1, S),
    'dcg conj'(A, B, Body).
'dcg body'(->(A0,B0), ->(A,B), S0, S) :- !,
    'dcg body'(A0, A, S0, S1),
    'dcg body'(B0, B, S1, S).
'dcg body'(;(A0,B0), ;(A,B), S0, S) :- !,
    'dcg disj'(A0, A, S0, S),
    'dcg disj'(B0, B, S0, S).
'dcg body'({}(A), A, S, S) :- !.
'dcg body'(!, !, S, S) :- !.
'dcg body'([], true, S, S) :- !.
'dcg body'([H0|T0], Body, S0, S) :- !,
    'dcg term'(H0, H, S0, S1),
    'dcg body'(T0, T, S1, S),
    'dcg conj'(H, T, Body).
'dcg body'(NT0, NT, S0, S) :-
    'dcg goal'(NT0, NT, S0, S).
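
%   For illustration (hand-traced sketches; digit/1 and a/0 are made-up
%   nonterminals):
%   ?- 'dcg rule'((digit(D) --> [D], {integer(D)}), Clause).
%   should give
%   Clause = (digit(D,S0,S) :- 'C'(S0,D,S), integer(D)).
%   and
%   ?- 'dcg rule'((a --> X), Clause).
%   should give
%   Clause = (a(S0,S) :- phrase(X,S0,S)).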


%   dcg term(+T0, -T, +S0, +S)
%   generates code (T) which succeeds when there is a terminal T0
%   between S0 and S.  This version uses the DEC-10 Prolog predicate
%   'C'/3 for compatibility with DEC-10 Prolog, C Prolog, Quintus Prolog.
%   This is the only place that knows how terminals are translated, so
%   you could supply instead the definition
%   'dcg term'(T0, S0=[T0|S], S0, S).
%   and reap the same benefits.  The one thing you must not do is
%   NO! 'dcg term'(T0, true, [T0|S], S). DON'T DO THAT!

'dcg term'(T0, 'C'(S0,T0,S), S0, S).
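
%   To see why (a hand-traced sketch; a/0 is a made-up nonterminal):
%   the rule
%   a --> !, [x].
%   should become
%   a(S0, S) :- !, 'C'(S0, x, S).
%   which commits to this clause before looking at the input.  With the
%   forbidden definition it would instead become
%   a([x|S], S) :- !.
%   which commits only when the input already starts with x.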


%  To see why dcg disj/4 is needed, consider the translation of
%  ( [] | [a] ).  We have to insert S1=S0 somewhere, but we do it at
%  "compile"-time if we can.

'dcg disj'(Body0, Body, S0, S) :-
    'dcg body'(Body0, Body1, S1, S),
    (   S1 == S -> 'dcg conj'(S1=S0, Body1, Body)
    ;   S1 = S0, Body = Body1
    ).
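
%   For illustration (a hand-traced sketch; a/0 is a made-up
%   nonterminal):
%   ?- 'dcg rule'((a --> ([] ; [x])), Clause).
%   should give
%   Clause = (a(S0,S) :- ( S=S0 ; 'C'(S0,x,S) )).
%   Translating the [] branch with 'dcg body'/4 alone would unify S0
%   with S for the whole clause, and that binding would wrongly apply
%   to the [x] branch as well.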


%   dcg conj(+A, +B, -C)
%   combines two conjunctions A, B, giving C.  Basically, we want to
%   ensure that there aren't any excess 'true's kicking around (in a
%   compiled system, that shouldn't matter).  There is room for some
%   leeway here: I have chosen to flatten A completely.

'dcg conj'(A, true, A) :- !.
'dcg conj'(A, B, C) :-
    'dcg CONJ'(A, B, C).

'dcg CONJ'(true, C, C) :- !.
'dcg CONJ'((A,As), C0, (A,C)) :- !,
    'dcg CONJ'(As, C0, C).
'dcg CONJ'(A, C, (A,C)).
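
%   For illustration (hand-traced sketches):
%   ?- 'dcg conj'((a,b), c, X).     should give X = (a,b,c)
%   ?- 'dcg conj'(true, c, X).      should give X = c
%   ?- 'dcg conj'(a, true, X).      should give X = a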


%   'C'(S0, T, S)
%   is true when the terminal T "Connects" the "string positions" S0 and S.

'C'([T|S], T, S).
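
%   For illustration (a hand-traced sketch):
%   ?- 'C'([a,b,c], T, S).
%   should give
%   T = a, S = [b,c]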


%   phrase(+NT0, ?S0)
%   is true when the list S0 is in the language defined by the
%   grammar rule body NT0.  E.g. phrase(([a],[b]), [a,b]).

phrase(NT0, S0) :-
    phrase(NT0, S0, []).


%   phrase(+NT0, ?S0, ?S)
%   is true when the list S0\S is in the language defined by the
%   grammar rule body NT0.  E.g. phrase(([a],[b]), [a,b|X], X).

phrase(NT0, S0, S) :-
    'dcg body'(NT0, NT, T0, T),
    T0 = S0, T = S,
    call(NT).
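
%   For illustration (a hand-traced sketch):
%   ?- phrase(([a],[b]), [a,b,c], Rest).
%   should give
%   Rest = [c]
%   since the body is translated to 'C'(T0,a,T1), 'C'(T1,b,T) and is
%   then called with T0 = [a,b,c] and T = Rest.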
