/*  READ_TERM.PL  */


:- module read_term.


:- public vread/2,
          vread/1,
          tokens_to_term/3,
          tokens_to_term/2,
          default_syntax_error/3,
          define_syntax_error/1.


/*
SPECIFICATION
-------------

This module contains a portable Prolog parser. It implements two things
your own system may lack: a means of reading the variable names of a
term, and a parser that acts on already-read tokens.


PUBLIC vread( Term?, Variables? ):

Reads a term from the current input stream and unifies it with Term.
Variables is bound to a list of [var(Variable,Atom)] pairs, where Atom
is the name of the variable. Anonymous variables do not appear in this
list.

If the term contains a syntax error, vread fails and tries to report a
syntax error. For further details, see under syntax errors below.


PUBLIC vread( Answer? ):

As vread/2, but without the variable names. This is rather pointless,
since your ordinary read will do this faster; I've just added it for
completeness.


PUBLIC tokens_to_term( Tokens+, Term?, Variables? ):

Tokens is a list of tokens, in the form defined by TOKENISE.PL. Term
will be unified with the term they represent, and Variables with a list
of [var(Variable,Atom)] pairs. Fails if there is a syntax error.


PUBLIC tokens_to_term( Tokens+, Term? ):

As tokens_to_term/3, but without the variable names.


Syntax errors:
--------------

When a parsing predicate fails, it calls a private predicate
    syntax_error( Message, Input, BeforeLength )
whose arguments are: a message about the error; the input being parsed;
the number of tokens read before the error was detected.

By default, this predicate calls
    default_syntax_error( Message, Input, BeforeLength )
which reports the error in the form
    ** atom follows expression
     a <<here>>  l b
(for input 'a l b').

You can make it call another predicate P with these arguments by using
define_syntax_error(P). You will have to define P to be of arity three;
usually, you will either write out the error immediately, or save it
for future output, as the Tutor does.

Having called P, syntax_error forces a fail. It does not matter therefore
whether P fails or succeeds.


PUBLIC default_syntax_error( Message+, Input+, BeforeLength+ ):

Message is a list of atoms or numbers, being the syntax error message to
be reported. It will be one of these lists, with appropriate tokens from
the input in the place of the token variables.
    ['Operator expected after expression']
    [Token,'or operator expected']
    ['Expression expected']
    [Token,'cannot start an expression']
    [', or ) expected in arguments']
    [', | or ] expected in list']
    ['Prefix operator',Op,'in context with precedence',Precedence]
    [Token,'follows expression']
(If writing out the lists yourself, leave one space between each element.)

Input is a list of atoms or numbers, being the input being parsed. In
it, element number BeforeLength+1 is '<<here>> ', indicating the
position where the error was detected. Thus the error shown earlier
would result in the list
    [a, ,'<<here>> ', l, b]
(If you write Input with one space between each element, it will show the
original input and the error position.)

Note that all tokens are atoms or numbers, and are not in the format
defined by TOKENISE.PL.

default_syntax_error writes the error in the format shown above, message
first and then input.


PUBLIC define_syntax_error( P+ ):

Causes P( Message, Input, BeforeLength ) to be called when a syntax
error occurs. Note that this is not called until all the tokens have
been read. When the parser detects an error, it works out what it is,
and stores an appropriate message and pointer. It then calls P just
before returning from the parsing predicate.
*/


/*
IMPLEMENTATION
--------------

This module is based on the public-domain DEC-10 library file READ.PL,
written by D.H.D.Warren and Richard O'Keefe, and commented as last
updated on the 5th July 1984. I know Richard has a better version of
that code, so you can import its improvements if you have it. I did ask
him for a copy, but have not received anything.

I have made no changes to the parsing method itself. However, there are
some changes to the interface. The authors originally designed this
parser to work on token lists produced by another DEC-10 utility, the
RDTOK tokeniser. I still use a (very) modified version of that, which
now lives in TOKENISE.PL. My modifications to it have meant the
following changes here:

(1) The token format produced by TOKENISE is not quite the same as
    expected by READ_TERM. It contains extra information needed by other
    parts of the Tutor. "convert_tokens" converts between the two.

(2) The original tokeniser produced a list of variable-name pairs.
    It is now used for other things where that's unnecessary. Also,
    such a list would not now always be meaningful, since the new
    tokeniser can be used to read things which may not be complete
    clauses. For these reasons, READ_TERM now builds the variable-name
    list, also in "convert_tokens".


Portability
-----------

The main problem here is the operator declarations. I assume the
existence of
    current_op( P-, A-, Op+ )
which returns the precedence and fix for an operator. If the same op has
been declared both unary and binary, current_op will be resatisfiable.

If this module can't determine what the operators are, it won't work.
However, that's not too difficult. In general, students using the Tutor
do not define their own operators, and in fact will not even use all the
built-in ones. They will use mainly arithmetic operators, and comma and
implication. So you could get away with making a list of all the
operators you want them to use, and asserting it as a set of clauses for
"current_op".

Another problem is the precedence numbers. Some Prologs work in the range
0-1200, others in the range 0-255. I have assumed the latter, and there are
several constants from this range buried in the code. If you need to
convert to the 0-1200 range, the following table will work:
    1200 - 255
    1100 - 254
    1000 - 252
     999 - 251

Finally, you may have to change the parser itself if the syntax
implemented by your Prolog differs from what is given here. I hope that
won't be necessary: most of the differences will probably be in the
tokeniser.
*/


:- needs
    append / 3,
    length / 2,
    read_tokens / 1.


:- dynamic
    '$syntax_error'/2.


vread( Term, Variables ) :-
    read_tokens( Tokens ),
    convert_tokens( Tokens, Variables, Tokens_ ),
    vread_1( Tokens_, Term ).


vread( Term ) :-
    vread( Term, _ ).


tokens_to_term( Tokens, Term, Variables ) :-
    convert_tokens( Tokens, Variables, Tokens_ ),
    vread_1( Tokens_, Term ).


tokens_to_term( Tokens, Term ) :-
    tokens_to_term( Tokens, Term, _ ).


/*  vread_1( Tokens+, Term? ):
        Here, Tokens has been converted into the form expected by the
        parser. Term will be unified with the corresponding term.
*/
vread_1( end_of_file, end_of_file ) :- !.

vread_1( Tokens, Term ) :-
    vread( Tokens, 255, Term, LeftOver ),
    all_read(LeftOver),
    !,
    Answer = Term.

vread_1( Tokens, Term ) :-
    syntax_error( Tokens ).


/*  all_read( Tokens+ ):
        If there are any unparsed tokens left over in Tokens, gives
        an error.
*/
all_read([]) :- !.
all_read(S) :-
    note_syntax_error(['Operator expected after expression'], S).


/*
The parser.
-----------

This is almost exactly as the original authors wrote it, including the
comments. I have changed "read_" to "vread_", to avoid clashes with
predicates elsewhere. 
*/


/*  expect( Token+, TokensIn+, TokensOut- ):
        Reads the next token, checking that it is the one expected, and
        giving an error message if it is not. It is used to look for
        right brackets of various sorts, as they're all we can be sure
        of.
*/
expect(Token, [Token|Rest], Rest) :- !.
expect(Token, S0, _) :-
    note_syntax_error([Token,'or operator expected']  S0).


/*
The following predicates are interfaces to current_op.
   prefixop(O -> Self, Rarg)
   postfixop(O -> Larg, Self)
   infixop(O -> Larg, Self, Rarg)
*/


/*  prefixop( Op+, OwnPrec-, RightArgPrec- ):
*/
prefixop(Op, Prec, Prec) :-
    current_op(Prec, fy, Op), !.
prefixop(Op, Prec, Less) :-
    current_op(Prec, fx, Op), !,
    Less is Prec-1.


/*  postfixop( Op+, LeftArgPrec-, OwnPrec- ):
*/
postfixop(Op, Prec, Prec) :-
    current_op(Prec, yf, Op), !.
postfixop(Op, Less, Prec) :-
    current_op(Prec, xf, Op), !, Less is Prec-1.


/*  infixop( Op+, LeftArgPrec-, OwnPrec-, RightArgPrec-):
*/
infixop(Op, Less, Prec, Less) :-
    current_op(Prec, xfx, Op), !, Less is Prec-1.
infixop(Op, Less, Prec, Prec) :-
    current_op(Prec, xfy, Op), !, Less is Prec-1.
infixop(Op, Prec, Prec, Less) :-
    current_op(Prec, yfx, Op), !, Less is Prec-1.


/*    ambigop(+, -, -, -, -, -),
*/
ambigop(F, L1, O1, R1, L2, O2) :-
    postfixop(F, L2, O2),
    infixop(F, L1, O1, R1), !.


/*  vread( TokenList+, Precedence+, Term-, LeftOver- ):
        Parses a Token List in a context of given Precedence,
        returning a Term and the unread Left Over tokens.
*/
vread([Token|RestTokens], Precedence, Term, LeftOver) :-
    vread(Token, RestTokens, Precedence, Term, LeftOver).
vread([], _, _, _) :-
    note_syntax_error(['Expression expected'], []).


/*  vread( Token+, RestTokens+, Precedence+, Term-, LeftOver- ):
*/
vread(var(Variable,_), S0, Precedence, Answer, S) :- !,
    exprtl0(S0, Variable, Precedence, Answer, S).

vread(atom(-), [integer(Number)|S1], Precedence, Answer, S) :-
    Negative is -Number, !,
    exprtl0(S1, Negative, Precedence, Answer, S).

vread(atom(-), [real(Number)|S1], Precedence, Answer, S) :-
    Negative is -Number, !,
    exprtl0(S1, Negative, Precedence, Answer, S).

vread(atom(Functor), ['('|S1], Precedence, Answer, S) :- !,
    vread(S1, 251, Arg1, S2),
    vread_args(S2, RestArgs, S3),
    Term =.. [Functor,Arg1|RestArgs], !,
    exprtl0(S3, Term, Precedence, Answer, S).

vread(atom(Functor), S0, Precedence, Answer, S) :-
    prefixop(Functor, Prec, Right), !,
    after_prefix_op(Functor, Prec, Right, S0, Precedence, Answer, S).

vread(atom(Atom), S0, Precedence, Answer, S) :- !,
    exprtl0(S0, Atom, Precedence, Answer, S).

vread(integer(Number), S0, Precedence, Answer, S) :- !,
    exprtl0(S0, Number, Precedence, Answer, S).

vread(real(Number), S0, Precedence, Answer, S) :- !,
    exprtl0(S0, Number, Precedence, Answer, S).

vread('[', [']'|S1], Precedence, Answer, S) :- !,
    exprtl0(S1, [], Precedence, Answer, S).

vread('[', S1, Precedence, Answer, S) :- !,
    vread(S1, 251, Arg1, S2),
    vread_list(S2, RestArgs, S3), !,
    exprtl0(S3, [Arg1|RestArgs], Precedence, Answer, S).

vread('(', S1, Precedence, Answer, S) :- !,
    vread(S1, 1200, Term, S2),
    expect(')', S2, S3), !,
    exprtl0(S3, Term, Precedence, Answer, S).

vread(' (', S1, Precedence, Answer, S) :- !,
    vread(S1, 1200, Term, S2),
    expect(')', S2, S3), !,
    exprtl0(S3, Term, Precedence, Answer, S).

vread('{', ['}'|S1], Precedence, Answer, S) :- !,
    exprtl0(S1, '{}', Precedence, Answer, S).

vread('{', S1, Precedence, Answer, S) :- !,
    vread(S1, 1200, Term, S2),
    expect('}', S2, S3), !,
    exprtl0(S3, '{}'(Term), Precedence, Answer, S).

vread(string(List), S0, Precedence, Answer, S) :- !,
    exprtl0(S0, List, Precedence, Answer, S).

vread(Token, S0, _, _, _) :-
    note_syntax_error([Token,'cannot start an expression']  S0).


/*  vread_args( Tokens+, TermList-, LeftOver- ):
        Parses {',' expr(251)} ')' and returns a list of terms.
*/
vread_args([','|S1], [Term|Rest], S) :- !,
    vread(S1, 251, Term, S2), !,
    vread_args(S2, Rest, S).
vread_args([')'|S], [], S) :- !.
vread_args(S, _, _) :-
    note_syntax_error([', or ) expected in arguments'], S).


/*  vread_list( Tokens+, TermList-, LeftOver- ):
        Parses {',' expr(251)} ['|' expr(251)] ']' and returns a list of terms.
*/
vread_list([','|S1], [Term|Rest], S) :- !,
    vread(S1, 251, Term, S2), !,
    vread_list(S2, Rest, S).
vread_list(['|'|S1], Rest, S) :- !,
    vread(S1, 251, Rest, S2), !,
    expect(']', S2, S).
vread_list([']'|S], [], S) :- !.
vread_list(S, _, _) :-
    note_syntax_error([', | or ] expected in list'], S).


/*  after_prefix_op( Op+, Prec+, ArgPrec+, Rest+, Precedence+, Ans-,
                    LeftOver- ):
*/
afer_prefix_op(Op, Oprec, Aprec, S0, Precedence, _, _) :-
    Precedence < Oprec, !,
    note_syntax_error(['Prefix operator',Op,'in context with precedence',Precedence], S0).

after_prefix_op(Op, Oprec, Aprec, S0, Precedence, Answer, S) :-
    peepop(S0, S1),
    prefix_is_atom(S1, Oprec), % can't cut but would like to
    exprtl(S1, Oprec, Op, Precedence, Answer, S).

after_prefix_op(Op, Oprec, Aprec, S1, Precedence, Answer, S) :-
    vread(S1, Aprec, Arg, S2),
    Term =.. [Op,Arg], !,
    exprtl(S2, Oprec, Term, Precedence, Answer, S).


/*  The next clause fixes a bug concerning "mop dop(1,2)" where
    mop is monadic and dop dyadic with higher Prolog priority.
*/
peepop([atom(F),'('|S1], [atom(F),'('|S1]) :- !.
peepop([atom(F)|S1], [infixop(F,L,P,R)|S1]) :- infixop(F, L, P, R).
peepop([atom(F)|S1], [postfixop(F,L,P)|S1]) :- postfixop(F, L, P).
peepop(S0, S0).


/*  prefix_is_atom( TokenList+, Precedence+ )
        Is true when the right context TokenList of a prefix operator
        of result precedence Precedence forces it to be treated as an
        atom, e.g. (- = X), p(-), [+], and so on.
*/
prefix_is_atomv([Token|_], Precedence) :-                    
    prefix_is_atom(Token, Precedence).

prefix_is_atom(infixop(_,L,_,_), P) :- L >= P.
prefix_is_atom(postfixop(_,L,_), P) :- L >= P.
prefix_is_atom(')', _).
prefix_is_atom(']', _).
prefix_is_atom('}', _).
prefix_is_atom('|', P) :- 254 >= P.
prefix_is_atom(',', P) :- 252 >= P.
prefix_is_atom([],  _).


/*  exprtl0( Tokens+, Term+, Prec+, Answer-, LeftOver- ):
        is called by vread/4 after it has read a primary (the Term).
        It checks for following postfix or infix operators.
*/
exprtl0([atom(F)|S1], Term, Precedence, Answer, S) :-
    ambigop(F, L1, O1, R1, L2, O2), !,
    (   exprtl([infixop(F,L1,O1,R1)|S1], 0, Term, Precedence, Answer, S)
    ;   exprtl([postfixop(F,L2,O2) |S1], 0, Term, Precedence, Answer, S)
    ).
exprtl0([atom(F)|S1], Term, Precedence, Answer, S) :-
    infixop(F, L1, O1, R1), !,
    exprtl([infixop(F,L1,O1,R1)|S1], 0, Term, Precedence, Answer, S).
exprtl0([atom(F)|S1], Term, Precedence, Answer, S) :-
    postfixop(F, L2, O2), !,
    exprtl([postfixop(F,L2,O2) |S1], 0, Term, Precedence, Answer, S).

exprtl0([','|S1], Term, Precedence, Answer, S) :-
    Precedence >= 252, !,
    vread(S1, 252, Next, S2), !,
    exprtl(S2, 252, (Term,Next), Precedence, Answer, S).

exprtl0(['|'|S1], Term, Precedence, Answer, S) :-
    Precedence >= 254, !,
    vread(S1, 254, Next, S2), !,
    exprtl(S2, 254, (Term;Next), Precedence, Answer, S).

exprtl0([Thing|S1], _, _, _, _) :-
    cant_follow_expr(Thing, Culprit), !,
    note_syntax_error([Culprit,'follows expression'], [Thing|S1]).

exprtl0(S, Term, _, Term, S).


/*    cant_follow_expr(+, -),
*/
cant_follow_expr(atom(_),   atom).
cant_follow_expr(var(_,_),  variable).
cant_follow_expr(integer(_),integer).
cant_follow_expr(real(_),   real).
cant_follow_expr(string(_), string).
cant_follow_expr(' (',      bracket).
cant_follow_expr('(',       bracket).
cant_follow_expr('[',       bracket).
cant_follow_expr('{',       bracket).


/*  exprtl(+, +, +, +, -, -),
*/
exprtl([infixop(F,L,O,R)|S1], C, Term, Precedence, Answer, S) :-
    Precedence >= O, C =< L, !,
    vread(S1, R, Other, S2),
    Expr =.. [F,Term,Other], /*!,*/
    exprtl(S2, O, Expr, Precedence, Answer, S).

exprtl([postfixop(F,L,O)|S1], C, Term, Precedence, Answer, S) :-
    Precedence >= O, C =< L, !,
    Expr =.. [F,Term],
    peepop(S1, S2),
    exprtl(S2, O, Expr, Precedence, Answer, S).

exprtl([','|S1], C, Term, Precedence, Answer, S) :-
    Precedence >= 252, C < 252, !,
    vread(S1, 252, Next, S2), /*!,*/
    exprtl(S2, 252, (Term,Next), Precedence, Answer, S).

exprtl(['|'|S1], C, Term, Precedence, Answer, S) :-
    Precedence >= 254, C < 254, !,
    vread(S1, 254, Next, S2), /*!,*/
    exprtl(S2, 254, (Term;Next), Precedence, Answer, S).

exprtl(S, _, Term, _, Term, S).


/*
Syntax errors.
--------------

This business of syntax errors is tricky. When an error is detected, we
have to note how far it was to the end of the input, and for this we are
obliged to use the database, from note_syntax_error.

Before returning, we check to see whether an error was detected, and if
so, call syntax_error(Message,Input,BeforeError) as described in the
specification. By default, this calls default_syntax_error to write out
Input with a marker where the error was noticed.

The really hairy thing is that the original code noted a possible error
and backtracked on, so that what looked at first sight like an error
sometimes turned out to be a wrong decision by the parser. This version
of the parser makes fewer wrong decisions, and my (RO'K) goal was to get
it to do no backtracking at all. This goal has not yet been met, and it
will still occasionally report an error message and then decide that it
is happy with the input after all. Sorry about that.
*/


/*  This is the one that redefines syntax_error.  */
define_syntax_error( P ) :-
    Goal =.. [ P, A, B, C ],
    retractall( syntax_error(_,_,_) ),
    asserta( syntax_error(A,B,C) :- Goal ).


/*  This is called as soon as an error is detected. After is the
    remaining, unparsed, input, and we save its length to indicate
    where the marker should go.
*/
note_syntax_error(Message, After) :-
    length( After, AfterLength ),
    asserta( '$syntax_error'(Message,AfterLength) ),
    !,
    fail.


/*  This is called before returning from the parsing routine, if an
    error had been detected. It picks up the message and length,
    converts the tokens to atoms or numbers so other modules can easily
    write them out, and then calls syntax_error.                    
*/
syntax_error( Input ) :-
    '$syntax_error'( Message, AfterLength ),
    retractall( '$syntax_error'(_,_) ),
    length( Input, InputLength ),
    BeforeLength is InputLength-AfterLength,
    convert_tokens_for_user( Message, 1000000, Message_ ),
    convert_tokens_for_user( Input, BeforeLength, Input_ ),
    syntax_error( Message_, Input_, BeforeLength ),
    !,
    fail.


/*  The default syntax_error.  */
default_syntax_error( Message, Input, BeforeLength ) :-
    nl, write('**'),
    read_term_display_list( Message ), nl,
    read_term_display_list( Input ), nl.


/*  Make sure it is the default.  */
:- define_syntax_error( default_syntax_error ).


/*  convert_tokens_for_user( TL+, Before+, XL- ):
        T       : a list of tokens as represented by the parser. May also
                  contain atoms or numbers, if an error message.                  
        X       : a list of atoms or numbers.
        Before  : number of elements to precede error marker. Make
                  it very large if not wanted.
*/
convert_tokens_for_user( [H|T], 0, ['<<here>> '|T1] ) :-
    !,
    convert_tokens_for_user( [H|T], 1000000, T1 ).

convert_tokens_for_user( [H|T], BeforeError, [H1|T1] ) :-
    !,
    convert_token_for_user( H, H1 ),
    Left is BeforeError-1,
    convert_tokens_for_user( T, Left, T1 ).

convert_tokens_for_user( [], _, [] ).


/*  convert_token_for_user( T+, X- ):
        T: the parser's token representation (or an atom).
        X: an atom or number.
*/
convert_token_for_user( atom(X), X ) :- !.

convert_token_for_user( var(V,X), X ) :- !.

convert_token_for_user( integer(X), X ) :- !.

convert_token_for_user( real(X), X ) :- !.

convert_token_for_user( string(X), X ) :- !.

convert_token_for_user( X, X ).


/*  Simple list-displayer, used by default_syntax_error.  */
read_term_display_list( [] ) :- !.

read_term_display_list( [H|T] ) :-
    write(' '),
    write(H),
    read_term_display_list(T).


/*
Converting token representations.
---------------------------------
*/


/*
convert_tokens( TokensIn+, Dict, TokensOut- ):
    TokensIn are as produced by the tokeniser, either a list of
    tokens, or the atom end_of_file.
    TokensOut will be set to the corresponding token list in the
    parser's representation; Dict will be built up with a list of
    var(Variable,ItsName) pairs.
*/
convert_tokens( end_of_file, _, end_of_file ) :- !.

convert_tokens( [], Dict, [] ) :-
    append( Dict, [], Dict ),
    !.
    /*  Up to this point, Dict always has an uninstantiated tail. We
        fill it in here.
        The cut is after append to stop it backtracking.
    */

convert_tokens( [quoted_atom(A)|Rest], Dict, [atom(A)|Rest_] ) :-
    !,
    convert_tokens( Rest, Dict, Rest_ ).

convert_tokens( [var(N)|Rest], Dict, [var(V,N)|Rest_] ) :-
    !,
    vread_lookup( Dict, var(V,N) ),
    convert_tokens( Rest, Dict, Rest_ ).

convert_tokens( [T|Rest], Dict, [T|Rest_] ) :-
    !,
    convert_tokens( Rest, Dict, Rest_ ).


/*
The variable-name dictionary.
-----------------------------
*/


/*  vread_lookup( Dict, Entry+ ):
        Dict is a list of var(_,_) structures, with tail uninstantiated.
        If Entry is already in Dict, vread_lookup leaves it alone,
        otherwise instantiates its tail to [ X | NewTail ].
*/
vread_lookup( [X|_], X ) :- !.
vread_lookup( [_|T], X ) :-
    vread_lookup( T, X ).


:- endmodule.
