/*  PARSE_LOGIC.PL  */


:- module parse_logic.


:- public logic_fact_in/4,
          logic_question_in/4.


/*
SPECIFICATION
-------------

This module exports logic_fact_in/4, a DCG predicate for parsing facts
expressed in 'Logic', such as
    john loves mary if mary loves john
It also exports logic_question_in/4, a similar DCG predicate for parsing
questions.


PUBLIC DCG logic_fact_in( Tree-, Error- ):
PUBLIC DCG logic_question_in( Tree-, Error- ):
-------------------------------------------

Tree and Error are the result of parsing a list of tokens representing
a fact/question in Logic.

If the tokens represent a syntactically valid fact/question,
Tree will be unified with a 'parse tree'. This will contain the term
itself and a dictionary of variable names, accessible through the selector
predicates defined in EDITOR.PL. In this case, Error will be 'ok'.

Otherwise, Tree will be built from the original tokens alone (by
fact_vs_text or question_vs_text), and Error will be a representation of
the error.
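
For example, a caller might parse the fact  X loves mary if mary loves X
as shown below. This is only a sketch. It assumes that phrase/2 is
available and that the tokeniser produces tokens of the forms matched in
this module (atom(A), var(Name) and so on). Whether a variable name is
written "X" or 'X' inside var(...) depends on the tokeniser, and the
choice here is an assumption.

    % Assumed token forms, as matched by logic_argument_in below.
    ?- phrase( logic_fact_in( Tree, Error ),
               [ var("X"), atom(loves), atom(mary), atom(if),
                 atom(mary), atom(loves), var("X") ] ).

If check_clause finds nothing wrong, Error should be ok. Tree should then
be built by fact_vs_clause_and_vars from the clause
loves(_1,mary) :- loves(mary,_1)  and the dictionary  [ var(_1,"X") ].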
*/


/*
IMPLEMENTATION
--------------

The grammar, and error diagnosis
--------------------------------

There are three clauses for logic_fact_in. The first one pre-checks the
tokens for things that are errors but that would not be noticed by the
parser (e.g. using => instead of >=, or <= instead of =<, or typing
Prolog symbols such as :- and parentheses).
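
For instance, the pre-check behaves as follows. Run these queries inside
the module, since pre_diagnose_either/3 is not public. They assume the
token forms used in this file and that arb matches any (possibly empty)
sequence of tokens.

    % X => 18 : the misspelt comparator is caught before parsing.
    ?- phrase( pre_diagnose_either( D ),
               [ var("X"), atom(=>), integer(18) ] ).
    % D = bad_comparator

    % X >= 18 : there is nothing for the pre-check to find, so it
    % fails, and the next clause takes over.
    ?- phrase( pre_diagnose_either( D ),
               [ var("X"), atom(>=), integer(18) ] ).
    % fails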

If the pre-check fails (i.e. finds nothing wrong), the second clause
takes over and calls the DCG rule 'logic_rule_in'. If this succeeds, the
fact has the right overall shape, and Clause will contain a Prolog
translation of it. The exception is when logic_rule_in has itself found
an error (such as a variable used as a predicate name), in which case it
returns that error in Error0. Otherwise, 'check_clause' is called to
look for errors that are more easily detected in a clause than in a list
of tokens. One error that's easily detectable this way is that the fact
redefines a built-in predicate.

If the second clause fails, the token list is not a correct fact, and
the third clause tries to determine what the error was (we know that
novice users make certain kinds of error again and again).
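
As an illustration, the first diagnosis rule recognises the common
mistake of writing 'and if'. Call it directly inside the module (it is
not public). The query assumes that arb matches any (possibly empty)
sequence of tokens.

    % X grandparent_of Z if X parent_of Y and if Y parent_of Z
    ?- phrase( post_diagnose_logic_fact( D ),
               [ var("X"), atom(grandparent_of), var("Z"), atom(if),
                 var("X"), atom(parent_of), var("Y"), atom(and),
                 atom(if), var("Y"), atom(parent_of), var("Z") ] ).
    % D = and_if_in_definition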


The DCG rule 'logic_rule_in' defines a correct logic fact, and is
straightforwardly defined in terms of its parts. logic_question_in works
in the same way as logic_fact_in, except that a question is parsed by
'logic_tail_in' and checked by 'check_goal'.


In the code, I have used the DCG construction ( A,B ; C ) instead
of the clearer ( A -> B ; C ), because I am not sure that all DCG
translators can handle the latter. A cut after each such disjunction
commits to the first branch that succeeds. In the error branches, it
appears to be necessary to insert a "rem" so that the translator treats
all the input as having been swallowed.
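
For comparison, logic_tail_in could be written with if-then-else roughly
as shown below. This is a sketch only, and it assumes a translator that
accepts -> in DCG bodies. It is not exactly equivalent to the version in
this file. In this sketch, once [atom(and)] has matched, a failure of the
recursive call makes the whole rule fail. The version in this file
instead falls back to its third branch and treats G1 as the whole tail.

    % Sketch: logic_tail_in using ( A -> B ; C ) instead of ( A,B ; C ).
    logic_tail_in( Goal, Vars0, Vars, E ) -->
        logic_goal_in( G1, Vars0, Vars1, E0 ),
        (   { E0 \= ok }
        ->  { E = E0 }, rem              % error: swallow the rest
        ;   [atom(and)]
        ->  logic_tail_in( G2, Vars1, Vars, E ),
            { Goal = (G1,G2) }
        ;   { Goal = G1, Vars = Vars1, E = E0 }
        ).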


PARSE_UTILS defines a number of DCG predicates used in this file.


Variables
---------

Most of the rules have two extra arguments, typically called Vars0 and
Vars. These are used in building a dictionary of variable names. The
idea is that the fact
    Y loves Z if Z listens_to X and X is_by dire_straits
will be translated into a Prolog clause:
    loves( _1, _2 ) :- listens_to( _2, _3 ), is_by( _3, dire_straits )
where _1, _2, _3 indicate the variables introduced to stand for Y,Z,X.

When answering questions, we want to be able to give solutions in terms
of the user's original variable names:
    Yes -
    Y = john, Z = mary, X = money_for_nothing
So on parsing the fact, we build a dictionary of variable/name pairs
(newest first, since lookup_var adds each new name at the front):
    [ var(_3,"X"), var(_2,"Z"), var(_1,"Y") ]
and carry it around with the clause.
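
The helper below shows how such a dictionary can be used when reporting
answers. It prints each name with its binding. The helper is
hypothetical and not part of the Tutor. It assumes entries of the form
var(Var,Name), as built by lookup_var, and names that format/2's ~s
directive accepts (a code list or string). It also assumes format/2
exists, which may not hold in every Edinburgh-family Prolog.

    % Hypothetical helper: prints  Name = Binding  for each entry,
    % oldest first (lookup_var adds new entries at the front).
    show_bindings( [] ).
    show_bindings( [ var(Var,Name) | Rest ] ) :-
        show_bindings( Rest ),
        format( "~s = ~w~n", [ Name, Var ] ).

    ?- show_bindings( [ var(money_for_nothing,"X"),
                        var(mary,"Z"), var(john,"Y") ] ).
    % prints
    %   Y = john
    %   Z = mary
    %   X = money_for_nothing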

Variables are gathered by 'logic_argument_in':
    logic_argument_in( Var, Vars0, Vars ) -->
        [ var(Name) ], !,
        { lookup_var( Name, Var, Vars0, Vars ) }.
which, on finding a var(Name) token, looks Name up in the dictionary
Vars0 built so far. It reuses the existing variable if Name has been
seen before, and otherwise adds a new entry.
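
The trace below follows the dictionary as three variable tokens are met
in turn. Run it inside the module, since lookup_var/4 is not public. The
names are written as double-quoted text, as above, which is an
assumption about the tokeniser.

    ?- lookup_var( "Y", V1, [], D1 ),
       lookup_var( "Z", V2, D1, D2 ),
       lookup_var( "Y", V3, D2, D3 ).
    % D1 = [var(V1,"Y")]
    % D2 = [var(V2,"Z"), var(V1,"Y")]
    % V3 is the same variable as V1, and D3 = D2,
    % because "Y" was already in the dictionary.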


Non-English languages.
----------------------

There's no reason why the Tutor should not be adapted to other languages.
In most cases, that would just entail changing the words used for 'if'
and 'and'. However, the Germanic languages pose a problem due to verb
movement. For example, the English sentence
    I will tell him if I see him
translates into Dutch as
    Ik zal hem zeggen, als ik hem zie
where the verb in the if-part (zie) moves to the end of the clause
because of the 'if'. (The verb for 'tell' also moves to the end because
it's qualified by an auxiliary, but that's unimportant here.) German
behaves similarly, and I think the Scandinavian languages also move the
verb, though not quite in the same way.

Similarly, the sentence
    X is a father if X is a parent of Y
would become
    X is een vader als X een ouder van Y is

This would be easy to handle if all the predicate names were just verbs,
since we could easily alter the grammar of Logic to say that, in the tail
of a clause, a goal is <argument1> <argument2> <verb>. Though this is
one possible solution, it rules out equivalents of many of the predicate
names we'd use in English, such as is_a and is_on.

Another possibility is for the parser to recognise verbs at the end of
the 'if' clause, and combine them with the rest of the predicate name.
Thus
    <tail-goal> ::= <argument1> <argument2> <verb>
                |   <argument1> <part> <argument2> <verb>
This could be done if you assume that the final word of a four-word
tail-goal must be a verb. I don't know how many mistakes students would
make with this, but it seems OK for most of the common verbs I've
thought of. Incidentally, it is needed for our most common example,
since
    X loves Y
becomes
    X houdt van Y
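
The two-alternative rule above might be written as a DCG along the
following lines. In an if-part, X houdt van Y appears as X van Y houdt.
This is a hypothetical sketch, not part of the Tutor:

  - dutch_tail_goal_in is an invented name.
  - It reuses logic_argument_in from this module.
  - It assumes that one complete goal is parsed at a time.
  - Folding the verb and particle into a name of the form verb_particle
    is an arbitrary choice.

    % Hypothetical: parse one verb-final tail goal.
    %   ik hem zie     ->  zie(ik,hem)
    %   X van Y houdt  ->  houdt_van(X,Y)
    dutch_tail_goal_in( G, Vars0, Vars ) -->
        logic_argument_in( A1, Vars0, Vars1 ),
        (   % <argument1> <argument2> <verb>
            logic_argument_in( A2, Vars1, Vars ),
            [ atom(P) ]
        ;   % <argument1> <part> <argument2> <verb>
            [ atom(Part) ],
            logic_argument_in( A2, Vars1, Vars ),
            [ atom(Verb) ],
            { atom_concat( Verb, '_', Prefix ),
              atom_concat( Prefix, Part, P ) }
        ),
        { G =.. [ P, A1, A2 ] }.

For X van Y houdt, the first alternative takes 'van' as the second
argument but then finds a variable where it expects the verb. It
therefore fails, and the second alternative is tried.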

Is it possible to avoid doing either of these? Although I know some
Dutch, it's impossible for me to tell how odd a native speaker would
find a sentence where the verb is misplaced. Rob Kemmeren, a native
speaker of Dutch, told me that a sentence such as
    X is een vader als X is een ouder van Y
does indeed look and sound very odd, sufficiently so as to hinder
comprehension. However, if you think of it as a mathematical equation,
it seems more normal. Thus a sentence such as
    X toegelaten_in america als             (... allowed into ...)
        X geboren_in Y en                   (... born in ... )
        Y =< 1900
immediately looks like an equation because of the =< and the underlines,
so the reader is not put off by the odd placement of 'geboren'.

This then is the third possible solution. Emphasise from the start, more
strongly than I do, that the Tutor doesn't know your natural language
and that what it understands is just a special kind of equation. Enhance
this impression by liberal use of underlines, and perhaps by introducing
the comparison operators earlier than I do.
*/


:- needs
    arb / 2,
    check_clause / 2,
    check_goal / 2,
    fact_vs_clause_and_vars / 3,
    fact_vs_text / 2,
    lookahead_rem / 3,
    member / 2,
    question_vs_goal_and_vars / 3,
    question_vs_text / 2,
    rem / 2,
    rem / 3.


/*  Top level.  */

logic_fact_in( Fact, Error ) -->
    lookahead_rem( Tokens ),
    pre_diagnose_logic_fact( Error ),
    !,
    { fact_vs_text( Fact, Tokens ) }.

logic_fact_in( Fact, Error ) -->
    lookahead_rem( Tokens ),
    logic_rule_in( Clause, [], Vars, Error0 ),
    lookahead_rem( [] ),        /* every token must have been used */
    !,
    {
        (
            Error0 \= ok
        ->
            Error = Error0
        ;
            ( check_clause( Clause, Error ) -> true ; Error = ok )
        ),
        (
            Error \= ok
        ->
            fact_vs_text( Fact, Tokens )
        ;
            fact_vs_clause_and_vars( Fact, Clause, Vars )
        )
    }.

logic_fact_in( Fact, Error ) -->
    lookahead_rem( Tokens ),
    post_diagnose_logic_fact( Error ),
    { fact_vs_text( Fact, Tokens ) }.


logic_question_in( Question, Error ) -->
    lookahead_rem( Tokens ),
    pre_diagnose_logic_question( Error ),
    !,
    { question_vs_text( Question, Tokens ) }.

logic_question_in( Question, Error ) -->
    lookahead_rem( Tokens ),
    logic_tail_in( Goal, [], Vars, Error0 ),
    lookahead_rem( [] ),        /* every token must have been used */
    !,
    {
        (
            Error0 \= ok
        ->
            Error = Error0
        ;
            ( check_goal( Goal, Error ) -> true ; Error = ok )
        ),
        (
            Error \= ok
        ->
            question_vs_text( Question, Tokens )
        ;
            question_vs_goal_and_vars( Question, Goal, Vars ) 
        )
    }.

logic_question_in( Question, Error ) -->
    lookahead_rem( Tokens ),
    post_diagnose_logic_question( Error ),
    { question_vs_text( Question, Tokens ) }.


/*  Parsing correct sentences.  */

logic_rule_in( Rule, Vars0, Vars, E ) -->
    logic_head_in( H, Vars0, Vars1, E0 ),
    (
        { E0 \= ok },
    /* THEN */
        { E = E0 },
        rem
    ;
        [atom(if)],
    /* THEN */
        logic_tail_in( T, Vars1, Vars, E ),
        { Rule = (H:-T) }
    ;
        { Vars = Vars1, Rule = H, E = E0 }
    ), !.


logic_head_in( G, Vars0, Vars, E ) -->
    logic_goal_in( G, Vars0, Vars, E ).


logic_tail_in( Goal, Vars0, Vars, E ) -->
    logic_goal_in( G1, Vars0, Vars1, E0 ),
    (
        { E0 \= ok },
    /* THEN */
        {E = E0},
        rem                   
    ;
        [atom(and)],
    /* THEN */
        logic_tail_in( G2, Vars1, Vars, E1 ),
        { Goal = (G1,G2), E = E1 }
    ;
        { Goal = G1, Vars = Vars1, E=E0 }
    ), !.


logic_goal_in( G, Vars0, Vars, E ) -->
    logic_argument_in( V1, Vars0, Vars1 ),
    logic_predicate_in( P, E ),
    logic_argument_in( V2, Vars1, Vars ),
    { ( E = ok -> G =.. [ P, V1, V2 ] ; true ) }.


logic_argument_in( I, Vars, Vars ) -->
    [ integer(I) ], !.

logic_argument_in( I, Vars, Vars ) -->
    [ real(I) ], !.

logic_argument_in( W, Vars, Vars ) -->
    [atom(W)], {not(logic_connective( W ))}, !.

logic_argument_in( W, Vars, Vars ) -->
    [quoted_atom(W)], !.

logic_argument_in( Var, Vars0, Vars ) -->
    [ var(Name) ], !,
    { lookup_var( Name, Var, Vars0, Vars ) }.


logic_predicate_in( W, connective_for_predicate/W ) -->
    logic_connective_in( W ), !.

logic_predicate_in( W, variable_for_predicate/W ) -->
    [ var(W) ], !.

logic_predicate_in( W, non_atom_for_predicate/W ) -->
    [ W ], { W \= atom(_) }.

logic_predicate_in( W, ok ) -->
    [atom(W)], {not(logic_connective( W ))}.


logic_non_connective_in( W ) -->
    logic_connective_in( W ),
    { !, fail }.

logic_non_connective_in( W ) -->
    [W].


logic_connective_in( W ) -->
    [atom(W)], { logic_connective(W) }.


logic_connective( and ).
logic_connective( if ).


/*  Diagnosing errors.  */

pre_diagnose_logic_fact( D ) -->
    pre_diagnose_either( D ).


post_diagnose_logic_fact( and_if_in_definition ) -->
    arb, [atom(if)], arb, [atom(and)], [atom(if)], !, rem.
/*  This mistake is quite common. */

post_diagnose_logic_fact( if_in_definition ) -->
    arb, [atom(if)], arb, [atom(and)], arb, [atom(if)], !, rem.
/*  I'm not sure whether I've ever seen students type an 'and',
    some stuff, and _then_ another 'if', though I have seen them
    type an 'and if', as in the rule above. But it seemed worth
    putting this rule in.

    Note that it gives the same error message as the rule below. But
    if we discover that they arise under different circumstances,
    it would be sensible to give different messages for each.
*/

post_diagnose_logic_fact( if_in_definition ) -->
    arb, [atom(if)], arb, [atom(if)], !, rem.

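post_diagnose_logic_fact( fact_too_short ) -->
    ( [_] ; [_,_] ), lookahead_rem( [] ), !.
/*  The lookahead_rem( [] ) check ensures that only one- or two-token
    input counts as too short, rather than any non-empty input.
*/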

post_diagnose_logic_fact( cond_fact_too_long ) -->
    logic_non_connective_in(_), logic_non_connective_in(_),
    logic_non_connective_in(_), logic_non_connective_in(_),
    arb, [atom(if)], !, rem.

post_diagnose_logic_fact( uncond_fact_too_long([Tok1,Tok2,Tok3,Tok4|Rest]) ) -->
    logic_non_connective_in(Tok1), logic_non_connective_in(Tok2),
    logic_non_connective_in(Tok3), logic_non_connective_in(Tok4),
    !, rem(Rest).

post_diagnose_logic_fact( fact_and_fact ) -->
    logic_rule_in(_,[],_,_), [atom(and)], !, logic_rule_in(_,[],_,_), rem.

post_diagnose_logic_fact( unrecognised_error_in_fact ) -->
    rem.


pre_diagnose_logic_question( D ) -->
    pre_diagnose_either( D ).


pre_diagnose_either( bad_comparator ) -->
    arb, ( [atom(=>)] ; [atom(<=)] ), !, rem.

pre_diagnose_either( prolog_in_logic ) -->
    arb, ( ['('] ; [')'] ; [atom(':-')] ; [','] ; [atom(';')] ), !, rem.


post_diagnose_logic_question( if_in_question ) -->
    arb, [atom(if)], !, rem.

post_diagnose_logic_question( unrecognised_error_in_question ) -->
    rem.                    


/*  Utilities.  */

lookup_var( Name, Var, Vars, Vars ) :-
    member( var(Var,Name), Vars ),
    !.
lookup_var( Name, Var, Vars, [ var(Var,Name) | Vars ] ).


:- endmodule.
