Newsgroups: comp.lang.prolog
From: fjh@munta.cs.mu.OZ.AU (Fergus Henderson)
Subject: Re: Prolog syntax
Message-ID: <9511918.22753@mulga.cs.mu.OZ.AU>
Sender: news@cs.mu.OZ.AU (CS-Usenet)
Organization: Computer Science, University of Melbourne, Australia
References: <9511221.23905@mulga.cs.mu.OZ.AU> <3nkjnc$q0o@goanna.cs.rmit.edu.au>
Date: Sat, 29 Apr 1995 08:44:40 GMT

ok@goanna.cs.rmit.edu.au (Richard A. O'Keefe) writes:

>fjh@munta.cs.mu.OZ.AU (Fergus Henderson) writes:
>
>>Should the following be syntax errors?
>
>Assume the DEC-10 Prolog operator declarations.
>
>>	?- display(* + *).		% 1
>
>No.  The two asterisks are quite unambiguously *atoms*, whatever the
>operator declarations.  It is highly unreasonable to reject unambiguous
>input which makes sense.  I note that C Prolog (using a *fast* deterministic
>left corner parser) had absolutely no difficulty with this.

I was of course not suggesting that such syntax was difficult for computers
to parse.  However, I do think that syntax like that is difficult for
humans to parse.  I think it would be much clearer if it were written either as

	?- display('*' + '*').

as I have suggested, or

	?- display((*) + (*)).

as required by ISO Prolog.
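
To make the comparison concrete, here is a minimal sketch showing that
the two clearer spellings denote the same term.  It assumes the standard
operator table and a system that accepts both the quoted and the
bracketed form; the predicate name same_term/0 is made up for
illustration.

```prolog
% Sketch only: same_term/0 is a hypothetical name, not part of any library.
% Assumes the usual operator table (+ is 500 yfx, * is 400 yfx).
same_term :-
    X = '*' + '*',              % quoted operands
    Y = (*) + (*),              % bracketed operands, as ISO requires
    X == Y,                     % both read as the term +(*, *)
    X =.. [Functor, Left, Right],
    Functor == (+),
    Left == (*),
    Right == (*).
```

The query `?- same_term.` should succeed on such a system; only the
surface syntax differs, not the term that is read.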

Ideally, syntax should be easy to parse for both computers and humans.

(P.S. I don't see how the efficiency of C Prolog's parser has any relevance.)

>>	?- display(1 '+' 2).		% 2
>
>No.  (I don't care what the standard says.) 

As it happens the standard agrees with you here.

>Here we have the pattern
>	(  <literal>  <word>   <literal>  )
>	(  OPERAND    OPERATOR OPERAND    )
>is the only reading that makes sense.  (There is absolutely no need for
>any distinction between + and '+')

Distinguishing between + and '+' is not _necessary_, but I think it might
produce a syntax that was easier for humans to understand.  Humans do
notice such distinctions.

>This ought to be the same as 3.  There is neither need nor excuse for
>discriminating between quoted and unquoted operators.  (In particular,
>if operators cannot be quoted, then there are some atoms which cannot
>be operators,

It is true that requiring operators to be unquoted would mean that
there are some atoms which cannot be operators.  I don't think this is
a big loss, however, since I have never seen a Prolog program which
used quoted atoms as operators, and I have trouble imagining a
situation in which doing so would improve readability.  Now it may
be that there are some such situations that I don't know about.
If there are, please let me know.
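
For concreteness, this is the sort of declaration that such a rule
would exclude.  The operator name 'is near' and the predicate
quoted_op_demo/1 are hypothetical, chosen only because the atom cannot
be written without quotes.

```prolog
% Hypothetical example: an operator whose name must be quoted.
:- op(700, xfx, 'is near').

% Builds the term 'is near'(home, work) using infix notation.
quoted_op_demo(Term) :-
    Term = (home 'is near' work).
```

Under a rule requiring operators to be unquoted, the same term would
have to be written in canonical form, as 'is near'(home, work).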

>and precisely what that set is depends on the current
>read table, which it shouldn't.)

I don't understand what you are talking about here.
What do you mean by "the current read table"?

>>According to my reading of the draft ISO Prolog standard,
>>1, 3, and 4 are supposed to be syntax errors.
>
>The standard has many unnecessary changes.  This particular one is because
>they don't know how to write efficient Prolog parsers.  That's odd, because
>it isn't very hard.

Are you sure that efficiency was the motive?

>The real point at
>issue is not whether * + * is allowed, but whether terms like
>	insert/delete
>are allowed.  Why would that be in doubt?  Well, NU Prolog adds a heck
>of a lot of ordinary looking words as operators.  The "don't gratuitously
>reject it if it is unambiguous" rule means that in many cases terms which
>are valid *stay* valid even when someone else adds operators you weren't
>expecting.

The fact that adding operators can cause syntactically valid terms
to become invalid is indeed a problem.  This problem can occur
with the "de facto standard" Prolog syntax, with ISO Prolog standard
syntax, and also with my suggested syntax.

With 
		X = 'op'

However, I think the right way to solve this problem is with appropriate
use of a module system that handled operators.
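
As a rough sketch of what I mean, here is how a module might export an
operator alongside a predicate.  This assumes a module system in the
style of SWI-Prolog's module/2, where op/3 terms may appear in the
export list; the module name shapes and the operator inside are
made up for illustration.

```prolog
% Hypothetical module; assumes SWI-Prolog-style module/2 export lists.
:- module(shapes, [inside/2, op(700, xfx, inside)]).

% Facts are written in canonical form here, so that this file does not
% itself depend on the operator being active while it is read.
inside(circle, square).
inside(square, page).
```

Code that imports shapes can then write `X inside Y`, while code that
does not import it never sees the operator, so an atom named inside in
unrelated code keeps its ordinary meaning.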

>> since for common errors such as
>>	( X = 0 ->
>>		foo,
>>		bar,	% oops, extra comma
>>	;
>>		baz
>>	).
>
>>the first incorrect token is not the extraneous comma or even the semicolon,
>>it is at `baz', two lines past the actual cause of the error.
>
>You call it a "common" error.

I once spent some time recording the exact cause of every error I made.
I recorded 155 errors, of which 16 were syntax errors:

	- additional `,' (6 times)
	- additional `.' (1 time)
	- missing `,' (4 times)
	- `,' instead of `.' (1 time)
	- unbalanced parenthesis (1 time)
	- miscellaneous other syntax error (3 times)

So 6/16 (nearly 40%) of the syntax errors were "additional comma"
errors.  At a rough guess, I'd expect about half of those (i.e. nearly
20% of all syntax errors) to be caused by an additional comma before an
operator like `->' or `;'.  (The other half would be before a closing
bracket, closing brace, etc.).

Obviously this is a small sample, and no doubt the sorts of errors
made vary significantly from programmer to programmer.  But I don't
think the estimate is that far off.
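
The arithmetic behind those percentages can be checked with a few lines
of Prolog.  The counts are the ones listed above; the predicate names
are hypothetical, and the "half" factor is just my rough guess, not a
measurement.

```prolog
% Counts from the list above, in the same order; the first entry is the
% "additional comma" count.
syntax_error_counts([6, 1, 4, 1, 1, 3]).

% sum(+Numbers, -Total): plain recursive sum, to avoid library dependence.
sum([], 0).
sum([N|Ns], Total) :-
    sum(Ns, Rest),
    Total is N + Rest.

% Percentage of syntax errors that were "additional comma" errors.
extra_comma_share(Percent) :-
    syntax_error_counts([ExtraComma|Others]),
    sum([ExtraComma|Others], Total),
    Percent is ExtraComma * 100 / Total.

% Guessed share caused by a comma before an operator: half of the above.
before_operator_share(Percent) :-
    extra_comma_share(All),
    Percent is All / 2.
```

The first figure comes to 37.5 and the second to 18.75, which is where
"nearly 40%" and "nearly 20%" come from.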

>There is *NO* programming language where you are
>guaranteed that every error will be detected at exactly what you think of
>as the right token.

Nevertheless we would like to do as good a job as possible, subject
to the constraints imposed by conflicting goals.

>>Originally Mercury syntax was going to be based on ISO Prolog syntax,
>>since I had a clear specification of ISO Prolog syntax and I didn't
>>seen any need to reinvent the wheel.  But it seems that
>>ISO Prolog syntax and "defacto standard Prolog syntax" disagree anyway,
>>and neither of them is much good.
>
>I see; you don't like "de facto" syntax because it is consistent and
>comparatively safe in the presence of operator changes.  Instead of
>calling it "not much good" because it has different design goals from
>yours, why not spell out exactly what your design goals are?

I didn't mean to say that Prolog syntax wasn't much good, although
I see how my words could be interpreted as such - I meant to say
that it didn't _seem_ to be much good (that's the trouble with
ambiguous natural-language grammars like English ;-).
The fact that it didn't seem much good was a little surprising to me,
since I think the people who designed it were no fools.  The reason I
posted to comp.lang.prolog was precisely to find out whether there was
any reasonable justification for the particular design decisions
taken, and if so what that justification was.

Two justifications that I consider reasonable have been mentioned:

	(1) Requiring operators to be unquoted names would prevent
	    users from making certain atoms into operators.

	(2) Allowing unquoted/unbracketed operators as operands
	    means that code which uses an atom as an operand
	    doesn't break if that atom is later made an operator
	    (or rather, doesn't _always_ break; there are still
	    some circumstances in which it would break).

However, (1) does not seem like such a great loss,
and (2) would be much less of a problem in a language with
a (standardized) module system.

>Many years ago, Bill Clocksin said that if he were designing Prolog
>from scratch, he wouldn't have new operator declarations at all.  I
>agreed with him.  I strongly suspect that the only way you will get
>a syntax that meets your (as yet unclarified) goals is by banning
>new operators.  That would be a reasonable thing to do.

This is an issue on which reasonable people can (and often do!) disagree.
Personally I think user-defined operators are a good thing.

As for my goals... I wanted Mercury syntax to be close enough to
Prolog syntax to look familiar to Prolog programmers, and to have
enough in common with Prolog syntax for us to write the Mercury
compiler in the intersection of Mercury and Prolog so that we could
bootstrap using Prolog.  Apart from that, well, essentially just the
usual software engineering sorts of things: we want a language that
will maximize our productivity.

-- 
Fergus Henderson
fjh@cs.mu.oz.au
http://www.cs.mu.oz.au/~fjh
