Newsgroups: comp.lang.prolog
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!bloom-beacon.mit.edu!gatech!swrinde!pipex!uunet!allegra!alice!pereira
From: pereira@alta.research.att.com (Fernando Pereira)
Subject: Re: In defense of Prolog's dynamic typing
In-Reply-To: vanroy@dfki.uni-sb.de's message of 24 Nov 1994 10:31:27 GMT
Message-ID: <PEREIRA.94Nov24141742@alta.research.att.com>
Sender: usenet@research.att.com (netnews <9149-80593> 0112740)
Nntp-Posting-Host: alta.research.att.com
Reply-To: pereira@research.att.com
Organization: AT&T Bell Laboratories
References: <3aviu6$7dq@hitchcock.dfki.uni-sb.de>
	<PEREIRA.94Nov23222128@alta.research.att.com>
	<3b1q1v$kqu@hitchcock.dfki.uni-sb.de>
Date: Thu, 24 Nov 1994 19:17:42 GMT
Lines: 156

In article <3b1q1v$kqu@hitchcock.dfki.uni-sb.de> vanroy@dfki.uni-sb.de (Peter Van Roy) writes:
   In article <PEREIRA.94Nov23222128@alta.research.att.com>, pereira@alta.research.att.com (Fernando Pereira) writes:
> |> But in any decent strongly-typed language (eg. SML, Modula-3) you
> |> can define datatypes for the abstract syntax of the language and write
> |> syntax-driven program manipulation programs. Pattern-matching helps in
> |> writing such programs, thus SML is an especially good candidate for
> |> this. The other thing you need is an easily-callable parser for the
> |> language returning abstract syntax, the analogue of Prolog's read/1,
> |> and a way of compiling from abstract syntax analogous to Prolog's
> |> assert/1. That is, you need an open compiler, like the one in the
> |> latest version of SML/NJ. This is the short answer.
> In other words, you've back-patched the language with a second syntax that
> corresponds to the original program syntax.
No I haven't. I'm representing the *abstract* syntax of language L as
L expressions. In languages like Lisp and Prolog this seems
unnecessary, because their concrete syntax is very close to their
representation as expressions. But not close enough. I mentioned the
problem with variables. Another problem for Prolog has been the proper
representation of clause bodies, eg. is a clause body a list of
literals or a tree of formulas.
> One has to open another manual and
> start reading.  How many syntaxes do we need in this world?
Just one. The representation of L abstract syntax in L uses L
alone. The only additional piece of information you need is the
syntactic constructor signature. But you *really* need that
information for Prolog already, because operators like "," disguise
the abstract syntax to some extent.

The real reason why "programs as data" works to some extent in Prolog
and Lisp is that their concrete syntax is so simple
(impoverished?). And even then, writing a syntactic processor for
Prolog that takes into account all the syntactic operators (",", ";",
"->", variables as goals, etc) correctly is sufficiently tricky that
an explicitly-typed abstract syntax might well help.
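To make that concrete, here is how such an explicitly-typed abstract
syntax might look, sketched in OCaml (a close cousin of SML); the
constructor names are invented for illustration, not taken from any
existing system:

```ocaml
(* An explicitly-typed abstract syntax for Prolog clause bodies.
   ",", ";", "->", variables used as goals, and "!" each get their
   own constructor, so a traversal that forgets one of them draws
   a non-exhaustive-match warning from the compiler. *)

type term =
  | Var of string                   (* a named variable *)
  | Struct of string * term list    (* f(t1,...,tn); atoms have [] *)

type goal =
  | Literal of string * term list   (* ordinary goal p(t1,...,tn) *)
  | Conj of goal * goal             (* (G1 , G2) *)
  | Disj of goal * goal             (* (G1 ; G2) *)
  | IfThen of goal * goal           (* (G1 -> G2) *)
  | Call of term                    (* a variable used as a goal *)
  | Cut                             (* ! *)

(* Count the literals in a body: the match must cover every
   constructor, or the compiler complains. *)
let rec literal_count = function
  | Literal _ -> 1
  | Conj (g1, g2) | Disj (g1, g2) | IfThen (g1, g2) ->
      literal_count g1 + literal_count g2
  | Call _ -> 1
  | Cut -> 0
```

Note that the tree-of-formulas question above is answered explicitly
here (Conj is a binary tree node), rather than being left to whatever
"," happens to parse as.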
>  How can such a
> second syntax, in the case of SML, manipulate functions defined with
> pattern-matching in a readable way?
Again, it's not a second syntax, it's just an explicit abstract syntax
represented as SML expressions. Now you might argue that even the base
syntax of SML is too complex for this kind of processing (I'm agnostic
on the point), but that's a different question from the one under
discussion, the advantages of strong typing for meta-programming.
> |> The longer answer is that I've never seen or written myself more
> |> obscure and buggy programs than those that manipulate Prolog terms
> |> intended to represent Prolog programs.
> My experience is different.  For example, I've written diagnostics for testing a
> microprocessor.  There are so many combinations of instructions to check, that I
> wrote a simple Prolog program to generate the diagnostics for me.  This Prolog
> program is written using a generalization of DCG's, which renders the source code
> very simple.
One swallow doesn't a Spring make. Your program or the problem you
worked on might be an exception. I've certainly seen 1000s of lines
of Prolog code for program manipulation rife with obscurity and
danger, even written by acknowledged masters of the craft.
> |> Even the trivial problem of translating DCG
> |> rules to Prolog has several of those pitfalls, and almost all such
> |> translators are or have been buggy. 
> Well actually, almost all programs I have ever seen, except for Hello World, are
> or have been buggy :-).  Unix utilities are notorious for having bugs (e.g.,
> what's the maximum number of characters in a csh command line?).  I've been using
> DCG's for years now, and I have come across exactly one bug (how to translate 
> 'p --> !').
I've seen *many* other DCG translation bugs. Richard O'Keefe probably
has a long repository of such. As for Unix utility bugs, what's their
relevance to the point under discussion, strong typing?
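For the record, here is where the 'p --> !' pitfall comes from, sketched
as a toy translator in OCaml; the types and the gensym scheme are
invented for illustration and stand in for no particular system's
translator:

```ocaml
(* Difference-list threading as a DCG translator performs it.
   Each case returns the translated goal and the output variable. *)

type term = Var of string

type dcg_body =
  | NonTerm of string               (* a nonterminal (args omitted) *)
  | Seq of dcg_body * dcg_body      (* (B1 , B2) *)
  | DCut                            (* ! *)

type goal =
  | Lit of string * term list
  | Conj of goal * goal
  | Cut

let counter = ref 0
let fresh () = incr counter; Var (Printf.sprintf "S%d" !counter)

let rec tr s0 = function
  | NonTerm p -> let s = fresh () in (Lit (p, [s0; s]), s)
  | Seq (b1, b2) ->
      let (g1, s1) = tr s0 b1 in
      let (g2, s2) = tr s1 b2 in
      (Conj (g1, g2), s2)
  | DCut -> (Cut, s0)  (* a cut reads no input: pass s0 through *)
```

For 'p --> !' this scheme yields p(S0, S) :- !, S = S0, with the final
unification placed after the cut when the clause is assembled. The
classic bug is to emit p(S, S) :- ! instead, which folds the
unification into head matching, *before* the cut, changing which
clauses the cut can prune.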
> |> Sure, the identification of terms and programs is seductive, and leads
> |> to very concise hacks. But it leads to unmaintainable, unsafe code.
> On the contrary, keeping source code simple by hiding bookkeeping through a
> preprocessor _improves_ maintainability and safety.  (See the generalized DCG's
> mentioned above.)
Preprocessors like DCGs and your generalized DCGs are very convenient
when they fit the problem. (Incidentally, for a chuckle check out the great
monad fashion in functional programming. They seem to have discovered
difference lists/accumulators after all these years.) But the crucial
point is that bugs in recursive decomposition of abstract syntax are
more easily caught in a strongly-typed setting.
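For the curious, the correspondence is easy to see: a state "monad" is
just accumulator passing packaged up, much as a DCG packages
difference-list threading. A toy sketch in OCaml, all names invented:

```ocaml
(* A computation is a function from state to (result, state);
   bind threads the state through, exactly as a DCG threads its
   difference list through the body. *)
type ('a, 's) state = 's -> 'a * 's

let return (x : 'a) : ('a, 's) state = fun s -> (x, s)

let bind (m : ('a, 's) state) (f : 'a -> ('b, 's) state)
    : ('b, 's) state =
  fun s -> let (x, s') = m s in f x s'

(* Push an item onto the accumulator, DCG-terminal style. *)
let emit x = fun acc -> ((), x :: acc)

let prog = bind (emit 1) (fun () -> bind (emit 2) (fun () -> return ()))
let (_, acc) = prog []   (* acc = [2; 1] *)
```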
> |> And now for a final heresy. I've written lots of Prolog, and lots of
> |> C. Writing the Prolog code has often been more gratifying, because
> |> something is up and running much sooner. But the C programs tend to
> |> have far fewer runtime bugs once they compile and link, because even
> |> the lame C type system catches many type bugs that Prolog lets by.
> This is true up to a point (it depends what kind of runtime bugs you look at). 
> But it's a very one-sided picture.  For example, since Prolog is usually
> incrementally compiled with an interactive top level, bugs are easier to catch
> _immediately_, rather than having to write 1000's of lines before one can even
> start the chase.
You are not speaking to the point. I wasn't comparing Prolog with C
*as whole languages*. I was pointing out the advantages of strong
typing in program development. The fact is that even with all the
horrible unsafeties of C, its strong typing helped me catch the great
majority of errors before running the program. Of course incremental
program development is better, which you can do as well in
strongly-typed languages like SML as in Prolog. The point is that
*many* runtime bugs in Prolog (or Lisp) programs would never have
happened in a strongly-typed setting, because they would have been
caught by the type checker. Even with incremental development,
debugging at runtime is less reliable and more laborious than having
a type-checker help. 

Think of it another way. Type-checking is cheap, effective (if partial)
automatic program verification. How could one want to reject such
convenient help? We might as well reject compilers and go back to
assembler: after all, even the most machine-oriented language, eg. C,
stops you from doing things that you could do in assembler.
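As a tiny illustration (OCaml again, example invented): an
argument-order slip that Prolog reveals only at run time, often by
silent failure, is a compile-time error in an ML-family language:

```ocaml
(* In Prolog, calling lookup(Alist, Key, Val) where lookup(Key,
   Alist, Val) was meant simply fails at run time; nothing warns
   you.  Here the analogous slip cannot compile, because the two
   argument types differ. *)
let rec lookup key = function
  | [] -> None
  | (k, v) :: rest -> if k = key then Some v else lookup key rest

let table = [("x", 1); ("y", 2)]

let ok = lookup "y" table          (* Some 2 *)
(* let bad = lookup table "y" *)
(* rejected by the compiler: a list is not a key *)
```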
> C requires a strong programmer discipline to avoid dangling pointers, memory
> leaks, array range errors, and so forth.  Writing C programs without arbitrary
> limits (because of static array sizes) requires going outside of C (using malloc
> and friends), which introduces its own set of problems.
Wait a minute! What does this have to do with strong typing? I'm
baffled that you would think I was defending C as a whole. But since
you clearly did, here's a true story. I have been involved over
the last 18 months in a project to build a set of modular components
for speech and language processing based on weighted finite-state
transducers. This involves implementing (sometimes subtle)
generalizations of the standard regular operations on DFAs, transducer
composition, determinization and minimization algorithms, various
search algorithms, and many utilities. The whole package is written in
C. It could not have been written in Prolog, both because it depends
on imperative automata algorithms without known efficient declarative
versions, and because the applications of the package have stringent
time and space requirements. It consists of around 5k lines of C,
written by two people. The great majority of bugs we have detected are
of two kinds: failure to deal correctly with boundary conditions
(eg. empty or disconnected transducers) and algorithm errors,
especially in the ordering of steps in imperative algorithms. The
first type of bug is common in Prolog. The second is not (at least for
non-imperative Prolog code), but that's hardly an advantage for
Prolog, given that efficient Prolog versions of those algorithms do
not exist. As for dangling pointer, array overflow, etc. bugs, I can
remember very few. But we paid a price in code complexity to ensure
that memory is properly allocated and deallocated. If and when SML (or
Modula-3, or...) is available with competitive performance for the
platforms we use, we would much prefer to use them over C, so that we
don't have to worry about memory management. But I cannot
think of a single reason for preferring to develop this code in an
untyped language.
> Prolog requires another kind of programmer discipline: casting your problem into
> a logical mold.
There are many logics. Prolog gives you one. But what if the logic of
my problem is, say, higher-order logic or linear logic? There are problems
that fit Prolog. By all means use Prolog then. I certainly do. But
sweeping claims for Prolog's "logical mold" seem to make "logical" a
cousin of "magical". Magical programming. Now that's the ticket!
> The standard argument, which I agree with, is that since discipline is needed
> anyway, you should take Prolog's discipline rather than C's, since it is better
> for you (it improves your thinking).
It does, but so does Latin or number theory. Different problems need
different kinds of thinking. Tools encourage good thinking for
problems that suit them, but blind adherence to a single tool is a
source of much poor thinking. 
--
Fernando Pereira
2D-447, AT&T Bell Laboratories
600 Mountain Ave, PO Box 636
Murray Hill, NJ 07974-0636
pereira@research.att.com
