Newsgroups: comp.lang.prolog
Path: cantaloupe.srv.cs.cmu.edu!nntp.club.cc.cmu.edu!newsfeed.pitt.edu!gatech!swiss.ans.net!newsgate.watson.ibm.com!hawnews.watson.ibm.com!syllog1!reintjes
From: reintjes@watson.ibm.com (Peter B. Reintjes)
Subject: Re: Turbo Prolog v2.0?
Sender: news@hawnews.watson.ibm.com (NNTP News Poster)
Message-ID: <D1uFrn.2oA3@hawnews.watson.ibm.com>
Date: Tue, 3 Jan 1995 19:06:58 GMT
Disclaimer: This posting represents the poster's views, not necessarily those of IBM.
Nntp-Posting-Host: syllog1.watson.ibm.com
Organization: IBM T.J. Watson Research
Lines: 111


I knew this would come back to haunt me...

> Strong typing means that VP produces probably the fastest
> Prolog code around. According to our first impressions,
> VP is even faster than MS C/C++ & Borland C++. I am talking now about
> competing against the C++ mafia/disease, not against other Prolog systems.
> BTW, Peter Reintjes has interesting results. He claims that the
> addition operation is
>
>      O( 1 )       with Prolog
>      O( sqrt(N) ) with C++
>
> (if I understood it right at PAP94...) He explained that this means
> that with faster processors Prolog code becomes faster than C++ code
> (because C++ is all about playing with pointers??)
> (Peter, could you explain this thing clearly to the rest of us, please...)

In the PAP talk, I was only talking about the case of increment
and decrement, which are used quite a bit in C (I don't remember
saying anything about C++).  I pointed out that as (RISC)
processor cycle times drop below 1ns, and word sizes
increase (64-bit integers), addition becomes a rather expensive
operation (about sqrt(sizeof(int)) gate delays for a carry-select
adder; even carry-lookahead needs O(log n)), and a pointer
dereference *could* be cheaper (I'm imagining something between
a large register array and normal caching).

All of this came about after I substituted successor arithmetic
for (NN is N + 1) in a matrix manipulation algorithm and got
a tremendous speedup.  The reason I got the speedup can be
understood by looking at the WAM and the fact that numbers
(with their tags) are so expensive in Prolog -- in other words,
I *did not* think that this particular speedup was evidence
for the idea stated above, but it got me thinking about
these operations at a much lower level...

(I should probably point out that I'm talking about something
below the usual discussions of operation times: when I talk
about the number of operations required, I mean
actual gate delays, not instructions.  One way to look at it
is that there is another layer of RISC below the current
notion of RISC, and at this new level we consider addition
to be a "complex" instruction.)

In fact, the most expensive operation is really a random-access
memory reference, which costs log(sizeof Memory) == 64 gate delays
for 64-bit addresses.  However, with luck, your successor structure
s(s(s(0))) will be stored efficiently (and all together), so that a
burst RAM technology (where the 64-bit address is propagated once
and some large chunk of memory, let's say 1Kb, comes blasting into
the cache in 1K time units) amortizes that cost.  If you use half of
those values, you only paid (64+1024)/512, a little over 2 gate
delays, for each value.

My observation at PAP94 was that Prolog code can be transformed,
using a DCG-like formalism and Tarau's binary clauses, into
something like:

p(s(A),X,Y,M,N,O,P,Q,R,S) :-
   ...built ins...
   q(A,X,Y,M,N,O,P,Q,R,S1).

where most of the arguments don't change and are therefore free.
This code would map very nicely onto a fairly simple kind of
machine (with lots of registers). And with Prolog's already
low procedure-call overhead, symbolic processing may not be
slower than "numerical" processing.  Equality tests and
unification with (at least) one variable are unit time operations.

The point is not that all of this couldn't just as well be done
in C (build successor structures out of bits and use them rather
than ++; after all, shift-and-test is only 2 gate delays), but
that we can continue to program in a high-level language
without necessarily paying a high cost.

Of course, even if any given three lines of code were always faster
in C than in Prolog, it wouldn't mean that you couldn't build a
faster application in Prolog.  The ability to handle complexity
*will* pay off some day when we are through writing all these
interactive programs that seem to be little more than video games
designed to keep people busy.  My point here was that even at the
small scale, Prolog is not inherently slower and may actually point
the way to faster operations.

Summary:

At sub-nanosecond cycle times, full-scale addition may be
overkill for increment and decrement. Because burst
DRAM technology can greatly reduce the really expensive
part of successor structures (memory references) and
Term Compression(*) can improve locality of successor
structures, a symbolic computation step may be cheaper
than 64-bit addition.

(*) A Novel Term Compression Scheme and Data Representation
    in the BinWAM, Tarau and Neumerkel  (FTP: clement.info.umoncton.ca)


         -  Peter Reintjes        Internet: reintjes@watson.ibm.com


IBM T.J.Watson Research Center,             157 Orchard Road, Apt 3D
P.O.Box 704                                 Briarcliff Manor, NY 10510
Yorktown Heights, New York 10598

Tel: (914) 784 7318                         Tel: (914) 923-4550

WWW External:       http://www.watson.ibm.com/watson/logicpgm/
WWW Internal(IBM):  http://www-i.almaden.ibm.com/watson/logicpgm/



