Newsgroups: comp.lang.scheme
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!nntp.sei.cmu.edu!news.cis.ohio-state.edu!math.ohio-state.edu!cs.utexas.edu!swrinde!news.sgi.com!news.msfc.nasa.gov!newsfeed.internetmci.com!in3.uu.net!news.biu.ac.il!discus.technion.ac.il!news!qobi
From: qobi@eesun.technion.ac.il (Jeffrey Mark Siskind)
Subject: Re: comp.lang.ML -- for all your religious needs...
Reply-To: Qobi@EE.Technion.AC.IL
Organization: Technion, Israel Institute of Technology
Date: Mon, 12 Aug 1996 08:03:20 GMT
Message-ID: <QOBI.96Aug12110320@eesun.technion.ac.il>
In-Reply-To: Greg Morrisett's message of Tue, 06 Aug 1996 10:52:59 -0400
X-Nntp-Posting-Host: eesun.technion.ac.il
References: <3206E88A.647F@sonic.net> <32075C4B.394E@cs.cornell.edu>
Sender: news@discus.technion.ac.il (News system)
Lines: 106

In article <32075C4B.394E@cs.cornell.edu> Greg Morrisett <jgm@cs.cornell.edu> writes:

   We pointed out that 
   instead of adding multiple return values (and other hacks to support 
   other optimizations) it was sufficient to add static typing a la ML.

I don't understand this. How is the H-M type system necessary and sufficient to
support the unboxing of return structures into multiple return values? To
soundly unbox a structure across a procedure return you need to know that

a.  there can be only one reference to that structure throughout its lifetime,
    or
b1. there can be no side effects to the slots of that structure, and
b2. there can be no eq? checks between that structure and a potential copy.

H-M itself doesn't provide such analysis. It might very well be the case that
ML compilers like SML/NJ and TIL do perform such analysis but that is
orthogonal to the H-M type system and type inference algorithm. And it could
be performed by compilers for other languages that use different static
analysis techniques.
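
To make conditions (a), (b1), and (b2) concrete, here is a minimal sketch in C.
It is not taken from any compiler; pair_t, make_boxed, make_unboxed, and the
two check functions are hypothetical names invented for illustration. The
boxed version returns a reference, so a second reference observes a slot
update and compares identical (the pointer comparison stands in for eq?). The
unboxed version returns the slots by value, so both observations change. A
compiler may substitute one for the other only when it has shown that no
program can tell the difference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical two-slot structure, standing in for a Scheme structure. */
typedef struct { int x; int y; } pair_t;

/* Boxed return: the caller receives a reference to a heap object. */
pair_t *make_boxed(int x, int y) {
    pair_t *p = malloc(sizeof *p);
    if (p == NULL) abort();
    p->x = x;
    p->y = y;
    return p;
}

/* Unboxed return: the slots come back by value, like multiple return values. */
pair_t make_unboxed(int x, int y) {
    pair_t p = { x, y };
    return p;
}

/* Boxed case: returns 1 if a write through one reference is visible through
   the other and the two references are identical. */
int boxed_aliases_share_state(void) {
    pair_t *a = make_boxed(1, 2);
    pair_t *b = a;                 /* second reference: condition (a) fails */
    b->x = 42;                     /* slot side effect: condition (b1) fails */
    int shared = (a->x == 42) && (a == b);  /* a == b plays the role of eq? */
    free(a);
    return shared;
}

/* Unboxed case: returns 1 if the same write is NOT visible through the other
   name and the two objects are distinct, i.e. the program can observe that
   unboxing happened. */
int unboxed_copies_diverge(void) {
    pair_t a = make_unboxed(1, 2);
    pair_t b = a;                  /* a copy, not an alias */
    b.x = 42;
    return (a.x == 1) && (&a != &b);
}
```

Both check functions return 1, which is the point: the two representations
are distinguishable by exactly the operations named in (b1) and (b2), unless
(a) rules out the second reference in the first place.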

   In subsequent threads, it was argued that it is straightforward to
   translate scheme into ML.  In fact, a couple of years ago, I posted
   some ML code that automatically did this for a subset of scheme

I don't understand what this has to do with the topic under discussion. About
20 years ago I developed an automatic local transliteration from Lisp to
Algol-68. Does this make Algol-68 strictly more expressive than Lisp?

   I think the key point is that very high-performance scheme compilers
   of the future will be taking this model:  translate from the
   dynamically typed language to a statically typed language, 
   and then use a compiler for this statically typed language to
   produce efficient code.  (This is effectively what soft scheme
   does now!)

Unless the compiler is buggy, what the compiler does internally does not
change the semantics of the source language. I don't understand: are you
applying the static/dynamic type distinction to the language semantics or to
the implementation?

For at least 25 years, Lisp compilers have been doing static type inference to
eliminate run time tag representation and checking. I believe that Maclisp did
type inference as early as 1968. It is true that compilers today, like SML/NJ
and TIL, are much more effective than earlier compilers, but many Lisp
compilers, like Lucid and Python, have done some type inference that lands
somewhere in between Maclisp and TIL. I agree with you that static inference
is important. But I disagree that static inference is all type inference.
Static inference annotates the source program with assertions. You can call
this a translation from a dynamically typed language to a statically typed
language but I think that that is a perversion of those notions. Type
decorations are only `one small corner' of the space of assertions that a
compiler can infer and use to generate good code. And there already are
Lisp/Scheme compilers that automatically infer lots of useful properties about
programs, besides their types, that help generate efficient code.

   So, if, like me, you're an efficiency bigot of sorts, you don't
   want to be limited by the source language -- you'd like to be
   able to open up the hood of the compiler and hack directly in
   its internal language to ensure that you're not paying the price
   of those extra tags on values and the extra tag-checking.  How
   do you do this?  Program directly in the statically typed language.

According to this philosophy you should program directly in a language that
allows you to explicitly specify things like:

a. tag representation
b. which variables are local vs. global
c. register allocation
d. instruction scheduling
e. cache line allocation, prefetching, and flushing
f. memory page allocation, vm prefetching and flushing

   On top of this, programming in a statically typed language gives
   you facilities for transparent interoperability with legacy languages
   (a la C, Fortran, C++).  Why?  No need to tag values at run time
   to support (surprise!) dynamic type checking.  I currently have
   an implementation of the TIL/ML compiler that translates directly
   to C using C's native data structures in a completely straightforward
   way.  That is, ML ints are represented as C ints (no tag bits), ML 
   floats are represented as C floats, ML records are represented as 
   pointers to C structs, etc.  So, communicating from ML <-> C code is not
   only painless, but also efficient because I don't have to traverse
   a data structure to strip off tags and remove unwarranted boxing.

ML does not have a monopoly on this. Stalin, a compiler for Scheme, has been
freely available from my home page for two and a half years. Stalin gives
transparent interoperability with legacy languages like C. There are no tag
values at run time for monomorphic data. Stalin translates directly to C using
C's native data structures in a completely straightforward way. That is, Scheme
exact integers are represented as C ints (no tag bits when the data is
monomorphic), Scheme inexact reals are represented as C floats, Scheme
structures (defined with DEFINE-STRUCTURE) are represented as pointers to C
structs, etc. So communicating from Scheme <-> C code is not only painless,
but also efficient because I don't have to traverse a data structure to strip
off tags and remove unwarranted boxing.
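
As a rough illustration of the difference between tagged and untagged
representations, here is a sketch in C. It is not Stalin's output or its
actual encoding; the low-bit tagging scheme and the names tagged_t,
tag_fixnum, untag_fixnum, sum_tagged, sum_untagged, and strip_tags are all
assumptions made up for this example. With tags, every use pays for a check
and a decode, and handing the data to ordinary C code needs a conversion
pass. Without tags, the data already is a C array of int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged encoding: a fixnum n is stored as 2n+1, so the low bit
   marks it as an integer. Overflow for very large n is ignored here. */
typedef intptr_t tagged_t;

tagged_t tag_fixnum(intptr_t n)   { return n * 2 + 1; }
intptr_t untag_fixnum(tagged_t v) { return (v - 1) / 2; }

/* Tagged sum: each element is tag-checked and decoded before use. */
long sum_tagged(const tagged_t *v, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        assert(((uintptr_t)v[i] & 1u) == 1u);   /* run-time tag check */
        s += (long)untag_fixnum(v[i]);
    }
    return s;
}

/* Untagged sum: the elements are plain C ints, so there is nothing to check
   or decode. */
long sum_untagged(const int *v, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* The traversal that a tagged implementation needs before calling foreign C
   code that expects an int array. */
void strip_tags(const tagged_t *in, int *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = (int)untag_fixnum(in[i]);
}
```

When the data is known to be monomorphic, only sum_untagged is needed, and
strip_tags disappears from the foreign-call path entirely.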

   Now, building an mlsh on top of this compiler would be far easier
   and more efficient than having to use the Scheme 48 C interface
   for precisely these reasons.

Now, building scsh on top of this compiler would be far easier and more elegant
than having to redesign scsh to use ML instead of Scheme.
-- 

    Jeff (home page http://tochna.technion.ac.il/~qobi)
