Newsgroups: comp.lang.scheme
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!scramble.lm.com!news.math.psu.edu!news.cse.psu.edu!uwm.edu!math.ohio-state.edu!cs.utexas.edu!swrinde!tank.news.pipex.net!pipex!usenet2.news.uk.psi.net!uknet!usenet1.news.uk.psi.net!uknet!psinntp!psinntp!psinntp!news.biu.ac.il!discus.technion.ac.il!news!qobi
From: qobi@eesun.technion.ac.il (Jeffrey Mark Siskind)
Subject: multiple values
Reply-To: Qobi@EE.Technion.AC.IL
Organization: Technion, Israel Institute of Technology
Date: Wed, 24 Jul 1996 18:25:52 GMT
Message-ID: <QOBI.96Jul24212552@eesun.technion.ac.il>
X-Nntp-Posting-Host: eesun.technion.ac.il
Sender: news@discus.technion.ac.il (News system)
Lines: 213

A number of people have asked me to repost my news article on multiple values.
I've enclosed it below.
    Jeff (home page http://tochna.technion.ac.il/~qobi)
-------------------------------------------------------------------------------
In-reply-to: shivers@clark.lcs.mit.EDU's message of 31 Jan 1995 02:37:40 -0500
Newsgroups: comp.lang.scheme
Subject: Re: multiple-value return & optimising compilers
Reply-to: Qobi@CS.Toronto.EDU
References: <dig-Scheme-7.26@mc.lcs.mit.edu> <9501310741.AA25479@clark.lcs.mit.edu>
Distribution: world
--text follows this line--
In article <9501310741.AA25479@clark.lcs.mit.edu> shivers@clark.lcs.mit.EDU (Olin Shivers) writes:

   Matthias Blume makes the claim that multiple-value return produces objects
   that aren't first-class, in the sense that the "container" holding the
   multiple values isn't accessible as other values are.

   Multiple return values are not a data-structure; this is like saying
   variables aren't first class or something.

Let me come to Matthias' defense here. Matthias is not claiming that the
current multiple-value spec *does* create containers. Of course it doesn't.
He is claiming that:

  a) (with the possible exception of CALL/CC) the semantics is equivalent to
     what is provided by a trivial implementation based on containers.
  b) This trivial implementation requires no new language features. 
  c) The proposed extension exists solely for the purpose of performance.

And most importantly:

  d) multiple-values look very similar to containers, so similar that people
     might view them as containers until they get bitten by the fact that they
     are really not containers.

I think that the last point is very important. It is not that they are
containers. It is that they are analogous to containers.
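
As a minimal sketch of points (a), (b), and (d), assuming a list serves as the
container, the core behaviour can be written in plain Scheme. The names
`list-values' and `call-with-list-values' are hypothetical, chosen so as not to
shadow the real VALUES and CALL-WITH-VALUES.

```scheme
;; Hypothetical container-based stand-ins for VALUES and CALL-WITH-VALUES.
;; The "multiple values" are simply packaged in a list.
(define (list-values . args) args)

;; Call PRODUCER with no arguments, then spread the list it returns
;; over CONSUMER's parameters.
(define (call-with-list-values producer consumer)
  (apply consumer (producer)))

;; Behaves like the multiple-value version in ordinary use:
(define sum
  (call-with-list-values
   (lambda () (list-values 1 2))
   +))                                  ; sum is 3

;; Unlike real multiple values, the container is an ordinary object and
;; can be stored or passed around like any other value:
(define stored (list-values 1 2))       ; stored is the list (1 2)
```

With real multiple values, an expression such as (list (values 1 2)) is not
portable, which is the kind of surprise point (d) refers to. The sketch
ignores the interaction with CALL/CC.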

I think a well-designed programming language can have two features that share
a common subfeature. It can also have two features, one of which is totally
subsumed by the other, the first of which exists solely for reasons of
performance. But then the syntax for specifying which feature to use should be
orthogonal to how it is used. That way I can write my code uniformly and make
local changes to tune performance.

   Multiple return values are completely symmetric with multiple parameters to
   procedures. The latter allows you to pass multiple values to a procedure;
   the former allows you to pass multiple values to an implicit continuation.
   CALL-WITH-VALUES essentially allows you to specify the arity of a
   continuation.

While such symmetry might be elegant from a language design/semantics point of
view, I think that multiple values is an exceedingly bad idea from the point of
view of writing clean/understandable/maintainable code. One of the hallmarks
of Lisp style is writing nested expressions. Like it or not, expressions are
trees. They potentially have multiple children yet have only a single parent.
With multiple values, programs become directed graphs. The problem with this
is that you can't extract a subexpression and talk about its meaning or place
it somewhere else.
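
To illustrate the nesting point, here is a small sketch using the VALUES and
CALL-WITH-VALUES procedures under discussion (assumed to be available, as in
the proposal). The procedures `sum-nested', `quot-and-rem', and `sum-multiple'
are hypothetical examples.

```scheme
;; Single-valued procedures nest directly: each subexpression is a
;; self-contained subtree that can be read, moved, or replaced on its own.
(define (sum-nested n d)
  (+ (quotient n d) (remainder n d)))

;; Hypothetical two-valued producer.
(define (quot-and-rem n d)
  (values (quotient n d) (remainder n d)))

;; The two results cannot be fed to + by nesting alone; a call to
;; CALL-WITH-VALUES with a receiver for both values has to be introduced.
(define (sum-multiple n d)
  (call-with-values
   (lambda () (quot-and-rem n d))
   (lambda (q r) (+ q r))))

(sum-nested 17 5)     ; 3 + 2 = 5
(sum-multiple 17 5)   ; also 5
```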

Very often, perhaps almost always, the multiple values are semantically
related. They *belong* in a container. Let me give you an example. I once
wrote a very large piece of code that did complicated computational geometry
calculations. It was originally written for a Symbolics machine and made
extensive use of multiple values for reasons of efficiency. Points, lines,
rays, circles, and so forth were all passed around as multiple single-float
values to avoid consing. The code did many geometric constructions. These
constructions were something like the following: Given four points p1, p2, p3,
and p4, construct a ray r from p1 in the direction of p2, construct a line l
that passes through p3 and p4, and return the intersection point of r with l.
This might be done with something like the following:

(intersection (ray p1 p2) (line p3 p4))

Now suppose lines are represented as the coefficients of ax+by=c and that rays
are represented as a point <x,y> and an angle. With (Common Lisp style)
multiple values this becomes:

(multiple-value-bind (la lb lc) (line p3x p3y p4x p4y)
 (multiple-value-bind (rx ry rtheta) (ray p1x p1y p2x p2y)
  (intersection-ray-line rx ry rtheta la lb lc)))

First of all, the latter is much less clear than the former. Second, the
representation of lines and rays is hardcoded. With the latter approach, if I
want to change the representation of lines to <point,angle> I will have to
change every fragment of code that uses lines. With the former, the bulk of
the code remains unchanged. Third, the former allows using a generic
`intersection' function. The latter does not. Fourth, suppose that I later
discover that my algorithm is flawed, and that the line l must be rotated by
the angle theta about the point p5 before it is intersected with the ray r.
With the former, I can simply modify my code as follows:

(intersection (ray p1 p2) (rotate (line p3 p4) p5 theta))

With the latter one needs to do:

(multiple-value-bind (la lb lc) (line p3x p3y p4x p4y)
 (multiple-value-bind (la1 lb1 lc1) (rotate la lb lc p5x p5y theta)
  (multiple-value-bind (rx ry rtheta) (ray p1x p1y p2x p2y)
   (intersection-ray-line rx ry rtheta la1 lb1 lc1))))

Making the latter type of change is very error prone.
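
For concreteness, the container-based style can be sketched in Scheme as
follows. This is not the original code, only a minimal illustration: the
tagged-list representations and every procedure name other than those used in
the fragments above are assumptions. Lines are stored as the coefficients of
ax+by=c and rays as an origin point and an angle, as described earlier.

```scheme
;; Hypothetical container representations (tagged lists):
;;   point: (x . y)
;;   line:  (line a b c)       meaning ax + by = c
;;   ray:   (ray origin theta)
(define (make-point x y) (cons x y))
(define (point-x p) (car p))
(define (point-y p) (cdr p))

;; Line through points p and q.
(define (line p q)
  (let* ((a (- (point-y q) (point-y p)))
         (b (- (point-x p) (point-x q)))
         (c (+ (* a (point-x p)) (* b (point-y p)))))
    (list 'line a b c)))

;; Ray from p in the direction of q.
(define (ray p q)
  (list 'ray p (atan (- (point-y q) (point-y p))
                     (- (point-x q) (point-x p)))))

;; Rotate line l by angle theta about point p: rotate the normal (a, b)
;; and adjust c so that the rotated line is still measured from p.
(define (rotate l p theta)
  (let* ((a (cadr l)) (b (caddr l)) (c (cadddr l))
         (a1 (- (* a (cos theta)) (* b (sin theta))))
         (b1 (+ (* a (sin theta)) (* b (cos theta))))
         (c1 (+ c
                (* (- a1 a) (point-x p))
                (* (- b1 b) (point-y p)))))
    (list 'line a1 b1 c1)))

;; Intersection of ray r with line l; #f if they are parallel or the
;; line lies behind the ray's origin.
(define (intersection-ray-line r l)
  (let* ((o (cadr r)) (theta (caddr r))
         (a (cadr l)) (b (caddr l)) (c (cadddr l))
         (denom (+ (* a (cos theta)) (* b (sin theta)))))
    (if (= denom 0)
        #f
        (let ((t (/ (- c (* a (point-x o)) (* b (point-y o))) denom)))
          (if (< t 0)
              #f
              (make-point (+ (point-x o) (* t (cos theta)))
                          (+ (point-y o) (* t (sin theta)))))))))

;; Generic entry point: dispatch on the tags. Only ray/line is covered.
(define (intersection x y)
  (cond ((and (eq? (car x) 'ray) (eq? (car y) 'line))
         (intersection-ray-line x y))
        ((and (eq? (car x) 'line) (eq? (car y) 'ray))
         (intersection-ray-line y x))
        (else #f)))

;; Usage: the call sites are exactly the nested expressions shown above.
(define p1 (make-point 0 0))
(define p2 (make-point 1 1))
(define p3 (make-point 0 2))
(define p4 (make-point 2 2))
(define p5 (make-point 0 2))
(define hit  (intersection (ray p1 p2) (line p3 p4)))   ; roughly (2 . 2)
(define hit2 (intersection (ray p1 p2)
                           (rotate (line p3 p4) p5 (- (atan 1)))))
                                                        ; roughly (1 . 1)
```

Switching lines to a <point,angle> representation would touch only `line',
`rotate', and `intersection-ray-line'; the call sites stay as they are.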

I never was able to fully debug my multiple-value code. Even after months of
debugging, there were dozens of latent bugs. I decided to forgo performance
and reimplemented the code using containers instead of multiple values. I was
able to eliminate all of the known bugs in a few days. (I ended up spending
several months writing a partial evaluator to get back the performance, but
that is a different story).

The key here is that while I was able to use multiple values and did in fact
use them, the objects that they represented WERE MORE PROPERLY VIEWED AS
*CONTAINERS*. I think that Matthias is dead right on this one.

   Encapsulating the base functionality of m-v in a procedural form (with the
   VALUES and CALL-WITH-VALUES procedures) is the way you do things in Scheme.
   It is analogous to encapsulating the base functionality of continuations in
   a procedural form (with the CALL/CC procedure and the reified continuation
   procedures it produces). This is a fine thing to do, and very much in the
   "spirit of Scheme" to which Matthias keeps referring.

The spirit of Scheme is as I stated (quoting from the first paragraph of R4RS):
   It was designed to have an exceptionally clear and simple semantics and
   few different ways to form expressions.
   ^^^^^^^^^^^^^^^^^^
IMHO multiple values add an unnecessary (and even undesirable) way to express
something that is already expressible in a different way.

   I frequently see people on this list claim that it's a simple matter of
   global program analysis to handle all of the horrible inefficiencies
   introduced by various proposals -- such as Matthias' one parameter/one
   return value proposal.

   I have noted that people who have actually implemented aggressive
   native-code Scheme compilers -- such as Orbit, Gambit, or Chez Scheme --
   are usually not the people who make these claims.

Not true. I have implemented a native-code compiler for Scheme called Stalin
that does aggressive global analysis. I believe that Stalin does more
extensive global analysis than any existing compiler for any programming
language. Stalin does not yet do the optimization necessary to convert
container-based returns into multiple-value returns but it is high on my
agenda. The infrastructure is there to support automatic linearity analysis.
(I agree with Henry Baker that linearity is important to support
`immediatization' and in-place update but prefer to have the compiler
determine this by global static analysis rather than have the programmer
declare such information.)
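
The following is only an illustrative, hand-written sketch of the kind of
source-level rewrite that converting a container-based return into a
multiple-value-style return amounts to. It is not Stalin's actual analysis or
output, and all procedure names are hypothetical.

```scheme
;; Source form: the producer allocates a pair that the caller takes
;; apart immediately and never stores anywhere else.
(define (min-max-pair a b)
  (if (< a b) (cons a b) (cons b a)))

(define (span-pair a b)
  (let ((p (min-max-pair a b)))
    (- (cdr p) (car p))))

;; Rewritten form: because the pair is consumed exactly once and does
;; not escape, its components can be handed directly to a receiver K,
;; so no pair is allocated.
(define (min-max-direct a b k)
  (if (< a b) (k a b) (k b a)))

(define (span-direct a b)
  (min-max-direct a b (lambda (lo hi) (- hi lo))))

(span-pair 7 3)     ; 4
(span-direct 7 3)   ; 4
```

The programmer writes only the first form; deciding that the rewrite is safe
is the job of the analysis.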

   It is very difficult to do these compilers, and global analysis of
   higher-order languages is quite tricky and does not always pay off.

Sometimes yes and sometimes no.

   If you believe that a magic global analysis will make the implementation
   problems go away, then I invite you to design and implement such an
   analysis. Not only will your results be worth serious academic acclaim,
   they will significantly impact real-world programming.

I hope so.

   It's a double win. So, please. We anxiously await you.

   I don't want to dampen anyone's enthusiasm. Sarcasm aside, I really do
   encourage anyone who wants to make advanced programming languages go fast --
   go for it. It's a fun problem, if you are into that kind of thing (I am).
   But people have been working on this one for a while. If it was easy, it
   would have been done by now.

There are many reasons why innovations happen later rather than earlier.
Sometimes it is because they are hard. Sometimes it is because some prior
enabling innovation is necessary. Sometimes it is just because no one thought
of it (perhaps because people were biased by some prevailing way of thinking).

   Unlike others on this list, I don't program in Scheme because it is a simple
   pedagogical programming language, useful for expressing ideas to students,
   or because it is elegant and therefore useful in applications where
   efficiency is of no concern at all. I want to use powerful languages to do
   real systems programming. I don't want to have to use C. Efficiency matters.
   For a large set of programming tasks -- graphics, spreadsheets, databases
   engines, operating systems, word processors, text editors, file servers,
   network applications, and so forth -- powerful notation is a great boon, but
   efficiency is a requirement, and these are the tasks to which I would like
   to apply the technology of advanced languages.  So arguments of the form,
   "Of course it makes the system slow, but it's minimal and elegant, so it's
   OK" don't work for me.

My primary research area is not programming languages or compilers. Stalin is
a low-priority part-time effort for me. I am building Stalin precisely because
I believe that Lisp is the best language for most of my other research efforts
and I need a better implementation than those that are available.

   It's easy to appeal to hypothetical static analyses to justify introducing
   inefficient features into a language. It's a lot harder to do it.

Agreed.
	   -Olin

    Jeff (home page http://www.cdf.toronto.edu/DCS/Personal/Siskind.html)
