Newsgroups: comp.lang.dylan,comp.lang.misc,comp.lang.lisp,comp.object,comp.arch
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!oitnews.harvard.edu!yale!zip.eecs.umich.edu!panix!news.mathworks.com!gatech!news.uoregon.edu!news.bc.net!info.ucla.edu!csulb.edu!csus.edu!netcom.com!NewsWatcher!user
From: hbaker@netcom.com (Henry Baker)
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Message-ID: <hbaker-3107951007190001@192.0.2.1>
Sender: hbaker@netcom15.netcom.com
Organization: nil
References: <9507261647.AA14556@aruba.apple.com> <3v8g7l$cge@jive.cs.utexas.edu> <3vac07$ptf@info.epfl.ch> <3vb382$dtr@jive.cs.utexas.edu>
Date: Mon, 31 Jul 1995 18:07:19 GMT
Lines: 137
Xref: glinda.oz.cs.cmu.edu comp.lang.dylan:4928 comp.lang.misc:22405 comp.lang.lisp:18490 comp.object:36066 comp.arch:59993

In article <3vb382$dtr@jive.cs.utexas.edu>, wilson@cs.utexas.edu (Paul
Wilson) wrote:

> Then I should restate a couple of things:  I'm very much pro-GC, for software
> engineering reasons, but I think that it ultimately costs you a little
> bit in locality, relative to a similarly good explicit deallocator
> (malloc/free kind of thing).

I think that you'd have to be very specific about your assumptions
in order to back this conclusion up.  You'd have to include things
like how smart the compiler is about killing dead things promptly,
what language you are writing in, and what help the language/type
system gives the programmer in doing 'explicit' deallocation.

For a large fraction of 'vanilla' C/C++ code, however, I would strongly
disagree.  Most people's attempts at local 'optimizations' by doing
explicit deallocation will backfire, and lead to _both_ insecure code
_and_ lower performance.  In most professionally constructed GC systems,
the GC architect has thought a _lot more_ about locality and performance
issues than most application writers, and is also in a much better position
to do something positive about locality than most application writers.

> Unfortunately, comparisons are quite difficult because the state of
> allocator research and development is a bit of a mess.  We don't know
> much more than we did in 1971, which wasn't a whole lot.

Agreed.

> >Paul Wilson <wilson@cs.utexas.edu> wrote:
> >] Ken Dickey <KenD@apple.com> wrote:
> >] >At 11:38 AM 95/07/26 -0400, Scott McKay wrote:
> >] >>Another thing to note is that compacting GC's usually dramatically
> >] >>reduce the overall working set of programs, which in turn reduces the
> >] >>number of page faults a program might take.  ...
> >
> >Is there hard data on this one ?
> 
> Essentially none.  Compaction is overrated, by the way.  It can do as much
> damage as good, if it's not done *right*.  As far as comparisons between
> conventional and GC'd systems go, there is essentially zero data on locality
> effects, and there's every reason to think that locality effects are 
> important.

With a high quality VM system and log-structured file systems, I'm afraid
you are right.  In these cases, the underlying VM has already gained most
of the advantage that compaction would offer.  One could gain a bit more
memory utilization with compaction, but the cost of actually doing the
compaction may not pay for itself.  With modern caches, I'm not sure if
there is any gain at all on that front.

IMHO the major gain from compaction is to defragment the address space,
but as Paul's recent work has shown, one may not have to do this very
often.

> For more info on this, see the paper allocsrv.ps in our repository of papers
> on memory management.  (ftp.cs.utexas.edu, in the directory pub/garbage.)

Excellent paper, and highly recommended.

> GC's (except for reference
> counting GC's) tend to do something that's intrinsically nasty for locality;
> they pollute the cache with short-lived objects and can't *reuse* that memory
> until a garbage collection detects that the memory is reusable.  You can limit
> the damage this causes, but you can't eliminate it entirely.

This is correct as far as the _address space_ is concerned, but completely
wrong as far as cache behavior is concerned.  Reinhold's wonderful thesis

Reinhold, Mark B.  "Cache Performance of Garbage-Collected Programming
Languages".  MIT/LCS/TR-581, Sept. 1993.  (Also PLDI'94??)

shows how a GC makes _excellent_ use of a write-allocate cache, and
that short-lived objects cause no problems at all, because _they live
and die in the cache and need never cause any memory traffic at all_.

So the problem is bad HW architectures, not GC, per se.  (When, oh when,
will VM architects give us 'write-allocate' VM?  CS people have known
that it is important for 30 years, but somehow the OS people have never
gotten the word.)
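
To make the point concrete, here is a toy model (fully-associative
cache, made-up sizes and an arbitrary eviction policy; a sketch for
illustration, not a claim about any real machine or about Reinhold's
measurements): a bump allocator cycling through a nursery that fits
in the cache causes read traffic only under a conventional
fetch-on-write-miss policy.  With write-allocate (write-validate),
a write miss installs the line without fetching it, so the
short-lived objects generate no memory reads at all.

```python
# Toy cache model: fully-associative, arbitrary eviction.
# All sizes below are invented for illustration.
CACHE_LINES = 64            # lines in the cache
NURSERY_LINES = 64          # nursery sized to fit the cache exactly
PASSES = 100                # allocation passes over the nursery

def simulate(fetch_on_write_miss):
    cache = set()           # resident line addresses
    reads = 0               # lines fetched from memory
    for _ in range(PASSES):
        for line in range(NURSERY_LINES):
            if line not in cache:           # write miss on allocation
                if fetch_on_write_miss:
                    reads += 1              # conventional cache reads the line first
                if len(cache) >= CACHE_LINES:
                    cache.pop()             # evict an arbitrary line
                cache.add(line)
            # the allocating write itself stays in the cache either way
    return reads

print(simulate(True), simulate(False))
```

Since the nursery fits the cache, fetch-on-write pays one read per
nursery line on the first pass only, and write-allocate pays none,
no matter how many allocation passes are made.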

> Yes.  On the other hand, I think one of the things that has limited progress
> is a failure to focus on the fundamentals---namely program behavior and
> what an allocator (or GC) can do to map that behavior nicely onto memory.
> Architects tend to have a very naive view of locality, which is that it's
> like a natural phenomenon that either is or isn't there.  In fact, it's a
> product of patterned program behavior interacting with patterned allocator
> (or GC) strategies, and these must be studied in relative isolation to
> figure out what's really going on.  I believe that there are common and
> simple patterns in program behavior that have not been noticed or exploited,
> and that this is why there's been very little progress in the study of
> allocators and the study of locality.

Excellent point.  Unfortunately, HW architects don't have a lot of
control over what SW people run on their machines.  It would be a
shame, however, if HW architects went to a lot of trouble to give
people something that can be accomplished much more effectively with
better SW.

> >This is the "understanding" problem: if the cache can notice that reading the
> >line is not necessary when allocating a new object, it means the cache
> >understands better the way the GC-allocator works. Your point about the
> >memory bandwidth of the write-backs could be argued against in saying that
> >there should be a way for the cache to notice that most of the write-backs
> >aren't necessary (I'm assuming here that the generations' sizes have been
> >carefully chosen so that they "fit the memory hierarchy" (whatever this
> >really means in terms of relative sizes): in such a case most write-backs
> >will be for "old-space" objects that have been recognized as dead already).
> >Since there will still be write-backs of objects that *are* dead but haven't
> >been recognized as such yet, I guess manual allocators still have an edge.

Manual allocators have exactly the same trouble that GC's do if they
have no way to tell the HW that a particular cache line is now dead.
It's way past time for HW to have some ability for the SW to inform it
that things are dead.  I have argued elsewhere for things like a
"last load" instruction, which informs the HW that after supplying
the contents of this memory location, it is now dead, and need not
be written back.  In conjunction with a write-allocate cache, much
memory traffic can be eliminated by the compiler.
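
A toy extension of the same sort of model shows the effect (the
`last_load` operation here is hypothetical, as is the whole cache,
an LRU write-back design with invented sizes): popping a stack with
a destructive final read leaves the lines clean, so their later
eviction costs no write-backs of dead data.

```python
from collections import OrderedDict

class ToyCache:
    """Toy fully-associative write-back cache with LRU eviction."""
    def __init__(self, lines):
        self.lines = lines
        self.resident = OrderedDict()   # addr -> dirty flag, in LRU order
        self.writebacks = 0

    def _touch(self, addr, dirty):
        if addr in self.resident:
            dirty = dirty or self.resident.pop(addr)   # keep dirtiness on hit
        elif len(self.resident) >= self.lines:
            _, was_dirty = self.resident.popitem(last=False)   # evict LRU
            if was_dirty:
                self.writebacks += 1    # dead-but-dirty data goes to memory
        self.resident[addr] = dirty

    def write(self, addr):
        self._touch(addr, dirty=True)

    def read(self, addr):
        self._touch(addr, dirty=False)

    def last_load(self, addr):
        # hypothetical destructive read: the line is dead afterwards,
        # so drop it clean -- no write-back will ever be needed
        self.resident.pop(addr, None)

def run(pop_destructively, stack_lines=32, cache_lines=32):
    c = ToyCache(cache_lines)
    for a in range(stack_lines):                  # push: dirty the stack lines
        c.write(a)
    for a in reversed(range(stack_lines)):        # pop each value exactly once
        (c.last_load if pop_destructively else c.read)(a)
    for a in range(1000, 1000 + cache_lines):     # unrelated work evicts everything
        c.write(a)
    return c.writebacks

print(run(False), run(True))
```

With plain reads, every dead stack line is still dirty when unrelated
work evicts it, and all of them get written back; with the
destructive `last_load`, none do.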

> I don't think so.  For small caches, by the time you've compacted data it's
> too late---you touched too much memory in between GC's, because you couldn't
> reuse memory immediately.

Wrong intuition.  Worry more about cache behavior.  See Reinhold's thesis again.

> >For lack of hard data, this is probably blowing hot air, tho !
> >But the point still holds: it'd probably be a good idea to provide ways to
> >tell the memory hierarchy what's going on (like a "trash" instruction to
> >indicate that some memory region can be trashed (to free the old-space) or a
> >destructive read (can be used for linear-programming, or for stack-pops)).
                                     ^^^^^^^^^^^^^^^^^^

I think you mean 'linear logic style of programming' here!

-- 
www/ftp directory:
ftp://ftp.netcom.com/pub/hb/hbaker/home.html
