Newsgroups: comp.lang.dylan,comp.lang.lisp,comp.lang.java
Path: cantaloupe.srv.cs.cmu.edu!europa.chnt.gtegsc.com!usenet.eel.ufl.edu!spool.mu.edu!howland.reston.ans.net!ix.netcom.com!netcom.com!NewsWatcher!user
From: hbaker@netcom.com (Henry Baker)
Subject: Re: Garbage collection cost (was Re: Parenthesized syntax challenge)
Message-ID: <hbaker-1710952127200001@10.0.2.15>
Sender: hbaker@netcom10.netcom.com
Organization: nil organization
References: <44aa9a$j5h@miso.cs.uq.edu.au> <LUDEMANN.95Oct6140930@expernet26.expernet.com> <DGApp8.J41@undergrad.math.uwaterloo.ca> <MAD.95Oct13123618@tanzanite.math.keio.ac.jp> <45ksdk$7gr@jive.cs.utexas.edu> <DGJp8o.7nF@Cadence.COM> <MAD.95Oct18040436@tanzanite.math.keio.ac.jp>
Date: Wed, 18 Oct 1995 05:27:20 GMT
Lines: 55
Xref: glinda.oz.cs.cmu.edu comp.lang.dylan:5449 comp.lang.lisp:19565 comp.lang.java:1934

In article <MAD.95Oct18040436@tanzanite.math.keio.ac.jp>,
mad@math.keio.ac.jp wrote:

> All the references shown by Paul Wilson say that things are not that
> simple.  Hans Boehm's web page ftp://parcftp.xerox.com/pub/gc/complexity.html
> was especially clear and convincing.  Now I understand why "copying
> collector is faster than mark and sweep" is a myth.

A number of us asked Hans Boehm to write down his argument, and his web
page admits that the argument still needs some work.  I basically agree
with most of his conclusions, but would like to point out some additional
things:

* although a copying gc uses ~2X the _virtual_ memory, it isn't clear that
it needs to use 2X the _real_ memory.  However, to avoid unnecessary paging,
it is essential that the VM allow the user to tell it what is live and what
is not.  For example, when starting to allocate in the 'tospace', there is
no need to actually _read_ the page, only to write it.  So the VM needs the
ability to _write-allocate_, like some caches.  Similarly, when the
'fromspace' is abandoned, the VM should literally discard its pages and make
no attempt to write them out, since nothing on them will ever be read again.
(This may be impossible on high-security VMs, which require that the pages
on disk be literally zeroed out; that zeroing is something today's
intelligent disk controllers should be doing themselves (anyone out there
listening??)).
Finally, if you copy the same data from one space to the other for more
than one generation -- and if you copy in the same order (breadth OR depth
first) each time -- then you will be sequentially touching exactly the same
relative pages in both the fromspace and the tospace at the same time, plus
some additional pages whose forwarding pointers need to be updated.  A
well-designed VM should be able to exploit this kind of correlation to
improve paging (and possibly cache) behavior.  You also want to make sure
that your fromspace and your tospace are offset from each other in a
direct-mapped cache, so that corresponding addresses in the two spaces don't
map to the same cache lines, or you will pay dearly!!

* I disagree with Hans that people will want to keep their mark(copy)/alloc
ratios high enough to keep the total memory at some integer multiple of the
live data.  Memory is becoming cheaper faster than processors are becoming
faster, so it will become increasingly appealing to allocate enough memory
to keep the fraction of time spent in GC at a nearly negligible level
-- much like the current cost of 'refreshing DRAMs' (you didn't know
about this?)  As a result, since marking work scales with the live data
while sweeping work scales with the entire heap, the 'sweep cost' _could_
(without clever programming) become a significant fraction of GC time,
although it would still be a negligible cost overall.

* I disagree with both Paul and Hans to a certain extent regarding the
necessity of eliminating fragmentation.  Although I don't like copying
large objects around, it is precisely the largest objects that can cause
the most fragmentation, so moving them becomes inevitable in a long-lived
system such as a persistent object store.  Perhaps a GC that copies only
rarely, rather than one that copies all the time, is a better compromise.
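
To make the fromspace/tospace vocabulary of the first point concrete, here
is a minimal sketch of a Cheney-style (breadth-first) semispace copy in
Python.  The object model (`Obj`, `Ref`, list indices standing in for
addresses) and the name `cheney_collect` are hypothetical illustrations, not
any particular collector's code.  Note that tospace is only ever appended to
(written sequentially at the allocation pointer), while fromspace is only
read, apart from the forwarding pointers left behind.  That is the access
pattern a write-allocate, discard-on-abandon VM could exploit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """Hypothetical pointer: an index into whichever semispace holds the object."""
    index: int


Field = Union[int, Ref]  # a field holds either plain data or a pointer


@dataclass
class Obj:
    fields: List[Field]
    forward: Optional[int] = None  # forwarding pointer, set in fromspace once copied


def cheney_collect(fromspace: List[Obj], roots: List[Ref]) -> Tuple[List[Obj], List[Ref]]:
    """Copy everything reachable from `roots` into a fresh tospace, breadth first."""
    tospace: List[Obj] = []

    def evacuate(ref: Ref) -> Ref:
        old = fromspace[ref.index]
        if old.forward is None:
            # First visit: copy to the allocation pointer (the end of tospace)
            # and leave a forwarding pointer behind in the fromspace object.
            old.forward = len(tospace)
            tospace.append(Obj(list(old.fields)))
        return Ref(old.forward)

    new_roots = [evacuate(r) for r in roots]

    # The scan pointer chases the allocation pointer; the objects between them
    # form the breadth-first queue, so no separate stack or queue is needed.
    scan = 0
    while scan < len(tospace):
        obj = tospace[scan]
        # Fields copied from fromspace still point into fromspace; fix them up.
        obj.fields = [evacuate(f) if isinstance(f, Ref) else f for f in obj.fields]
        scan += 1
    return tospace, new_roots


# Object 0 is garbage; object 3 (the root) points to 1 and 2, and 2 points to 1.
heap = [Obj([99]), Obj([7]), Obj([Ref(1)]), Obj([Ref(1), Ref(2)])]
new_heap, new_roots = cheney_collect(heap, [Ref(3)])
print([o.fields for o in new_heap])
print(new_roots)
```

The garbage object is never touched, and the shared object 1 is copied only
once thanks to its forwarding pointer.  If the object graph is unchanged,
collecting the result again reads fromspace and writes tospace in the same
relative order, which is the correlation described in the first point.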
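
To put rough numbers on the second point, here is a back-of-the-envelope
cost model for a mark-sweep collector.  It assumes (hypothetically) that
marking costs `mark_cost` per live word, sweeping costs `sweep_cost` per
heap word, and each collection frees `heap - live` words for new allocation.
The function name and the default constants (marking 10x as expensive per
word as sweeping) are illustrative assumptions, not measurements.

```python
from typing import Tuple


def mark_sweep_costs(live_words: float, heap_words: float,
                     mark_cost: float = 10.0,
                     sweep_cost: float = 1.0) -> Tuple[float, float]:
    """Return (GC cost per allocated word, sweep's share of GC time).

    Hypothetical model: one collection costs mark_cost * live + sweep_cost * heap
    and makes (heap - live) words available for allocation.
    """
    if heap_words <= live_words:
        raise ValueError("heap must be larger than the live data")
    mark = mark_cost * live_words      # proportional to live data only
    sweep = sweep_cost * heap_words    # proportional to the whole heap
    allocated = heap_words - live_words
    return (mark + sweep) / allocated, sweep / (mark + sweep)


for ratio in (2, 4, 16, 64, 256):
    per_word, sweep_share = mark_sweep_costs(1.0, float(ratio))
    print(f"heap = {ratio:>3} x live: GC cost/word = {per_word:5.2f}, "
          f"sweep share = {sweep_share:.0%}")
```

In this model, as the heap grows relative to the live data, the GC cost per
allocated word falls toward `sweep_cost` while the sweep's share of GC time
climbs toward 100%: the sweep dominates GC time even as GC itself becomes
cheap per allocation.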

-- 
www/ftp directory:
ftp://ftp.netcom.com/pub/hb/hbaker/home.html
