Newsgroups: alt.lang.design,comp.lang.c,comp.lang.c++,comp.lang.lisp,zer.z-netz.sprachen.algorithmen
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!fas-news.harvard.edu!newspump.wustl.edu!darwin.sura.net!howland.reston.ans.net!pipex!dircon!rheged!simon
From: simon@rheged.dircon.co.uk (Simon Brooke)
Subject: Reference Counting (was Re: Searching Method for Incremental Garbage Collection)
Message-ID: <CzHCvp.9rM@rheged.dircon.co.uk>
Followup-To: comp.lang.lisp
Keywords: storage garbage collection incremental search method
Organization: none. Disorganization: total.
References: <3ai2ol$3ua@gate.fzi.de>
Date: Fri, 18 Nov 1994 20:28:35 GMT
Lines: 101
Xref: glinda.oz.cs.cmu.edu comp.lang.c:117171 comp.lang.c++:99513 comp.lang.lisp:15700

In article <3ai2ol$3ua@gate.fzi.de>, Ulrike Koelsch <koelsch@fzi.de> wrote:
>Hello everybody,
>
>I am searching for a method of incremental garbage collection working on the 
>storage of an object-oriented database system.
>
>Idea:
>
>I would like to use a garbage collection that works only in incremental short
>time slots or which may be interrupted by the application at every time.
>
>Problem:
>
>How can I avoid getting inconsistencies by applications setting and deleting 
>references on a part of the memory my garbage collector examines in
>some interrupted runs.
>

I'm following up to the net, rather than mailing as Ulrike suggests,
because I know that people who understand these things better than I
do will probably disagree with what I have to say, and in the hope that by
following their responses (if any) I can understand why I am wrong.

I've always believed that a reference counting garbage collector had a
lot going for it, particularly in systems which require continuous
performance. This application seems to me a case in point. In a
reference counting system, a garbage collector _as such_ doesn't exist
at all; simply, a function is called whenever a pointer is released.
This decrements the reference counter on the referenced object, and if
it finds that the counter has reached zero, calls itself recursively
on all the objects referenced by the object, before returning the
object to the free memory pool (which may be organised as a cons-space,
a heap, a BiBOP, or however).

Obviously there are some problems with this approach. If a fixed width
reference counter is used, then there will be some value at which the
reference counter would 'roll over' to zero if incremented.
Consequently, at that value the reference count must 'lock' (i.e. not
get further incremented *or* *ever* decremented), and the object
becomes locked into memory. The reference counter also takes up store
in each object, increasing the per-object overhead.

Reference Counting and Memory Fragmentation: a solution?
~~~~~~~~~ ~~~~~~~~ ~~~ ~~~~~~ ~~~~~~~~~~~~~~ ~ ~~~~~~~~~
People who are much better computer scientists than me also say that
such a system will suffer badly from memory fragmentation, which on
paged or virtual memory systems is likely to be a big performance hit.
Although I respect these people, I've yet to be wholly satisfied with
this argument. I have played with the design of a system which held a
series of pages each of which was organised as an array of cells.
Objects which would not fit in a single cell were held separately in a
heap (HSOs, or Heap Space Objects), but each was referenced through a
single one-celled object (an HSR). The HSR held the reference
counter for the HSO. Cells on each page were allocated from and
returned to a free list, and the page held a count of the number of
allocated cells.

A central lookup table indirected references to the pages, so that a
pointer would be <page no><offset>, and also held centrally was a
pointer to the 'current page'. New cells would always and only be
allocated from the current page. When the 'current page' became full,
the least full other page would become the new 'current page'. If all
existing pages were full, a new page would be requested from the
heap (if a page ever became empty it would of course be returned to
the heap).

It seems to me that the architecture described above would minimise
memory fragmentation in just the same way that generational garbage
collecting does -- cells of the same age will tend to be in the same
place, so that recursing down a list will tend to take place within a
single page; and most allocated pages will tend to be pretty much full
most of the time. Do these suppositions make sense to people?

I appreciate that a reference counting scheme will tend to be a little
more computationally intensive than a mark-and-sweep, and consequently
systems using it will overall tend to run a little slower. But the
trade off is that it will run consistently, without the little
hiccoughs which are typical of even generational GCs and quite marked
on full mark-and-sweeps.

Particularly in the sort of application Ulrike describes, where a
module which has automatic store management has to coexist and
interwork with other modules which don't understand automatic store
management, it seems to me that reference counting has two very
significant benefits: you can pass a pointer to one of 'your' objects
to the foreign module, secure in the knowledge that your object won't
move in memory; and you can be assured that the foreign module won't
get tangled up in your garbage collector, because unlike the town of
Lisp in Belgium (yes, it is a real place) your system won't have one.

But I am 'only an egg': perhaps the old ones of the net will help me
to grok the fullness.

Cheers

Simon
-- 
------- simon@rheged.dircon.co.uk

	Is this the only house in the world with hot and cold running 
	usenet, but no running water?
