Newsgroups: comp.lang.dylan,comp.lang.misc,comp.lang.lisp,comp.object,comp.arch,comp.lang.c++
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!nntp.sei.cmu.edu!news.psc.edu!hudson.lm.com!news.math.psu.edu!news.cac.psu.edu!newsserver.jvnc.net!newsserver2.jvnc.net!howland.reston.ans.net!europa.chnt.gtegsc.com!news.mathworks.com!news.kei.com!world!jhallen
From: jhallen@world.std.com (Joseph H Allen)
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Message-ID: <DDE1Gx.IxD@world.std.com>
Organization: The World Public Access UNIX, Brookline, MA
References: <9507261647.AA14556@aruba.apple.com> <KANZE.95Aug10145551@ <40o1dr$3ff@news. <DDBsKu.H5B@kroete2.freinet.de>
Date: Wed, 16 Aug 1995 05:19:45 GMT
Lines: 154
Xref: glinda.oz.cs.cmu.edu comp.lang.dylan:5076 comp.lang.misc:22704 comp.lang.lisp:18772 comp.object:36871 comp.arch:60461 comp.lang.c++:144122

In article <DDBsKu.H5B@kroete2.freinet.de>,
Erik Corry <erik@kroete2.freinet.de> wrote:
>Hans Boehm (boehm@parc.xerox.com) wrote:
>
>: I claim the characteristics you want from a standard string class are:
>
>: 1) It should be easy to use, with a clean interface.  It should be
>: general enough to allow reusability of libraries that rely on it.

>: 2) It should be robust.  The implementation should scale reasonably
>: to large inputs, both in that it remains correct, and that its performance
>: remains reasonable.

>3) You should be able to get a C-style null-terminated char array
>out of it at minimal cost, because you are going to need this to deal
>with just about every library written in C up until your great new
>implementation came along.

>This seems to me to be a major argument for having a flat representation
>internally, too. If you don't have simple extraction of/conversion to
>a C-style null-terminated character array, then your new string library
>is not going to be used.

You may want to look at the automatic string library in my editor joe
(available by anonymous ftp from ftp.worcester.com).  This library does
nothing interesting with memory allocation, but it satisfies '3' above nicely.
Unfortunately, you probably need indirection for any interesting GC.  Anyway,
here's the man page for it.  It may give you some ideas for making a more
C-compatible string class.  There's also an automatic array-of-strings
library for things like environment variables and file name lists.

Name
	vsncpy, vsrm, sLEN, vstrunc, vsensure, vsins, vsdel, vsfill, vsset,
vsadd - Automatic string management functions

Syntax
	#include <vs.h>

	char *vsncpy(char *d,int off,char *s,int len);
	void vsrm(char *d);
	int sLEN(char *d);
	char *vstrunc(char *d,int len);
	char *vsensure(char *d,int len);
	char *vsins(char *d,int off,int len);
	char *vsdel(char *d,int off,int len);
	char *vsfill(char *d,int off,int c,int len);
	char *vsadd(char *d,int c);
	char *vsset(char *d,int off,int c);

Description
	This is a string library which supports strings that automatically
resize themselves when needed.  The strings know their own size, so getting
the length of a string is always a fast operation and storing NULs in the
string is permissible.  The strings are backward compatible with C's regular
zero-terminated strings.

	Each automatic string is stored in its own malloc block and has the
following format:

	<bksize><length><string><zero>

	'bksize' and 'length' are integers which give the size of the malloc
block and the length of the string.  A zero character always follows the
string for compatibility with normal C zero-terminated strings.  The zero is
not counted as part of the string length.

	The strings are not addressed with 'bksize' (the beginning of the
malloc block).  Instead, they are addressed at the first actual character of
the string itself.  This means that an automatic string looks like a normal
C string and can be addressed with type 'char *'.  Also the array access
operator '[]' works for reading and overwriting automatic strings and
automatic strings can be passed directly to UNIX operating system functions. 
However, free() cannot be used to dispose of automatic strings.  Instead,
vsrm() must be used.  Also an automatic string plus an offset is not an
automatic string, but is still a legal C language string.
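
	The layout above can be sketched in a few lines of C.  This is only
an illustration of the <bksize><length><string><zero> idea, not the code
from joe: the struct and the vs_sketch_* names are made up here, and the
exact meaning of 'bksize' in the real library is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; the real vs.h may lay this out differently. */
struct vs_hdr {
	int bksize;	/* assumed: bytes of character storage, incl. the zero */
	int length;	/* characters in use, not counting the zero */
};

/* Step back from the first character to the header in front of it. */
static struct vs_hdr *vs_hdr_of(char *s)
{
	return (struct vs_hdr *)(void *)(s - sizeof(struct vs_hdr));
}

/* Build the block and return a pointer to the first character,
   not to the start of the malloc block. */
char *vs_sketch_new(const char *src, int len)
{
	struct vs_hdr *h;
	char *s;

	assert(len >= 0);
	h = malloc(sizeof(struct vs_hdr) + (size_t)len + 1);
	if (!h)
		return NULL;
	h->bksize = len + 1;
	h->length = len;
	s = (char *)(h + 1);
	if (len)
		memcpy(s, src, (size_t)len);
	s[len] = 0;	/* trailing zero, not counted in length */
	return s;
}

/* The length comes from the header, so embedded NULs are counted. */
int vs_sketch_len(char *s)
{
	return s ? vs_hdr_of(s)->length : 0;
}

/* The block has to be freed from its real start; free(s) would be wrong. */
void vs_sketch_rm(char *s)
{
	if (s)
		free(vs_hdr_of(s));
}
```

	With this sketch, a string built from the five bytes "ab\0cd" reports
a length of 5 from vs_sketch_len(), while strlen() stops at the embedded NUL
and reports 2.  A pointer such as s+1 is still a usable C string, but it has
no header in front of it, so it must not be passed to vs_sketch_len() or
vs_sketch_rm().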

Primary function
	_vsncpy_ - Copy a block of characters at address 's' of length 'len'
onto the automatic string 'd' at offset 'off'.  The automatic string is
expanded to handle any values of 'len' and 'off' which might be given.  If
'off' is greater than the length of the string, SPACEs are placed in the
gap.  If 'd' is NULL, a new string is created.  If 'len' is 0, no copying or
string expansion occurs.  _vsncpy_ returns the automatic string, which may
have been realloced or newly created in its operation.
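
	A rough sketch of the behavior just described, for readers who want
to see the cases side by side.  It is not the implementation from joe: the
header struct, the vsncpy_sketch name, and the choice to create an empty
string when 'd' is NULL and 'len' is 0 are all assumptions, and it assumes
's' does not point into 'd'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed header layout: <bksize><length> sit just before the characters. */
struct cpy_hdr { int bksize; int length; };

static struct cpy_hdr *cpy_hdr_of(char *d)
{
	return (struct cpy_hdr *)(void *)(d - sizeof(struct cpy_hdr));
}

char *vsncpy_sketch(char *d, int off, const char *s, int len)
{
	struct cpy_hdr *h = d ? cpy_hdr_of(d) : NULL;
	int oldlen = h ? h->length : 0;
	int newlen = oldlen;

	assert(off >= 0 && len >= 0);
	if (d && len == 0)
		return d;		/* no copying, no expansion */
	if (len > 0 && off + len > oldlen)
		newlen = off + len;	/* string has to grow */
	if (!h || newlen + 1 > h->bksize) {
		struct cpy_hdr *n = realloc(h, sizeof(struct cpy_hdr) + (size_t)newlen + 1);
		if (!n)
			return NULL;	/* old block, if any, is still valid */
		h = n;
		h->bksize = newlen + 1;
	}
	d = (char *)(h + 1);
	if (len > 0) {
		if (off > oldlen)	/* fill the gap with SPACEs */
			memset(d + oldlen, ' ', (size_t)(off - oldlen));
		memcpy(d + off, s, (size_t)len);
	}
	h->length = newlen;
	d[newlen] = 0;
	return d;			/* may differ from the pointer passed in */
}

void vsncpy_sketch_rm(char *d)
{
	if (d)
		free(cpy_hdr_of(d));
}
```

	The point to notice is that the caller always has to assign the
result back, as in d=vsncpy_sketch(d,...), because the block may have moved.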

	_vsncpy_ is the most important automatic string function.  It is
both the primary constructor of automatic strings and is also a useful
operator.  It works in close conjunction with the following macros:

	sc("Hello")	Gives --> "Hello",sizeof("Hello")-1
	sz(s)		Gives --> s,zlen(s)
	sv(d)		Gives --> d,sLEN(d)

	These macros are used to build arguments for _vsncpy_.  Many
functions can be created with combinations of sc/sz/sv and vsncpy:

	s=vsncpy(NULL,0,NULL,0);	Create an empty automatic string

	s=vsncpy(NULL,0,sc("Hello"));	Create an automatic string
					initialized with the string "Hello"

	d=vsncpy(NULL,0,sv(s));		Duplicate an automatic string

	d=vsncpy(NULL,0,sz(s));		Convert a C string into an automatic
					string

	d=vsncpy(sv(d),sv(s));		Append automatic string s onto d

	d=vsncpy(sv(d),sc(".c"));	Append a ".c" extension to d.

	d=vsncpy(d,0,sc("Hello"));	Copy "Hello" to the beginning of d. 
					The original length of d is
					unchanged, unless it had to be
					expanded to fit "Hello".
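
	The trick behind sc/sz is simply that each macro expands to two
comma-separated arguments.  A small sketch, with definitions reconstructed
from the "Gives -->" table above (the real vs.h may differ; zlen is assumed
to be strlen returning int, and count_byte is a made-up function standing in
for the (s,len) tail of vsncpy):

```c
#include <assert.h>
#include <string.h>

/* Assumed definitions; each one yields a pointer and a length. */
#define zlen(s)	((int)strlen(s))
#define sc(lit)	(lit), (int)(sizeof(lit) - 1)	/* string constant */
#define sz(s)	(s), zlen(s)			/* zero-terminated string */

/* Hypothetical consumer of a (pointer, length) pair. */
int count_byte(const char *s, int len, int c)
{
	int i, n = 0;

	assert(len >= 0);
	for (i = 0; i < len; i++)
		if (s[i] == c)
			n++;
	return n;
}
```

	So count_byte(sc("a.b.c"),'.') is a three-argument call.  Note that
sc takes its length from sizeof and therefore sees past an embedded NUL,
while sz stops at the first one.  Note also that sz evaluates its argument
twice, so it should not be handed an expression with side effects.  sv would
follow the same pattern with sLEN in place of zlen.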

Other functions

	_vsrm_ is used to free an automatic string.  If NULL is passed to
it, nothing happens.

	_sLEN_ returns the length of an automatic string.  If the string is
NULL, sLEN returns 0.

	_vstrunc_ sets the length of an automatic string.  The string is
created if NULL is passed to it.  The string will be padded with SPACEs if
its length is increased.  _vstrunc_ may reallocate the string if (and only
if) it is expanded, so the return value must not be ignored.

	_vsensure_ reallocs the malloc block of the given string so that the
string can be later expanded to the specified length without any calls to
realloc.

	_vsins_ inserts a gap into a string.  If the string is NULL it is
created.  If the specified offset is past the end of the string, the string
is extended.

	_vsdel_ deletes a section of the string.  It does nothing if the
specified offset is past the end of the string.

	_vsfill_ fills a portion of a string with the specified character.

	_vsadd_ appends a single character to the end of the string.  A new
string is created if the specified string was NULL.  This function is very
useful for loops which create strings by appending some source of
characters.
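
	A sketch of the kind of loop meant here.  The append function below
is a stand-in, not the one from joe: the names, the header struct, the
initial size of 16 and the doubling growth policy are all assumptions chosen
to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

struct add_hdr { int bksize; int length; };	/* assumed layout */

static struct add_hdr *add_hdr_of(char *d)
{
	return (struct add_hdr *)(void *)(d - sizeof(struct add_hdr));
}

void vsadd_sketch_rm(char *d)
{
	if (d)
		free(add_hdr_of(d));
}

/* Append one character; creates the string if d is NULL. */
char *vsadd_sketch(char *d, int c)
{
	struct add_hdr *h = d ? add_hdr_of(d) : NULL;
	int len = h ? h->length : 0;

	if (!h || len + 2 > h->bksize) {	/* room for c and the zero? */
		int nsize = h ? h->bksize * 2 : 16;
		struct add_hdr *n = realloc(h, sizeof(struct add_hdr) + (size_t)nsize);
		if (!n)
			return NULL;		/* old string is still valid */
		h = n;
		h->bksize = nsize;
	}
	d = (char *)(h + 1);
	d[len] = (char)c;
	d[len + 1] = 0;
	h->length = len + 1;
	return d;
}

/* Build a string from the non-blank characters of src, one at a time.
   Returns NULL if nothing was appended or if allocation failed. */
char *squeeze_blanks(const char *src)
{
	char *d = NULL;

	assert(src != NULL);
	for (; *src; ++src) {
		char *n;

		if (*src == ' ')
			continue;
		n = vsadd_sketch(d, *src);
		if (!n) {
			vsadd_sketch_rm(d);
			return NULL;
		}
		d = n;
	}
	return d;
}
```

	Starting from a NULL string means the loop needs no separate
constructor call, and doubling the block keeps the number of reallocs small
even when the source is long.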

	_vsset_ sets a character at a specified offset.  A new string is
created if the specified string was NULL.  The string is filled with SPACEs
if the specified offset is past the end of the string.
-- 
/*  jhallen@world.std.com (192.74.137.5) */               /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
