Newsgroups: comp.lang.dylan,comp.lang.misc,comp.lang.lisp,comp.object,comp.arch,comp.lang.c++
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!nntp.sei.cmu.edu!news.psc.edu!hudson.lm.com!news.math.psu.edu!news.cac.psu.edu!newsserver.jvnc.net!newsserver2.jvnc.net!howland.reston.ans.net!xlink.net!slsv6bt!news
From: kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763)
Subject: Re: allocator and GC locality (was Re: cost of malloc)
In-Reply-To: boehm@parc.xerox.com's message of 14 Aug 1995 17:33:47 GMT
Message-ID: <KANZE.95Aug16203305@slsvhdt.lts.sel.alcatel.de>
Lines: 99
Sender: news@lts.sel.alcatel.de
Organization: SEL
References: <9507261647.AA14556@aruba.apple.com> <3v8g7l$cge@jive.cs.utexas.edu>
	<3vac07$ptf@info.epfl.ch> <3vb382$dtr@jive.cs.utexas.edu>
	<3vbl70$bht@fido.asd.sgi.com> <hbaker-3107951026250001@192.0.2.1>
	<justin-0108951458440001@158.234.26.212> <hbake
	<jyuynr@bmtech.demon.co.uk> <hbaker-0208950816000001@192.0.2.1>
	<jyvgwh@bmtech.demon.co.uk> <hbaker-0408950815320001@192.0.2.1>
	<405k8h$emi@news.parc.xerox.com> <hbaker-0708951241390001@192.0.2.1>
	<40apft$3im@news.parc.xerox.com> <KANZE.95Aug101
	<40o1dr$3ff@news.parc.xerox.com>
Date: 16 Aug 1995 18:33:05 GMT
Xref: glinda.oz.cs.cmu.edu comp.lang.dylan:5078 comp.lang.misc:22713 comp.lang.lisp:18788 comp.object:36900 comp.arch:60469 comp.lang.c++:144261

In article <40o1dr$3ff@news.parc.xerox.com> boehm@parc.xerox.com (Hans
Boehm) writes:

|> kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763) writes:

|> ...

|> >With this in mind, I'm not really convinced that the solution is to
|> >try and create an optimal string class as a standard.  I rather think
|> >of the standard string class as a facility for people like myself,
|> >whose programs only do string handling secondarily (formatting error
|> >messages, and the like).  If I were writing an editor, for example, I
|> >would not expect the standard string class to be acceptable for my
|> >text buffers.  In this regard, just about any string class with the
|> >required functionality will do the trick.  (And it is more important
|> >that the string class be easier to use than that it be fast.)

|> >This does mean that most text-oriented applications will have to
|> >`re-invent the wheel', in that they will have to write their own
|> >string class.  But I'm not convinced that there is a string class
|> >which would be appropriate for all applications, anyway.

|> I agree that some applications will need their own string class.
|> Nonetheless, I think a standard string class is important.  Many libraries
|> will require strings as input parameters, or generate strings.  If you
|> want any hope of having such libraries play together, they will need
|> a standard string representation.

I agree that a standard string representation is important.  There are
a lot of programs (including most of my own) where text plays only a
supporting role; for these, it would be ridiculous to have to create
a special class to support it.

|> I claim the characteristics you want from a standard string class are:

|> 1) It should be easy to use, with a clean interface.  It should be
|> general enough to allow reusability of libraries that rely on it.

I agree.  This should be the single most important feature of a
general string class.

|> 2) It should be robust.  The implementation should scale reasonably
|> to large inputs, both in that it remains correct, and that its performance
|> remains reasonable.

I'm not totally convinced that support for very long strings is
necessary.  I do agree that it should be robust in the sense that 1)
it is itself free of errors, and 2) it detects and handles user errors
gracefully (for some definition of gracefully).
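In today's standard C++ (long after this thread), one concrete reading of ``handles user errors gracefully'' is the bounds-checked std::string::at, which throws std::out_of_range, as opposed to operator[], which does no checking at all. The helper char_at_or below is hypothetical: a minimal sketch of one possible policy (substituting a fallback character), not something the draft standard under discussion specified.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the character at `pos`, or `fallback` if
// `pos` is out of range. std::string::at() does the bounds check and
// throws std::out_of_range; operator[] would silently misbehave instead.
char char_at_or(const std::string& s, std::string::size_type pos, char fallback) {
    try {
        return s.at(pos);
    } catch (const std::out_of_range&) {
        return fallback;  // one possible definition of "gracefully"
    }
}
```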

I also think that, in the context of C++, there is a third criterion
which is very important:

3) It must ``work'' with the old, '\0'-terminated C-style strings.
After all, this is what most window systems, for example, will
continue to expect for a considerable time.
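As a rough illustration of criterion 3 in modern C++: std::string can be constructed from a '\0'-terminated array, and c_str() hands back a '\0'-terminated buffer for C interfaces. The function legacy_title_length is a hypothetical stand-in for a window-system call. Note the friction point: a std::string may contain embedded '\0' characters, and a C interface will stop at the first one.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C API (e.g. a window-system call) that
// only understands '\0'-terminated strings.
extern "C" std::size_t legacy_title_length(const char* title) {
    return std::strlen(title);
}

// C string -> string class: the std::string constructor copies up to '\0'.
// String class -> C string: c_str() yields a '\0'-terminated buffer that
// remains valid until the string is next modified or destroyed.
std::size_t title_length(const std::string& title) {
    return legacy_title_length(title.c_str());
}
```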

|> I would argue that a string class based on a conventional flat representation,
|> and whose interface exposes that, doesn't do very well on either criterion.

I agree.  At the very least, it fails the first criterion.  But there
is a fundamental conflict between this and the third criterion (which
obviously requires a means of getting a flat representation).

That said, I think that the string class in the current draft does allow
internal representations other than a flat one, provided that it converts
itself to the flat representation whenever c_str() is called.
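A minimal sketch of that condition, assuming a piece-list representation (PieceString and its members are hypothetical, not the draft's interface): appends simply record pieces, and c_str() flattens on demand. The storage is declared mutable so that c_str() can stay const, as it is in std::string, since flattening changes the layout but not the logical value.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: a string stored as a list of pieces, so appending
// does not rebuild one contiguous buffer. The flat form is built on demand.
class PieceString {
public:
    PieceString() = default;
    explicit PieceString(std::string s) { append(std::move(s)); }

    PieceString& append(std::string s) {
        if (!s.empty()) pieces_.push_back(std::move(s));
        return *this;
    }

    std::size_t size() const {
        std::size_t n = 0;
        for (const std::string& p : pieces_) n += p.size();
        return n;
    }

    std::size_t piece_count() const { return pieces_.size(); }

    // Converts to the flat representation if needed and returns a
    // '\0'-terminated pointer to the single remaining piece.
    const char* c_str() const {
        if (pieces_.size() != 1) {
            std::string flat;
            flat.reserve(size());
            for (const std::string& p : pieces_) flat += p;
            pieces_.clear();
            pieces_.push_back(std::move(flat));
        }
        return pieces_.front().c_str();
    }

private:
    mutable std::vector<std::string> pieces_;
};
```

As with std::string, the pointer returned by c_str() should be treated as invalid after any later append.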

|> It's often difficult or impossible to pass some other string representation
|> (e.g. a file) to a string function without actually changing representation
|> (e.g. reading the whole file).  A tree-based representation can, at minimal
|> cost, accommodate strings with user-specified character access functions.

I don't see any possibility of passing a file to a ``standard'' string
class without reading it anyway.  Remember, you do not have access to
the internals of a standard string class to let it know about
alternative representations, like a file.  So if it isn't provided for
in the standard, you don't have it.
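The tree-based representation quoted above can be sketched in modern C++. This is a toy in the spirit of that argument, not Boehm's cord package or any standard interface, and every name here is hypothetical. A leaf is either a flat string or a callback that produces characters on demand, so a file-backed leaf need not be read until its characters are accessed, and concatenation copies nothing. It also shows why such support has to be designed into the class: a user cannot add a new leaf type like make_fn to a standard string class from outside.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical rope node: a flat leaf, a function leaf whose characters
// come from a user-supplied callback, or a concatenation of two ropes.
struct Rope {
    std::size_t length = 0;
    std::string flat;                          // flat leaves
    std::function<char(std::size_t)> fn;       // function leaves
    std::shared_ptr<const Rope> left, right;   // concatenation nodes
};

using RopePtr = std::shared_ptr<const Rope>;

RopePtr make_flat(std::string s) {
    auto r = std::make_shared<Rope>();
    r->length = s.size();
    r->flat = std::move(s);
    return r;
}

RopePtr make_fn(std::size_t n, std::function<char(std::size_t)> f) {
    auto r = std::make_shared<Rope>();
    r->length = n;
    r->fn = std::move(f);
    return r;
}

// Constant-time concatenation: no characters are copied.
RopePtr concat(RopePtr a, RopePtr b) {
    auto r = std::make_shared<Rope>();
    r->length = a->length + b->length;
    r->left = std::move(a);
    r->right = std::move(b);
    return r;
}

// Walks down through concatenation nodes to the leaf holding index i.
char char_at(const RopePtr& r, std::size_t i) {
    if (i >= r->length) throw std::out_of_range("char_at");
    const Rope* node = r.get();
    while (node->left) {
        if (i < node->left->length) {
            node = node->left.get();
        } else {
            i -= node->left->length;
            node = node->right.get();
        }
    }
    return node->fn ? node->fn(i) : node->flat[i];
}
```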

|> A flat representation does allow in-place string updates,
|> a facility I would personally rather leave out of a "standard" string class.
|> You still have arrays of characters when you need them.

I agree here.  My own string class did not support in-place updates,
and I've never really seen the need for them.  But a ``standard''
string class must represent a consensus, and most people seem to want
in-place updates.  (The only non-const function in my string class was
operator=.)
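A sketch of such a class in modern C++, assuming reference-counted sharing through std::shared_ptr (ConstString is a hypothetical name, not the author's actual class): every member except assignment is const, so concatenation returns a new value, and copies can safely share their characters because nothing can change them.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string: all members except assignment are const.
// Copies share one buffer, which is safe because it can never change.
class ConstString {
public:
    ConstString() : rep_(std::make_shared<std::string>()) {}
    ConstString(std::string s) : rep_(std::make_shared<std::string>(std::move(s))) {}
    ConstString(const ConstString&) = default;

    // The only mutating operation: rebinds *this to another value.
    ConstString& operator=(const ConstString&) = default;

    std::size_t size() const { return rep_->size(); }
    char operator[](std::size_t i) const { return (*rep_)[i]; }
    const char* c_str() const { return rep_->c_str(); }

    // "Modification" yields a new string; neither operand changes.
    ConstString operator+(const ConstString& other) const {
        return ConstString(*rep_ + *other.rep_);
    }

    bool shares_with(const ConstString& other) const { return rep_ == other.rep_; }

private:
    std::shared_ptr<const std::string> rep_;
};
```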

   [I've cut the text concerning scaling.  As I said earlier, I'm not
convinced that the general string class has to support long strings
that efficiently...]
-- 
James Kanze         Tel.: (+33) 88 14 49 00        email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
Conseils en informatique industrielle --
                              -- Beratung in industrieller Datenverarbeitung


