Newsgroups: comp.lang.scheme
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!udel!princeton!news.princeton.edu!blume
From: blume@dynamic.cs.princeton.edu (Matthias Blume)
Subject: Re: why no substring sharing?
In-Reply-To: schwartz@roke.cse.psu.edu's message of 27 Nov 1994 02:18:02 GMT
Message-ID: <BLUME.94Nov26222547@dynamic.cs.princeton.edu>
Originator: news@hedgehog.Princeton.EDU
Sender: news@Princeton.EDU (USENET News System)
Nntp-Posting-Host: dynamic.cs.princeton.edu
Organization: Princeton University
References: snark@bark.COM (Impatient Observer) <9411262226.AA13273@bitsy.MIT.EDU>
	<SCHWARTZ.94Nov26211802@roke.cse.psu.edu>
Date: Sun, 27 Nov 1994 03:25:47 GMT
Lines: 70


Folks, please!

There is certainly no need to insult each other over simple issues like
spelling!

I was criticized in this forum for making an inappropriate comment
(we could also drill a hole...) on someone's suggestions.  Even
though this comment was meant to be funny (it was an attempted
translation of a line taken from an Austrian movie -- Müller's Büro
if anyone cares to know) I retracted the offending article anyway.

In article <SCHWARTZ.94Nov26211802@roke.cse.psu.edu> schwartz@roke.cse.psu.edu (Scott Schwartz) writes:

   snark@bark.COM (Impatient Observer) writes:
      > Why does substring return a newly allocated string?

      Because strings in Scheme, like vectors, have an object header which
      contains their type code and length. To make every substring of a string
      carry this kind of information would be wasteful.

   I don't understand.  Currently the only way to represent a substring
   is to cons a whole new string, which is very expensive.

Both sides are right and both sides are wrong -- it only depends on
how you view it.  Read on...

   Since you
   propose a (pointer,length) representation of strings, consider that if
   substring returned a pointer into the middle of some other string's
   existing storage and some new length, then that incurs no additional
   cost but does save the expense of allocating and copying.

You can always build your own string primitives on top of the
existing ones.  You can represent a string as a triple (e.g. a
3-element list): (<character array> <start index> <end index>) to
build your storage-sharing strings out of non-storage-sharing ones.
(<character array> would be implemented as an ordinary string.)  And
obviously, if you have string-copy you can always create the
non-sharing variety out of the sharing breed.

There are a few reasons why you shouldn't even care which version is
used by your system -- the only moment you want to know which is which
is when you do string-set!.  This is imperative programming style,
which IMO should be minimized in Scheme.
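
To see why string-set! is the one place where the difference shows,
here is a small sketch.  The "view" triple is my own hypothetical
representation of a sharing substring, not something any particular
implementation provides.

```scheme
;; Hypothetical sharing view: (<character array> <start> <end>).
(define base (string-copy "hello, world"))  ; fresh, mutable string
(define view (list base 7 12))              ; shares storage with base
(define copy (substring base 7 12))         ; newly allocated string

;; Extract the characters a view currently denotes.
(define (view->string v)
  (substring (car v) (cadr v) (caddr v)))

(string-set! base 7 #\W)                    ; mutate the original

(view->string view)   ; "World" -- the mutation shows through
copy                  ; "world" -- the copy is unaffected
```

Without the string-set! both versions behave identically, so a program
that never mutates strings cannot tell them apart.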

Don't worry about low-level efficiency!  Write your program -- get it
right first -- then get it fast!  Use profiling to prove that the
string ops are the bottleneck before you resort to heroics!
(Ooops -- do we have any decent profiling tool for Scheme?)

Sharing the storage also causes problems for the garbage collector
(although they aren't unsolvable), and sharing in general creates a
lot of unnecessary dependencies between program parts that
shouldn't be dependent on each other.

   If any implementors feel insulted then I deeply apologise, but I would
   still prefer that substrings be shared.

This is *your* personal preference.  Obviously, other people's opinions
on this subject differ.  And if you need shared strings -- I told you
how to build them.

   Your subsequent attempt at charicaturing my position misses the point
   as well.

I fully agree with you here.

--
-Matthias
