Newsgroups: comp.lang.scheme
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!pipex!uunet!news.cygnus.com!nntp!lord
From: lord@x1.cygnus.com (Tom Lord)
Subject: Re: why no substring sharing?
In-Reply-To: smcl@sytex.com's message of Tue, 29 Nov 1994 04:16:32 GMT
Message-ID: <LORD.94Nov30183205@x1.cygnus.com>
Sender: news@cygnus.com
Nntp-Posting-Host: x1.cygnus.com
Organization: Cygnus Support
References: <ROCKWELL.94Nov28162254@nova.umd.edu> <02aJwc1w165w@sytex.com>
Date: Thu, 1 Dec 1994 02:32:05 GMT
Lines: 110



In article <02aJwc1w165w@sytex.com> smcl@sytex.com (Scott McLoughlin) writes:

   rockwell@nova.umd.edu (Raul Deluth Miller) writes:

   > Tom Lord:
   > :  (UNSHARE-SUBSTRING! <string>) => <unspecified>
   > :	   Modify the argument to no longer share state with
   > :	   any other string in the system.
   > 
   > This seems somewhat ambiguous.  (e.g. is it a variable that is being
   > modified?  a location?  What does it mean for this argument to be not
   > shared?)
   > 
   > If such a thing were implemented, presumably it would not be
   > designed as a mutator but as a pure function.
   > 
   > -- 

	   A shared substring would presumably be implemented as a 
   reference to a simple string (or whatever char array representation
   one wishes to use) and start/end indexes into the representation.
   The UNSHARE-SUBSTRING! primitive need not make a simple-string at
   all. It would only need to make sure that the representation 
   component was not referenced by any other live object, i.e. copy
   the underlying representation and alter the start/end indexes
   accordingly.

UNSHARE-SUBSTRING! guarantees that subsequent modifications to its
argument won't modify any other strings until SHARED-SUBSTRING is
called on the same argument.  In combination with the predicate
SHARED-SUBSTRING?, the mutator UNSHARE-SUBSTRING! makes it easy to
implement copy-on-write semantics for strings.



	   This makes me wonder what the SHARED-SUBSTRING? or whatever
   predicate would now return on the altered object. Is this a predicate
   about the representation method used for the string (Simple-String
   vs. a displaced representation) or a predicate that tells us
   whether the string is indeed shared or not. The latter would require
   cooperation from the GC/reference counts/etc which sounds expensive.

Here is an operational model that doesn't use reference counts or
special GC modifications.  This model defines the type FANCY-STRING
which has the operations SHARED-SUBSTRING etc.  The model is built
using a hypothetical structure package and standard Scheme strings:


	;; The FANCY-STRING type is a model for strings with 
	;; the capability to share substrings.
	;;
	(define-structure fancy-string (basic-string start end shared?))

	(define (capture-string-in-fancy-string s)
		(fancy-string s 0 (string-length s) #f))

	(define (string->fancy-string s)
		(capture-string-in-fancy-string (string-append s)))

	;; Create a SHARED-SUBSTRING (this version lacks range checks).
	;;
	(define (shared-substring fs sub-start sub-end)
		(fancy-string.shared?-set! fs #t)
		(fancy-string (fancy-string.basic-string fs)
			      (+ sub-start (fancy-string.start fs))
			      (+ sub-end (fancy-string.start fs))
			      #t))

	(define (shared-substring? fs)
		(fancy-string.shared? fs))

	(define (unshare-substring! fs)
		(fancy-string.basic-string-set!
		 fs
		 (string-append (fancy-string.basic-string fs)))
		;; FS now owns a private copy, so it is no longer shared.
		(fancy-string.shared?-set! fs #f))

Now all that you need are new versions of the string functions that
operate on FANCY-STRING instead of ordinary strings.
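
A minimal sketch of what a few of those functions might look like,
including a copy-on-write version of STRING-SET!.  This is an
illustration under assumptions, not a definitive implementation: a
plain vector stands in for the hypothetical structure package so that
the code runs in any standard Scheme, and the names
FANCY-STRING-LENGTH, FANCY-STRING-REF and FANCY-STRING-SET! are made up
for this example.  Range checks are omitted.

```scheme
;; Assumption: a vector models the hypothetical structure package.
;; Slots: 0 basic-string, 1 start, 2 end, 3 shared?
(define (fancy-string basic start end shared?)
  (vector basic start end shared?))
(define (fancy-string.basic-string fs) (vector-ref fs 0))
(define (fancy-string.start fs) (vector-ref fs 1))
(define (fancy-string.end fs) (vector-ref fs 2))
(define (fancy-string.shared? fs) (vector-ref fs 3))
(define (fancy-string.basic-string-set! fs s) (vector-set! fs 0 s))
(define (fancy-string.shared?-set! fs flag) (vector-set! fs 3 flag))

;; Copy S so the fancy string never aliases the caller's string.
(define (string->fancy-string s)
  (fancy-string (string-copy s) 0 (string-length s) #f))

;; Both new indexes are offsets from the START of the source.
(define (shared-substring fs sub-start sub-end)
  (fancy-string.shared?-set! fs #t)
  (fancy-string (fancy-string.basic-string fs)
                (+ sub-start (fancy-string.start fs))
                (+ sub-end (fancy-string.start fs))
                #t))

;; Give FS a private copy of the representation and clear its flag.
(define (unshare-substring! fs)
  (fancy-string.basic-string-set!
   fs (string-copy (fancy-string.basic-string fs)))
  (fancy-string.shared?-set! fs #f))

;; Hypothetical FANCY-STRING versions of the string primitives.
(define (fancy-string-length fs)
  (- (fancy-string.end fs) (fancy-string.start fs)))

(define (fancy-string-ref fs k)
  (string-ref (fancy-string.basic-string fs)
              (+ k (fancy-string.start fs))))

;; Copy-on-write: unshare before the first write to a string that
;; might share its representation with another string.
(define (fancy-string-set! fs k char)
  (if (fancy-string.shared? fs)
      (unshare-substring! fs))
  (string-set! (fancy-string.basic-string fs)
               (+ k (fancy-string.start fs))
               char))
```

The SHARED? flag in this sketch is conservative: it records that a
string may be shared, not that it currently is.  A string whose
substrings have all been unshared still makes one unnecessary copy on
its next write.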

Notice that this implementation is just a model and doesn't satisfy
the spec i made.  In particular, the type FANCY-STRING is disjoint
from ordinary Scheme strings, and the shared substring operations are
available only for FANCY-STRINGs.

Can you get fast versions of the string primitives for FANCY-STRING if
you write them in Scheme?  One way is if your Scheme compiler can
generate good code for simple implementations.  I know that hobbit can
do so.  From what i read, i gather that a compiler with soft typing
could also generate good code.  But in most cases, you'll get lousy
code if you try to write the new functions in Scheme.  That is why i
suggested making shared substrings a built-in feature of the
implementation (which is what i intend to do for Guile).

(To be clear, performance is the motivation for this feature.  Some
 people suggested a more general form of substrings in which, instead
 of START and END positions, the constructor took a PROCEDURE that mapped
 indexes in the substring to indexes in the source string.  This
 general form would permit not only substrings, but arbitrary `permutation
 strings'.  That generality might (or might not) be worthwhile, but
 it is certainly going to be slow.  Substrings as i've defined them
 are hardly slower at all than normal Scheme strings.)
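
To make the cost comparison concrete, here is a rough sketch of the
more general form, assuming the constructor takes an index-mapping
procedure of one argument.  All names here (MAKE-MAPPED-STRING,
MAPPED-STRING-REF, and so on) are invented for illustration and are not
part of any proposal.  Every character reference goes through a
procedure call, where the START/END form needs only an addition.

```scheme
;; Assumption: a vector models the mapped string.
;; Slots: 0 source string, 1 length, 2 index-mapping procedure.
(define (make-mapped-string base len index-map)
  (vector base len index-map))

(define (mapped-string-length ms) (vector-ref ms 1))

;; Each reference calls the mapping procedure.
(define (mapped-string-ref ms k)
  (string-ref (vector-ref ms 0) ((vector-ref ms 2) k)))

;; An ordinary substring expressed as a mapping.
(define (mapped-substring base start end)
  (make-mapped-string base (- end start) (lambda (k) (+ k start))))

;; A `permutation string': the source viewed in reverse order.
(define (mapped-reverse base)
  (let ((n (string-length base)))
    (make-mapped-string base n (lambda (k) (- n k 1)))))

;; Copy a mapped string out into a fresh ordinary string.
(define (mapped-string->string ms)
  (let* ((n (mapped-string-length ms))
         (out (make-string n)))
    (do ((k 0 (+ k 1)))
        ((= k n) out)
      (string-set! out k (mapped-string-ref ms k)))))
```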

-t
--
----

If you would like to volunteer to help with the GNU extension language
project, please write to lord@gnu.ai.mit.edu.
