                      The Collection-Extensions library
                      =================================

  Copyright (c) 1994  Carnegie Mellon University
  All rights reserved.

  The various modules in this library contain a few new types and operations
  which are compatible with the collection types specified in the Dylan
  Interim Reference Manual, but which are not part of that specification.  

  It is to be expected that more such collection types will appear in time,
  and they will likely be added to this library.  This may also result in
  reorganizations which could force incompatible changes to the existing
  modules.  We hope to minimize such imcompatibilities and, when forced to
  them, will include sufficient information to facilitate conversion of
  existing code.  

  The particular modules provided at present are:  

  heap 
      Provides <heap>, a popular data structure for priority queues.  The
      semantics are basically those of a sorted sequence, with particularly
      efficient implementations of add!, first, and pop (i.e.
      "remove-first").  
  so-list 
      Provides "self-organizing lists".  These explicit key collections
      provide roughly the semantics of hash tables, but use a probabilistic
      implementation which provides O(n) worst case performance but can
      provide very fast constant time access in the best case.  
  subseq 
      Provides "subsequences", which represent an aliased reference to some
      part of an existing sequence.  These are analogous to slices (in Ada or
      Perl) or displaced arrays (in Common Lisp).  Subsequences are
      themselves subclasses of <sequence>, and can therefore be passed any
      <collection> or <sequence> operation.  
  string-search 
      Provides a small assortment of specialized operations for searching and
      modifying <vector>s and <byte-string>s.  These operations are analogous
      to existing collection operations but provide keywords and efficiency
      improvements which are meaningful only within the more limited domain. 

Heap
----
  A heap is an implementation of the abstract data type "sorted list". A heap
  is a sorted sequence of items.  Most likely the user will end up writing
  something like 

  define class <heap-item> (<object>);
    slot priority;
    slot data;
  end class <heap-item>;

  with appropriate methods defined for < and =. The user could, however, have
  simply a sorted list of integers, or have some item where the priority is
  an integral part of the item itself.  

  make on heaps supports the less-than: keyword, which supply the heap's
  comparison and defaults to <.  

  Heaps support all the usual sequence operations. The most useful ones:  

       push(heap, item) => updated-heap
       pop(heap)        => smallest-element
       first(heap)      => smallest-element
       second(heap)     => second-smallest-element
       add!(heap, item) => updated-heap
       sort, sort!      => sorted-sequence

  These are all "efficient" operations (defined below).  As with <deque>,
  push is another name for add!, and does exactly the same thing except that
  push doesn't accept any keywords.  sort and sort! return a sequence that's
  not a heap. Not necessarily efficient but useful anyhow:  

       add-new!(heap, item, #key test:, efficient:) => updated-heap
       remove!(heap, item, #key test:, efficient:) => updated-heap
       member?(heap, item, #key test:, efficient:) => <boolean>

  The efficient: keyword defaults to #f. If #t, it uses the
  random-iteration-protocol (which is considerably more efficient, but isn't
  really standard behavior, so it had to be optional).  Conceivably most
  sequence methods could support such a keyword, but they don't yet.  

  The user can use element-setter or the iteration protocol to change an item
  in the heap, but changing the priority of an item is an error and Bad
  Things(tm) will happen. No error will be signaled.  Both of these
  operations are very inefficient.  

  Heaps are NOT <stretchy-collection>s, although add! and pop can magically
  change the size of the heap.  

  Efficiency: Approximate running times of different operations are given
  below: (N is the size of the heap) 

      first, first-setter                             O(1)
      second (but not second-setter)                  O(1)
      size                                            O(1)
      add!                                            O(lg N)
      push                                            O(lg N)
      pop(heap)                                       O(lg N)
      sort, sort!                                     O(N * lg N)
      forward-iteration-protocol          
                              setup:                  O(N)
                              next-state:             O(lg N)
                              current-element:        O(1)
                              current-element-setter: O(N)
      backwards-iteration-protocol
                              setup:                  O(N * lg N)
                              next-state:             O(1)
                              current-element:        O(1)
                              current-element-setter: O(N)
      random-iteration-protocol           
                              setup:                  O(1)
                              next-state:             O(1)
                              current-element:        O(1)
                              current-element-setter: O(1)
      element(heap, M)                                O(M*lg N + N)
      element-setter(value, heap, M)                  O(N + M*lg N + M)

  element, element-setter on arbitrary keys use the
  forward-iteration-protocol (via the inherited methods), and have
  accordingly bad performance.  

So-List
-------
  The "Self Organizing List" is a "poor man's hash table".  More precisely,
  <so-list> is a subclass of <mutable-explicit-key-collection> for which
  addition and retrieval are both linear in the worst case, but which use a
  probabilistic strategy which yields nearly constant time in the best case.  

  Because they have a very low overhead, self-organizing lists may provide
  better peformance than hash tables in cases where references have a high
  degree of temporal locality.  They may also be useful in situations where
  it is difficult to create a proper hash function.  

  Define new so-lists with 

    make(<so-list>, test: test)

  Test is expected to be an equality function.  In particular, it is expected
  to satisfy the identity and transitivity requirements described in chapter
  5.  If not specified, test defaults to \==.  

Subseq
------
  <Subsequence> is a new subclass of <sequence>.  A subsequence represents an
  aliased reference to some part of an existing sequence.  Although they may
  be created by make (with required keywords source:, start: and end:) on one
  of the instantiable subclasses, they are more often created by calls of the
  form 

    subsequence(sequence, start: 0, end: 3)

  where start: and end: are optional keywords which default to the beginning
  and end, respectively, of the source sequence.  No other new operations are
  defined for subsequences, since all necessary operations are inherited from
  <sequence>.  

  Because subsequences are aliased references into other sequences, several
  properties must be remembered:  

   1. The contents of a subsequence are undefined after any destructive
      operation upon the source sequence.  
   2. Destructive operations upon subsequences may be reflected in the
      source.  The results of reverse! and sort! should be expected to affect
      the source sequence for vector subsequences.  

  If the source sequences are instances of <vector> or <string>, then the
  implementation will use subclasses of <subsequence> which are also
  subclasses of <vector> or <string>.  

  Efficiency notes:  

   1. The implementation tries to insure that subsequences of subsequences
      can be accessed as efficiently as the original subsequence.  (For
      example, the result of 

        subsequence(subsequence(source, start: 1), start: 2)

      would produce a subsequence identical to the one produced by 

        subsequence(source, start: 3)

   2. Vector subsequences, like all other vectors, implement constant time
      element access.  
   3. Non-vector subsequences may take non-constant time to create, but will
      provide constant-time access to the first element.  This should produce
      the best performance provided some element of the subsequence is
      accessed at least once.  

String-Search
-------------
  The "string-search" module provides basic search and replace capabilities
  upon restricted subsets of <sequence> -- primarily <vector> and
  <byte-string>.  Exploiting the known properties of these types yields
  substantially better performance than can be achieved for sequences in
  general.  

  The following functions are supplied:  

  find-first-key vector predicate? #key start end failure => key 
      Find the index of first element (after start but before end) of a
      vector which satisfies the given predicate.  If no matching element is
      found, return failure.  The defaults for start, end and failure are,
      respectively,  0, size(vector), and #f.  This function is like
      find-key, but accepts start: and end: rather than skip:.) 

  find-last-key vector predicate? #key start end failure => key 
      This is like find-first-key, but goes backward from end.  

  substring-position big pattern #key compiled start => index 
      This is a specialized version of subsequence-position which works only
      on <byte-strings>.  Since this routine only handles byte-characters and
      \== tests, it can do a "Boyer-Moore-ish" search.  (If the pattern is
      too small for B-M to pay off, substring-position will fall back upon a
      simpler search strategy -- this function should never be slower than
      subsequence-position.) 

      Note that this function takes a start: keyword (which defaults to 0)
      instead of skip:.  

      As a further optimization, you may pre-compute a "compiled" dispatch
      table for the pattern with compile-substring and pass it in (along with
      the pattern itselef) via the compiled: keyword.  This will save both
      time and space if you are searching for the same pattern repeatedly.  

  compile-substring pattern => compiled 
      Produce a skip table for Boyer-Moore-ish searching.  By splitting this
      off into a separate routine we allow people to pre-compile heavily used
      strings, thus avoiding one of the more expensive parts of the search.  

  replace-substring big pattern goal #key count compiled => result 
      Replaces all (or up to count) occurences of pattern in big with goal.
      As in substring-position all three arguments must be <byte-string>s.
      Accepts the compiled: keyword as described above.  Returns a new string
      iff it finds at least one match to replace.  

