Program representation data structures:

A BUFFER is a stand-alone chunk of program: an actual user-level buffer, a cut
buffer, etc.  It has three parallel levels of structure, each represented as a
sorted vector in a buffer-gap representation:
 1] Text: the text representation of the stuff being edited.  This is pretty
    much as in a text editor, but automatic indentation and line breaks might
    not have any corresponding characters.
 2] TOKENs & MARKs.  Marks are either fixed or moving (a moving mark is moved
    past text inserted at its position.)  Line-breaks (automatic or explicit)
    are represented by moving newline marks.  A token is a subtype of fixed
    mark.  Marks just have a position in the text sequence, whereas tokens
    have a position and a character count.  Tokens do not overlap, so each
    text position corresponds to at most one token (but tokens can have marks
    inside them.)
 3] REGIONs of text in a buffer with some semantic affinity, such as the
    definition of a single method or a leaf in a document outline hierarchy.
    Regions are also disjoint.  Each region has a prefix token which contains
    some non-empty but otherwise arbitrary region-specific text, such as
    "section foo" or perhaps ^L.  The purpose of the prefix is to allow text
    character positions to be unambiguously assigned to a region, and to
    provide a textual way to describe deleting a region separator (hence
    merging with the previous region.)
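A minimal Python sketch of the mark and token level, using logical positions
only (the gap is ignored here).  The names Mark, Token and adjust_for_insert
are hypothetical.  We assume that a fixed mark still shifts when text is
inserted strictly before it, and differs from a moving mark only when text is
inserted exactly at its position.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    pos: int              # logical character position in the buffer text
    moving: bool = False  # assumption: only matters for insertions exactly at pos


@dataclass
class Token(Mark):
    count: int = 0        # number of characters covered; tokens are fixed marks


def adjust_for_insert(marks, at, n):
    """Update mark positions after inserting n characters at position `at`."""
    for m in marks:
        # Marks strictly after the insertion always shift; a mark exactly at
        # the insertion point shifts only if it is a moving mark.
        if m.pos > at or (m.pos == at and m.moving):
            m.pos += n
```

Inserting three characters at position 5 leaves a fixed mark at 5 but moves a
moving mark at 5 to 8, so the moving mark ends up after the inserted text.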

We use the usual hack of having marks (and tokens) after the gap hold the
absolute post-gap index rather than the logical position (so that we don't have
to update them on each insertion/deletion at the gap.)  All marks between the
old and new gap must be relocated when the text gap moves.  References to
tokens, marks and regions are via pointers, but regions and marks contain their
index in their buffer-vector so that the next & previous can be located without
doing a binary search.  These indices would need to be relocated when the mark &
region gaps move.  For simplicity, all gaps would probably move simultaneously.
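The post-gap index hack can be sketched as a small Python gap buffer whose marks
store raw indices into the underlying vector.  GapText and its methods are
hypothetical names; deletion and fixed marks are omitted, and every mark at the
gap is stored after it, so in this sketch all marks behave as moving marks.

```python
class GapText:
    """Gap buffer whose marks store raw indices into `buf`: post-gap marks keep
    their absolute index, so insertions at the gap never touch them."""

    def __init__(self, text="", capacity=16):
        size = max(capacity, 2 * len(text))
        self.buf = list(text) + [None] * (size - len(text))
        self.gap_start = len(text)   # first free slot
        self.gap_end = size          # first slot after the gap
        self.marks = []              # raw indices: pre-gap < gap_start <= gap_end <= post-gap

    def gap_len(self):
        return self.gap_end - self.gap_start

    def add_mark(self, pos):
        """Create a mark at logical position pos; a mark at the gap goes after it."""
        raw = pos if pos < self.gap_start else pos + self.gap_len()
        self.marks.append(raw)
        return len(self.marks) - 1

    def mark_pos(self, handle):
        raw = self.marks[handle]
        return raw if raw < self.gap_start else raw - self.gap_len()

    def move_gap(self, to):
        """Move the gap to logical position `to`, relocating only the marks
        whose characters lie between the old and new gap positions."""
        g = self.gap_len()
        if to < self.gap_start:
            n = self.gap_start - to
            self.buf[self.gap_end - n:self.gap_end] = self.buf[to:self.gap_start]
            self.marks = [r + g if to <= r < self.gap_start else r for r in self.marks]
            self.gap_start, self.gap_end = to, self.gap_end - n
        elif to > self.gap_start:
            n = to - self.gap_start
            self.buf[self.gap_start:to] = self.buf[self.gap_end:self.gap_end + n]
            self.marks = [r - g if self.gap_end <= r < self.gap_end + n else r
                          for r in self.marks]
            self.gap_start, self.gap_end = to, self.gap_end + n

    def _grow(self, need):
        extra = max(need, len(self.buf))
        tail = self.buf[self.gap_end:]
        self.buf = self.buf[:self.gap_start] + [None] * (self.gap_len() + extra) + tail
        self.marks = [r + extra if r >= self.gap_end else r for r in self.marks]
        self.gap_end += extra

    def insert(self, pos, s):
        self.move_gap(pos)
        if self.gap_len() < len(s):
            self._grow(len(s))
        self.buf[self.gap_start:self.gap_start + len(s)] = list(s)
        self.gap_start += len(s)  # post-gap marks shift logically for free

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Note that insert never loops over the marks; only move_gap and _grow do, and
move_gap touches just the marks between the old and new gap positions.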

Marks, tokens and regions are specializable objects with methods that can exert
control over redisplay and handling of mouse-clicks & key-events.  For example,
tokens and regions can be read-only, and tokens can be displayed in a
particular font or color.
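A minimal sketch of specialization in Python, using subclasses that override
methods consulted by redisplay, mouse handling and modification.  All class,
method and command names here (display_style, on_click, check_modifiable,
"show-history") are hypothetical.

```python
class Token:
    """Base token class; subclasses specialize display and editing behaviour."""

    def __init__(self, text):
        self.text = text

    def read_only(self):
        return False

    def display_style(self):
        return {"font": "default", "color": "black"}

    def on_click(self):
        return None  # default: the token does not handle clicks


class KeywordToken(Token):
    def display_style(self):
        return {"font": "bold", "color": "blue"}


class PromptToken(Token):
    """A token the user may not edit, e.g. a listener prompt (hypothetical)."""

    def read_only(self):
        return True

    def on_click(self):
        return "show-history"


def check_modifiable(tokens):
    """Refuse a modification that touches any read-only token."""
    blocked = [t.text for t in tokens if t.read_only()]
    if blocked:
        raise PermissionError(f"read-only tokens: {blocked}")
```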


Modification:

Text:
    Basic editing operations are described in terms of textual insertions and
    deletions, as in a text editor.  The undo history is maintained as these
    textual modifications.  A buffer, string or character may be inserted at a
    mark.  The text between two marks may be deleted, or a new buffer can be
    created by cutting or copying the text.
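    A minimal Python sketch of an undo history kept purely as textual
    insertions and deletions.  TextBuffer is a hypothetical name, and plain
    integer positions stand in for marks to keep it short.

```python
class TextBuffer:
    def __init__(self, text=""):
        self.text = text
        self.history = []  # entries: ("insert" | "delete", position, string)

    def insert(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]
        self.history.append(("insert", pos, s))

    def delete(self, start, end):
        removed = self.text[start:end]
        self.text = self.text[:start] + self.text[end:]
        self.history.append(("delete", start, removed))
        return removed

    def cut(self, start, end):
        """Delete the text and return it as a new buffer."""
        return TextBuffer(self.delete(start, end))

    def copy(self, start, end):
        """Return a new buffer holding the text; no modification is recorded."""
        return TextBuffer(self.text[start:end])

    def undo(self):
        """Invert the most recent textual modification."""
        kind, pos, s = self.history.pop()
        if kind == "insert":
            self.text = self.text[:pos] + self.text[pos + len(s):]
        else:
            self.text = self.text[:pos] + s + self.text[pos:]
```

    Because each history entry records the exact string inserted or removed,
    undoing it needs no knowledge of tokens or regions.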

Tokens:
    When all the text for a token is deleted, cut or copied, that token is
    deleted, cut or copied.  If the text within a token is modified via an
    insertion or deletion, or a partial token is copied due to a cut or copy,
    then CHANGE tokens are created as needed.  A CHANGE token holds the
    changed text and the tokens that previously came before and after the
    changed subsequence (possibly the same, possibly NIL.)  These old adjacent
    tokens are no longer part of the token sequence, but the tokenizer can
    examine them to preserve any semantic annotations.
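    A minimal Python sketch of creating a CHANGE token for a string insertion.
    Token positions are derived from text lengths and marks are omitted; the
    names insert_text, Change, before and after are hypothetical.  We assume
    that an insertion exactly at a token boundary modifies no token, so the
    new CHANGE token holds only the inserted text and both neighbours are NIL
    (None).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    text: str
    kind: str = "symbol"


@dataclass
class Change(Token):
    kind: str = "change"
    before: Optional[Token] = None  # old token that held the start of the change
    after: Optional[Token] = None   # old token that held the end of the change


def insert_text(tokens, pos, s):
    """Return a new token list with string s inserted at character position pos."""
    start = 0
    for i, tok in enumerate(tokens):
        end = start + len(tok.text)
        if start < pos < end:
            # Inside one token: it is both the old "before" and "after" token,
            # and it is replaced by a CHANGE token holding the edited text.
            off = pos - start
            change = Change(tok.text[:off] + s + tok.text[off:], before=tok, after=tok)
            return tokens[:i] + [change] + tokens[i + 1:]
        if pos == start:
            # Assumption: at a boundary no token is modified.
            return tokens[:i] + [Change(s)] + tokens[i:]
        start = end
    return tokens + [Change(s)]  # insertion at the end of the buffer
```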

    An existing CHANGE token is extended to include contiguous string or
    character insertions.  Buffer insertions insert the tokens associated with
    that buffer.
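    A minimal Python sketch of extending an existing CHANGE token and of
    splicing in a buffer's tokens.  The names insert_string, insert_buffer and
    is_change are hypothetical; a boolean flag replaces a CHANGE subclass for
    brevity, and an insertion strictly inside an ordinary token is rejected
    because that case first creates a CHANGE token as described above.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    is_change: bool = False  # True for CHANGE tokens awaiting tokenization


def insert_string(tokens, pos, s):
    """Insert s at pos in place, extending a CHANGE token that touches pos."""
    start = 0
    for i, tok in enumerate(tokens):
        end = start + len(tok.text)
        if tok.is_change and start <= pos <= end:
            off = pos - start
            tok.text = tok.text[:off] + s + tok.text[off:]  # grow the existing CHANGE
            return
        if pos == start:
            tokens.insert(i, Token(s, is_change=True))  # start a new CHANGE token
            return
        if start < pos < end:
            raise ValueError("split this token into a CHANGE token first")
        start = end
    tokens.append(Token(s, is_change=True))


def insert_buffer(tokens, index, other):
    """Splice copies of another buffer's tokens in at token boundary `index`."""
    tokens[index:index] = [Token(t.text, t.is_change) for t in other]
```

    Typing "b" then "a" at the same spot produces one CHANGE token "ba"
    rather than two.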

    Conversion of CHANGE tokens into actual language syntax tokens is done
    on demand by a tokenizer associated with the REGION.  The region has an
    automatically updated list of CHANGE tokens that need to be tokenized.
    Redisplay is one form of demand for tokenization, since token type affects
    display.
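    A minimal Python sketch of on-demand tokenization driven by redisplay.
    The regular-expression lexer is a stand-in assumption for a real language
    tokenizer (which could also consult the old neighbouring tokens kept in
    each CHANGE token), and Region, note_change, ensure_tokenized and
    redisplay are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Stand-in lexical syntax (assumption): whitespace runs, parens, other runs.
LEXEME = re.compile(r"\s+|[()]|[^\s()]+")


@dataclass
class Token:
    text: str
    kind: str  # "change" marks text that still needs tokenizing


def classify(lexeme):
    if lexeme.isspace():
        return "space"
    if lexeme in ("(", ")"):
        return "paren"
    return "symbol"


@dataclass
class Region:
    tokens: list
    pending: list = field(default_factory=list)  # indices of CHANGE tokens

    def note_change(self, index):
        self.pending.append(index)

    def ensure_tokenized(self):
        # Work from the highest index down so earlier indices stay valid
        # while a CHANGE token is replaced by possibly several tokens.
        for i in sorted(self.pending, reverse=True):
            text = self.tokens[i].text
            self.tokens[i:i + 1] = [Token(m.group(), classify(m.group()))
                                    for m in LEXEME.finditer(text)]
        self.pending.clear()

    def redisplay(self):
        """Redisplay needs token kinds, so it forces tokenization first."""
        self.ensure_tokenized()
        return [(t.kind, t.text) for t in self.tokens]
```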

    For efficient implementation of keystroke insertion while supporting
    completion, etc., an ACTIVE token (of class determined by the region) is
    created at the gap whenever the gap is moved (after the modification that
    caused it to be moved.)  When the ACTIVE token is created, it can extend
    itself to include adjacent text.  The ACTIVE token is extended & contracted
    to include subsequent character insertions and deletions, but becomes an
    ordinary CHANGE token if a buffer is inserted or the gap moves.  Each
    buffer has one ACTIVE token.  Because of the semantics of ACTIVE, there is
    a SET-ACTIVE operation which can move the ACTIVE token without actually
    doing modification (to make it track the cursor.)
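    A minimal Python sketch of the single ACTIVE token.  Here set_active both
    moves the gap and creates the new ACTIVE token.  Several details are
    assumptions, not part of the design above: the "adjacent text" an ACTIVE
    token absorbs is a run of alphanumeric characters, only an ACTIVE token
    that was actually modified is demoted to a CHANGE token, and demoted
    tokens are recorded as text only.  ActiveBuffer and its methods are
    hypothetical names.

```python
class ActiveBuffer:
    def __init__(self, text=""):
        self.text = text
        self.changes = []     # text of ACTIVE tokens demoted to CHANGE tokens
        self.active = (0, 0)  # [start, end) span of the buffer's one ACTIVE token
        self.point = 0        # insertion point, where the gap sits
        self.dirty = False    # assumption: only a modified ACTIVE token is demoted

    def _span_around(self, pos):
        """Assumed extension rule: absorb adjacent alphanumeric text."""
        start, end = pos, pos
        while start > 0 and self.text[start - 1].isalnum():
            start -= 1
        while end < len(self.text) and self.text[end].isalnum():
            end += 1
        return start, end

    def set_active(self, pos):
        """Move the gap and the ACTIVE token to pos without modifying text."""
        if self.dirty:
            s, e = self.active
            self.changes.append(self.text[s:e])  # old ACTIVE becomes a CHANGE
        self.point = pos
        self.active = self._span_around(pos)
        self.dirty = False

    def insert_char(self, ch):
        """Insert at point; the ACTIVE token grows to cover the character."""
        s, e = self.active
        self.text = self.text[:self.point] + ch + self.text[self.point:]
        self.point += 1
        self.active = (s, e + 1)
        self.dirty = True

    def delete_char(self):
        """Delete the character before point, contracting the ACTIVE token."""
        if self.point == 0:
            return
        s, e = self.active
        self.text = self.text[:self.point - 1] + self.text[self.point:]
        self.point -= 1
        self.active = (min(s, self.point), e - 1)
        self.dirty = True

    def active_text(self):
        s, e = self.active
        return self.text[s:e]
```

    Keystrokes at the cursor only adjust one span, which is what makes
    per-keystroke work (such as completion) cheap.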

REGIONS:

    REGIONs provide the semantic context for editing.  They contain state
    analogous to EMACS modes (e.g. whether the text is code or documentation.)
    REGIONs also contain modification information: the CHANGE tokens and a
    generic function called whenever the region is modified (the default
    method increments a counter.)  Regions control how text is tokenized (and
    hence how it is displayed.)  Regions also exert influence over command
    processing by providing local menus and keystroke interpretation.
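    A minimal Python sketch of the modification generic function, using
    functools.singledispatch as a stand-in for a Lisp generic function, plus
    region-local keystroke bindings.  The region classes, region_modified,
    respell_queue and the command names are hypothetical.

```python
from functools import singledispatch

GLOBAL_KEYS = {"C-f": "forward-character"}  # hypothetical global bindings


class Region:
    mode = "fundamental"
    keys = {}  # region-local keystroke bindings (class-level for the sketch)

    def __init__(self):
        self.change_tokens = []
        self.modification_count = 0

    def lookup_key(self, key):
        """Local bindings shadow the global ones, like an EMACS mode map."""
        return self.keys.get(key, GLOBAL_KEYS.get(key))


class CodeRegion(Region):
    mode = "code"
    keys = {"TAB": "indent-form"}


class DocRegion(Region):
    mode = "documentation"

    def __init__(self):
        super().__init__()
        self.respell_queue = []


@singledispatch
def region_modified(region, change):
    """Default method: just count modifications."""
    region.modification_count += 1


@region_modified.register
def _(region: DocRegion, change):
    region_modified.dispatch(Region)(region, change)  # also run the default method
    region.respell_queue.append(change)
```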

    Regions may incorporate arbitrary hierarchical and graph-structured
    references to other regions and data structures, but the region is
    responsible for maintaining the consistency of these annotations (by using
    the modification information.)
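    A minimal Python sketch of keeping one such annotation consistent by
    comparing against the target region's modification counter, rebuilding
    only when the counter has changed.  DefinitionIndex and the "(defun NAME"
    scan are hypothetical illustrations, not part of the design above.

```python
class Region:
    def __init__(self, text):
        self.text = text
        self.modification_count = 0

    def modify(self, new_text):
        self.text = new_text
        self.modification_count += 1  # what the default modification method does


class DefinitionIndex:
    """Annotation on another region: the names it defines (hypothetical)."""

    def __init__(self, target):
        self.target = target
        self._seen = None   # modification count the cache was built from
        self._names = set()
        self.rebuilds = 0

    def names(self):
        if self._seen != self.target.modification_count:  # stale: rebuild
            words = self.target.text.split()
            self._names = {b for a, b in zip(words, words[1:]) if a == "(defun"}
            self._seen = self.target.modification_count
            self.rebuilds += 1
        return self._names
```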