Program representation data structures:

A BUFFER is a stand-alone chunk of program: an actual user-level buffer, a cut buffer, etc. It has three parallel levels of structure, each represented as a sorted vector in a buffer-gap representation:

1] Text: the text representation of the stuff being edited. This is pretty much as in a text editor, but automatic indentation and line breaks might not have any corresponding characters.

2] TOKENs & MARKs: Marks are either fixed or moving (moved after insertions.) Line breaks (automatic or explicit) are represented by moving newline marks. A token is a subtype of fixed mark. Marks just have a position in the text sequence, whereas tokens have a position and a count. Tokens do not overlap, so each text position corresponds to at most one token (but tokens can have marks inside them.)

3] REGIONs: spans of text in a buffer with some semantic affinity: the definition of a single method, a leaf in a document outline hierarchy. Regions are also disjoint. Each region has a prefix token which contains some non-empty but arbitrary region-specific text such as "section foo" or perhaps ^L. The purpose of the prefix is to allow text character positions to be unambiguously assigned to a region, and to provide a textual way to describe deleting a region separator (hence merging with the previous region.)

We use the usual hack of having marks (and tokens) after the gap hold the absolute post-gap index rather than the logical position (so that we don't have to update them on each insertion/deletion at the gap.) All marks between the old and new gap positions must be relocated when the text gap moves.

References to tokens, marks and regions are via pointers, but regions and marks contain their index in their buffer-vector so that the next & previous can be located without doing a binary search. These indices would need to be relocated when the mark & region gaps move. Probably for simplicity all gaps would move simultaneously.
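The post-gap index hack for marks can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the design's actual code: the names Mark, GapText, move_gap and logical are hypothetical, only the text level is modeled (no token or region vectors), and every mark sitting exactly at the gap is treated as a moving mark, so the fixed/moving distinction is left out.

```python
class Mark:
    """A position in the text. `index` is a physical index into the storage
    vector: marks after the gap hold the absolute post-gap index."""
    def __init__(self, index):
        self.index = index


class GapText:
    """Hypothetical gap buffer; marks at the gap behave as moving marks."""
    def __init__(self, text, gap_size=16):
        self.store = list(text) + [None] * gap_size
        self.gap_start = len(text)       # first physical slot of the gap
        self.gap_end = len(self.store)   # first physical slot after the gap
        self.marks = []

    def text(self):
        return "".join(self.store[:self.gap_start] + self.store[self.gap_end:])

    def add_mark(self, pos):
        """Create a mark at logical position `pos`."""
        gap = self.gap_end - self.gap_start
        mark = Mark(pos if pos < self.gap_start else pos + gap)
        self.marks.append(mark)
        return mark

    def logical(self, mark):
        """Translate a stored physical index back to a logical position."""
        if mark.index >= self.gap_end:
            return mark.index - (self.gap_end - self.gap_start)
        return mark.index

    def move_gap(self, pos):
        """Move the gap to logical position `pos`, relocating only the
        marks that lie between the old and new gap positions."""
        gap = self.gap_end - self.gap_start
        if pos < self.gap_start:
            n = self.gap_start - pos
            self.store[self.gap_end - n:self.gap_end] = self.store[pos:self.gap_start]
            for m in self.marks:
                if pos <= m.index < self.gap_start:
                    m.index += gap       # now after the gap
            self.gap_start, self.gap_end = pos, self.gap_end - n
        elif pos > self.gap_start:
            n = pos - self.gap_start
            self.store[self.gap_start:pos] = self.store[self.gap_end:self.gap_end + n]
            for m in self.marks:
                if self.gap_end <= m.index < self.gap_end + n:
                    m.index -= gap       # now before the gap
            self.gap_start, self.gap_end = pos, self.gap_end + n

    def insert(self, pos, s):
        """Insert string `s` at logical position `pos`."""
        self.move_gap(pos)
        need = len(s) - (self.gap_end - self.gap_start)
        if need > 0:                     # grow the gap; post-gap marks shift
            extra = need + 16
            self.store[self.gap_end:self.gap_end] = [None] * extra
            for m in self.marks:
                if m.index >= self.gap_end:
                    m.index += extra
            self.gap_end += extra
        self.store[self.gap_start:self.gap_start + len(s)] = list(s)
        self.gap_start += len(s)         # no mark is touched here

    def delete(self, start, end):
        """Delete logical range [start, end); marks inside collapse to start."""
        self.move_gap(start)
        n = end - start
        for m in self.marks:
            if self.gap_end <= m.index < self.gap_end + n:
                m.index = self.gap_end + n
        self.gap_end += n


buf = GapText("hello world")
w = buf.add_mark(6)                      # mark just before "w"
buf.insert(0, ">> ")
print(buf.text(), buf.logical(w))        # >> hello world 9
```

Once the gap is in place, an insertion only advances gap_start and rewrites no mark, which is the point of storing physical indices. The cost is paid in move_gap, and only for marks between the old and new gap positions.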
Marks, tokens and regions are specializable objects with methods that can exert control over redisplay and the handling of mouse clicks & key events. For example, tokens and regions can be read-only, and tokens can be displayed in a particular font or color.

Modification:

Text: Basic editing operations are described in terms of textual insertions and deletions, as in a text editor. The undo history is maintained as a record of these textual modifications. A buffer, string or character may be inserted at a mark. The text between two marks may be deleted, or a new buffer can be created by cutting or copying the text.

Tokens: When all the text for a token is deleted, cut or copied, that token is deleted, cut or copied. If the text within a token is modified via an insertion or deletion, or a partial token is copied due to a cut or copy, then CHANGE tokens are created as needed. A CHANGE token holds the changed text and the tokens that previously came before and after the changed subsequence (possibly the same, possibly NIL.) These old adjacent tokens are no longer part of the token sequence, but the tokenizer can examine them to preserve any semantic annotations. An existing CHANGE token is extended to include contiguous string or character insertions. Buffer insertions insert the tokens associated with that buffer.

Conversion of CHANGE tokens into actual language syntax tokens is done on demand by a tokenizer associated with the REGION. The region has an automatically updated list of change tokens that need to be tokenized. Redisplay is one form of demand for tokenization, since token type affects display.

For efficient implementation of keystroke insertion while supporting completion, etc., an ACTIVE token (of a class determined by the region) is created at the gap whenever the gap is moved (after the modification that caused it to be moved.) When the ACTIVE token is created, it can extend itself to include adjacent text.
The ACTIVE token is extended & contracted to include subsequent character insertions and deletions, but becomes an ordinary CHANGE token if a buffer is inserted or the gap moves. Each buffer has one ACTIVE token. Because of the semantics of ACTIVE, there is a SET-ACTIVE operation which can move the ACTIVE token without actually doing a modification (to make it track the cursor.)

Regions: REGIONs provide the semantic context for editing. They contain state analogous to EMACS modes (e.g. is this code or documentation?) REGIONs also contain modification information: the CHANGE tokens and a generic function called whenever the region is modified (the default method increments a counter.) Regions control how text is tokenized (and hence how it is displayed.) Regions also exert influence over command processing by providing local menus and keystroke interpretation. Regions may incorporate arbitrary hierarchical and graph-structured references to other regions and data structures, but the region is responsible for maintaining the consistency of these annotations (by using the modification information.)
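The region's modification bookkeeping might be sketched as follows, in Python. Everything here is an assumption made for illustration: the class and method names (Region, ChangeToken, record_change, note_modified, demand_tokens) are hypothetical, the tokenizer is a placeholder that splits words from punctuation, and an overridable method stands in for the generic function mentioned above.

```python
import re


class Token:
    def __init__(self, text, kind):
        self.text = text
        self.kind = kind


class ChangeToken(Token):
    """Untokenized changed text, plus the old tokens that used to sit
    before and after the changed subsequence (either may be None)."""
    def __init__(self, text, old_before=None, old_after=None):
        super().__init__(text, "change")
        self.old_before = old_before
        self.old_after = old_after


class Region:
    def __init__(self):
        self.tokens = []     # current token sequence of the region
        self.changes = []    # CHANGE tokens still waiting for the tokenizer
        self.mod_count = 0

    def note_modified(self):
        """Called on every modification; the default just counts."""
        self.mod_count += 1

    def tokenize(self, change):
        """Placeholder tokenizer: words versus punctuation. A real one could
        consult change.old_before / change.old_after for annotations."""
        pieces = re.findall(r"\w+|[^\w\s]", change.text)
        return [Token(p, "word" if p[0].isalnum() or p[0] == "_" else "punct")
                for p in pieces]

    def record_change(self, index, text, old_before=None, old_after=None):
        """Put a CHANGE token into the sequence and onto the pending list."""
        change = ChangeToken(text, old_before, old_after)
        self.tokens.insert(index, change)
        self.changes.append(change)
        self.note_modified()
        return change

    def demand_tokens(self):
        """Convert pending CHANGE tokens on demand (e.g. for redisplay)."""
        for change in self.changes:
            i = self.tokens.index(change)
            self.tokens[i:i + 1] = self.tokenize(change)
        self.changes.clear()
        return self.tokens


class OutlineRegion(Region):
    """A specialized region that also flags its outline as stale."""
    def __init__(self):
        super().__init__()
        self.outline_stale = False

    def note_modified(self):
        super().note_modified()
        self.outline_stale = True


region = OutlineRegion()
region.record_change(0, "foo(bar)")
print([t.kind for t in region.tokens])            # ['change']
print([t.text for t in region.demand_tokens()])   # ['foo', '(', 'bar', ')']
```

In this sketch nothing is tokenized until demand_tokens is called, so a burst of edits costs only list appends and counter increments. A specialized region uses the same hook to invalidate whatever derived structure it maintains.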