implementation-notes
Robert Hieb & Kent Dybvig
90/10/08
modified 91/02/01

The basic notion underlying the implementation is that it ought to be
possible to rename bound identifiers as long as doing so does not
capture free uses of other identifiers (alpha conversion).  Thus the
expander, upon reaching a binding expression such as "lambda" or
"let-syntax", ought to be able to generate a new identifier and
substitute it for all instances of the old identifier within the scope
of the binding.  Note that this process need not distinguish nested
bindings for the same identifier since they too can be renamed.

Two difficulties arise:

   1. An identifier may be used both as a bound identifier (variable
   or keyword) and as data.  In the latter case the identifier will
   eventually end up as part of a "quote" expression, but, since
   "quote" itself may be rebound and since expressions may expand into
   "quote" expressions, it is not generally possible to determine
   whether an identifier is being used as symbolic data until the
   expansion process actually descends to the level of its use.
   Consequently, renaming may be premature.

   2. Macros introduce new identifiers.  To maintain hygiene, these
   identifiers must be distinguished from program identifiers.  The
   macro may introduce a bound instance of a given identifier whose
   scope shadows program identifiers, or it may insert a free instance
   of an identifier into the scope of a bound identifier from the
   program.  Since the expander cannot determine whether an expression
   is creating a binding until it actually reaches it in the expansion
   process, renaming may happen too late to prevent unwanted
   shadowing.

To overcome these difficulties, identifiers are converted to triples
consisting of:

  1. the original symbolic name,

  2. the current binding name, and

  3. a set of marks.

The original name is used if an identifier turns out to be symbolic
data.  The binding name is used to resolve identifier bindings.  The
marks are used to distinguish between identifiers introduced at
different stages of the expansion process.  These marks serve the same
purpose as Kohlbecker et al.'s ``time stamps'' or Clinger's ``colors.''
The original name also provides useful debugging information.

Following is a description of a simplified operational semantics using
identifier triples.  It ignores definitions, whether of keywords or of
variables, internal or global.

The expander takes a syntactic expression and two environments and
returns an expanded program.  The syntactic expression is an expression
in which all symbols have been replaced with identifier triples.
Initially, an identifier's binding name is the same as its original
name and its mark set is empty.  One environment is a keyword
environment that maps identifier names to transformation procedures.
The other environment is a variable environment that can be just a set
of variable names (no map is necessary).  The two environments must be
disjoint.  The variable environment is assumed to contain all valid
variables.
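
The triples and their initial state can be sketched as follows.  This is
an illustration only, written in Python rather than Scheme; the names
"Symbol", "Identifier" and "to_syntax" are hypothetical, and nested lists
stand in for S-expressions.

```python
from dataclasses import dataclass


class Symbol(str):
    """A bare symbol, kept distinct from string literals (hypothetical)."""


@dataclass(frozen=True)
class Identifier:
    """An identifier triple: original name, binding name, set of marks."""
    original: str
    binding: str
    marks: frozenset = frozenset()


def to_syntax(expr):
    """Replace every symbol in expr with an identifier triple."""
    if isinstance(expr, Symbol):
        # Initially the binding name equals the original name
        # and the mark set is empty.
        return Identifier(str(expr), str(expr), frozenset())
    if isinstance(expr, list):
        return [to_syntax(e) for e in expr]
    return expr  # numbers, strings, etc. are left alone


# (lambda (x) (f x "x" 1))
program = [Symbol("lambda"), [Symbol("x")],
           [Symbol("f"), Symbol("x"), "x", 1]]
syntax = to_syntax(program)
```

Only symbols become triples; the string "x" and the number 1 pass
through unchanged.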

When a binding expression is encountered, the bound identifiers are
renamed within the scope of the bindings.  This renaming process
consists of finding all identifiers that have the same binding name and
the same set of marks and replacing their binding name with a new
name.  The marks can be replaced with an empty set, since they are now
redundant.  The new names must be distinct from any other names in the
program, including names that can be generated by macros.  The various
binding expressions are handled as follows:

   If the binding is a "lambda" expression, the identifiers in the
   formal parameter list are replaced with the new names and the body
   is expanded using the current keyword environment and a variable
   environment extended with the new names.  The result of the
   expansion is a new "lambda" expression.

   If the binding is a "let-syntax" expression, the binding values are
   expanded in the current keyword environment and a standard
   transformer variable environment and converted into transformation
   procedures.  Since the standard transformer variable environment
   will not contain bindings for variables bound in the current
   variable environment, attempts to use such variables (other than as
   symbolic or syntactic data) will fail.  The body is expanded using
   the current variable environment and a keyword environment extended
   with the new names bound to the transformation procedures.  The
   result of the expansion is the new body.

   If the binding is a "letrec-syntax" expression, the binding values
   are expanded in the current keyword environment and a standard
   transformer variable environment and converted into transformation
   procedures.  Since the standard transformer variable environment
   will not contain bindings for variables bound in the current
   variable environment, attempts to use such variables (other than as
   symbolic or syntactic data) will fail.  Since the binding values for
   a "letrec-syntax" expression are in its scope, the value expressions
   are renamed before they are expanded.  However, the environments
   used to expand the values do not contain bindings for the new names,
   so attempts to use the keywords (other than as symbolic or syntactic
   data) in the values will fail.  The body is expanded using the
   current variable environment and a keyword environment extended with
   the new names and transformation procedures.  The result of the
   expansion is the new body.
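
The renaming step shared by all three binding forms might look like the
sketch below.  It is a Python illustration under assumed names
("rename", "fresh_name"); in particular, the counter-based name
generator is only a stand-in for whatever mechanism guarantees that new
names are distinct from every other name in the program.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Identifier:
    original: str
    binding: str
    marks: frozenset = frozenset()


_counter = count()


def fresh_name(base):
    # Hypothetical generator of new binding names.
    return f"{base}.{next(_counter)}"


def rename(expr, bound, new_name):
    """Give new_name to every identifier that has the same binding
    name and the same set of marks as bound."""
    if isinstance(expr, Identifier):
        if expr.binding == bound.binding and expr.marks == bound.marks:
            # The marks are now redundant and are replaced by an empty set.
            return Identifier(expr.original, new_name, frozenset())
        return expr
    if isinstance(expr, list):
        return [rename(e, bound, new_name) for e in expr]
    return expr


x = Identifier("x", "x")
x_marked = Identifier("x", "x", frozenset({"m1"}))  # same name, other marks
y = Identifier("y", "y")

new = fresh_name("x")
renamed = rename([x, [x_marked, y]], x, new)
```

Here the unmarked "x" receives the new binding name, while the marked
"x" and the unrelated "y" are untouched, since they do not match on both
binding name and marks.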

An identifier reference is looked up in the variable environment.  If
it is present it is replaced with its binding name.  If it is not
present, then either a keyword is being used improperly (readily
determined by looking it up in the keyword environment) or the
identifier is being referenced in an invalid context.  This can
result from:

   1) using an unbound identifier,

   2) using a program variable in a transformer,

   3) using a keyword from a "letrec-syntax" binding during the
   expansion of one of its transformers, or

   4) using an identifier that a transformation procedure has moved
   outside the scope of its original binding.
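
The lookup of an identifier reference can be sketched as below (Python,
for illustration).  The function name "expand_reference", the exception
class "ExpandError" and the use of a set and a dictionary for the two
environments are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    original: str
    binding: str
    marks: frozenset = frozenset()


class ExpandError(Exception):
    """Hypothetical error type for invalid identifier references."""


def expand_reference(ident, variables, keywords):
    """variables: set of binding names; keywords: dict from binding
    names to transformation procedures."""
    if ident.binding in variables:
        return ident.binding  # replaced with its binding name
    if ident.binding in keywords:
        raise ExpandError(f"keyword {ident.original} used as a variable")
    raise ExpandError(
        f"identifier {ident.original} referenced in an invalid context")


variables = {"x.1"}
keywords = {"my-or": lambda form: form}

ok = expand_reference(Identifier("x", "x.1"), variables, keywords)

errors = []
for ident in (Identifier("my-or", "my-or"), Identifier("z", "z")):
    try:
        expand_reference(ident, variables, keywords)
    except ExpandError as e:
        errors.append(str(e))
```

The error messages use the original name, which is the debugging use of
the original name mentioned earlier.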

A "quote" expression is replaced with a new "quote" expression in which
all internal identifiers have been replaced with their original
symbolic names.
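
A minimal sketch of this stripping step, in Python with the hypothetical
name "strip" and nested lists standing in for quoted data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    original: str
    binding: str
    marks: frozenset = frozenset()


def strip(expr):
    """Replace every identifier triple with its original symbolic name."""
    if isinstance(expr, Identifier):
        return expr.original
    if isinstance(expr, list):
        return [strip(e) for e in expr]
    return expr


# The contents of (quote (a (b 1))), after "a" has been renamed and
# "b" has been marked.
datum = [Identifier("a", "a.3"),
         [Identifier("b", "b", frozenset({"m"})), 1]]
stripped = strip(datum)
```

Renaming and marking leave no trace in the quoted data, because only the
original name survives.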

A "syntax" expression is replaced with a "quote" expression that
still contains identifier triples.

A simple (unquoted) literal datum (number, boolean, string or
character) is returned unchanged.

A symbol (as opposed to an identifier triple) should never be
encountered by the expander outside of a quoted expression.  Such a
symbol can appear only if it has been inserted by a transformation
procedure.  Since the proper binding for such a symbol cannot be
determined, it is invalid as a syntax element.  Transformation
procedures must introduce identifiers as part of a "syntax" expression
unless they will end up internal to a "quote" expression.

When the first subexpression in a form is an identifier bound in the
keyword environment, the transformation procedure it is bound to is
used to rewrite the form.  Before the form is passed to the
transformation procedure a temporary mark is added to internal
identifiers so that old identifiers can be distinguished from new
identifiers.  The result returned by the transformation procedure is
traversed to remove the temporary mark from old identifiers and to add
a permanent mark to new identifiers.  These marks must be distinct from
any currently in use.  Finally the new expression is reexpanded in the
current environments.
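
The marking discipline can be sketched as follows.  This is a Python
illustration, not the distributed code: "add_mark", "finish_marks",
"apply_transformer" and the example transformer "my_or" are hypothetical
names, and integers from a counter stand in for marks that are distinct
from any currently in use.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Identifier:
    original: str
    binding: str
    marks: frozenset = frozenset()


_marks = count()  # source of marks not yet in use (assumption)


def add_mark(expr, mark):
    if isinstance(expr, Identifier):
        return Identifier(expr.original, expr.binding, expr.marks | {mark})
    if isinstance(expr, list):
        return [add_mark(e, mark) for e in expr]
    return expr


def finish_marks(expr, temp, perm):
    if isinstance(expr, Identifier):
        if temp in expr.marks:
            marks = expr.marks - {temp}   # old identifier: drop temporary mark
        else:
            marks = expr.marks | {perm}   # new identifier: add permanent mark
        return Identifier(expr.original, expr.binding, marks)
    if isinstance(expr, list):
        return [finish_marks(e, temp, perm) for e in expr]
    return expr


def apply_transformer(form, transformer):
    temp, perm = next(_marks), next(_marks)
    result = transformer(add_mark(form, temp))
    return finish_marks(result, temp, perm)


def my_or(form):
    # (my-or a b) => (let ((t a)) (if t t b)), introducing let, if and t.
    _, a, b = form
    t = Identifier("t", "t")
    return [Identifier("let", "let"), [[t, a]],
            [Identifier("if", "if"), t, t, b]]


user_t = Identifier("t", "t")
form = [Identifier("my-or", "my-or"), Identifier("a", "a"), user_t]
out = apply_transformer(form, my_or)
```

In the result, the "t" supplied by the program comes back with its
original (empty) mark set, while the "t" introduced by the transformer
carries the permanent mark.  A later renaming of the introduced "t" will
therefore not capture the program's "t".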

The proper subexpressions of applications, assignments and conditionals
are expanded in the current environments and recombined.  The assigned
variable in an assignment statement is expanded like a variable
reference.

Based on this operational semantics, here are definitions for the
identifier operations available to the macro writer:

   (identifier? x) is true if "x" is an identifier triple.

   (free-identifier=? i j) is true if "i" and "j" are identifiers
   with the same binding name.

   (bound-identifier=? i j) is true if "i" and "j" are identifiers
   with the same binding name and the same set of marks.

   (identifier->symbol i) returns the original name of "i".

   (generate-identifier s) returns an identifier triple using "s" as
   the original name, an arbitrary set of marks, and a new (unique)
   binding name.
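
These definitions translate almost directly into code.  The sketch below
is in Python with hypothetical names modelled on the Scheme ones;
choosing the empty set for the "arbitrary" marks of a generated
identifier, and a counter for unique binding names, are assumptions.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Identifier:
    original: str
    binding: str
    marks: frozenset = frozenset()


_unique = count()


def is_identifier(x):
    return isinstance(x, Identifier)


def free_identifier_eq(i, j):
    return (is_identifier(i) and is_identifier(j)
            and i.binding == j.binding)


def bound_identifier_eq(i, j):
    return free_identifier_eq(i, j) and i.marks == j.marks


def identifier_to_symbol(i):
    return i.original


def generate_identifier(s):
    # New (unique) binding name; the mark set is arbitrary, here empty.
    return Identifier(s, f"{s}.g{next(_unique)}", frozenset())


user_t = Identifier("t", "t")
macro_t = Identifier("t", "t", frozenset({"m1"}))  # introduced by a macro
g = generate_identifier("t")
```

With these values, "user_t" and "macro_t" are equal under
free_identifier_eq but not under bound_identifier_eq, and "g" is
equal to neither under either predicate although its original name is
also "t".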

This substitution-based operational semantics is not suitable for a
practical implementation because of the exponential cost of actually
performing all of the substitutions.  However, it can serve as the
basis for a practical implementation by using a delayed substitution
strategy.

Instead of making a complete substitution pass when an identifier is
bound or a macro is expanded, the information necessary to perform the
substitution is bundled up with the target expression.  When it is
necessary to examine an expression, the substitution information must
be pushed down to its subexpressions.  Eventually this information will
reach an identifier, in which case the substitution can be performed,
or a simple datum, in which case the substitution information can be
thrown away.

This is why "unwrap-syntax" is needed.  Before a transformation
procedure (or the expander) can examine or rebuild an expression, it
must first unwrap it to expose the subexpressions that it needs.  Since
it needs to unwrap the expressions only as far as it is going to
descend into them, the time complexity of macro expansion is increased
by only a constant factor as a result of the overhead of unwrapping.
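
The delayed-substitution idea can be sketched as below.  This Python
illustration handles pending renamings only (marks would be delayed in
the same way), and the "Wrapped" class, the "unwrap" function and the
representation of a pending renaming as a (binding name, marks, new
name) tuple are assumptions, not the structure of the actual
implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    original: str
    binding: str
    marks: frozenset = frozenset()


@dataclass(frozen=True)
class Wrapped:
    expr: object
    substs: tuple  # pending (binding, marks, new_name) renamings, oldest first


def apply_substs(ident, substs):
    for binding, marks, new_name in substs:
        if ident.binding == binding and ident.marks == marks:
            ident = Identifier(ident.original, new_name, frozenset())
    return ident


def unwrap(w):
    """Expose one level of structure, pushing pending renamings down."""
    if not isinstance(w, Wrapped):
        return w
    e = w.expr
    if isinstance(e, Wrapped):
        # Nested wraps are merged; the inner renamings are applied first.
        return unwrap(Wrapped(e.expr, e.substs + w.substs))
    if isinstance(e, Identifier):
        return apply_substs(e, w.substs)     # substitution performed here
    if isinstance(e, list):
        return [Wrapped(sub, w.substs) for sub in e]  # pushed down one level
    return e                                 # simple datum: info discarded


x = Identifier("x", "x")
pending = (("x", frozenset(), "x.1"),)
w = Wrapped([x, [x, 1], 2], pending)

top = unwrap(w)          # a list of three still-wrapped subexpressions
first = unwrap(top[0])   # the identifier, now renamed
last = unwrap(top[2])    # the datum 2
```

Unwrapping "w" touches only the three top-level elements; the inner list
stays wrapped until something actually needs to look inside it.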
