************************************************************************

This is a VERY rough draft of this document.  Comments appreciated.

************************************************************************


                           Scheme Datatypes

                             Chris Hanson



Scheme, like most other object oriented languages, has a set of
datatypes which form the foundation for any embedded subsystem.  The
purpose of this document is to describe the following for each
datatype in the set:

(*) its relationship to the interpreter,

(*) its relationship to other datatypes, and

(*) the operations available to manipulate instances of the type.

In addition to the datatype descriptions, there are some tools
available for manipulating typed objects in general.

[] Overview

...

Note:

Scheme contains an organizational heirarchy for types, which
associates a special object with each datatype in the system.  This
heirarchy will be discussed at length below, but for now it is useful
to know that such type objects exist and are useful.

[] Typed Objects

Most of this document will describe abstract datatypes, but this
section will briefly go over the physical datatype structure of the
system.  Each Scheme object is represented by a bit string in the
computer, which can be broken up into three parts:

(1) a type code (at least 6 bits worth, currently 8),
(2) a datum (for pointers this is a memory address), and
(3) a danger bit (used for marking by various parts of the system).

Type codes in the Scheme system are broken up into a few classes,
based on the actions taken by the garbage collector.  The class of an
object is normally referred to as the "GC class" (or gc tyupe) of the
object.  The current GC classes are (the indices correspond to the
values returned by PRIMITIVE-GC-TYPE):

(0)  Non-pointer objects, which can be represented in a single word,
(1)  Singletons (or cells), which point to a single memory location,
(2)  Pairs, which point to a pair of memory locations,
(3)  Triples, which point to a triple of memory locations,
(4)  Quadruples, which point to four memory locations,
(-2) Special objects.  Each of these is handled differently by the
garbage collector.  Most of them are for internal use of the allocator
itself,
(-3) Vectors, which point to an n-tuple of memory locations, and
(-4) Compiled entry points, which are implementation dependent.

It is quite conceivable that other classes will be implemented in the
future, e.g. quintuples.

[] Operations on Typed Objects

The following operations will manipulate typed objects directly:

(PRIMITIVE-SET-TYPE type object)
This operation will construct a new object with the same datum as
OBJECT, the given TYPE, and a cleared danger bit.  Note that it will
not return an object with a different GC class than OBJECT.

(PRIMITIVE-TYPE object)
(PRIMITIVE-DATUM object)
Returns the type/datum of OBJECT as a non-negative integer.  Currently
CScheme has a bug in that `primitive-datum' returns a negative integer
in some cases.

(PRIMITIVE-TYPE? type object)
Is a predicate which is true iff (= (PRIMITIVE-TYPE OBJECT) TYPE).
TYPE must be a non-negative integer in the appropriate range (which is
of course implementation dependent).

(MAKE-NON-POINTER-OBJECT datum)
Returns a non-pointer object with the given datum.  DATUM must be a
non-negative integer in the appropriate (implementation dependent)
range.

(OBJECT-DANGEROUS? object)
Is true iff the danger bit of the given object is set.

(MAKE-OBJECT-SAFE object)
(MAKE-OBJECT-DANGEROUS object)
Return a new object which has the same type and datum as OBJECT, but
with the danger bit cleared/set, respectively.

[] Type Table

The facilitate thinking about type codes on a more abstract level,
there are a set of operations that convert between type codes and
symbolic names for those code.

NUMBER-OF-MICROCODE-TYPES
If the implementation you are using has N bits of type code, this
number is 2^N.

(MICROCODE-TYPE name)
Returns the type code corresponding to NAME, or #F if there is no such
type.

(MICROCODE-TYPE-NAME code)
Returns the canonical name corresponding to the given type code.
There may in general be several names for the same code.

(MICROCODE-TYPE-PREDICATE name)
Returns a procedure which is true only of that type of object.

(OBJECT-TYPE object)
Similar to MICROCODE-TYPE-NAME, but extracts the type code form
object, and remaps some related microcode types to different names.

[] EQ? Hashing

A useful facility provided by Scheme is the object hashing facility.
This associates non-negative integers with objects such that two
objects are EQ? iff their hash numbers are =.  There are two sets of
operations for performing object hashing, which differ only in the
form of the numbers which are manipulated, not in the values of those
numbers.  Thus the two types of operations do not exclude one another.

The most useful fact about hash numbers is that they can be used as
"weak pointers", that is, as pointers that do not prevent the object
pointed at from being garbage-collected.  Thus is may be the case that
the object associated with a particular hash number no longer exists;
however, that hash number will never be reused for another object.

(HASH object)
Returns the hash number for OBJECT as an integer.

(UNHASH hash-number)
Takes a hash number represented as an integer and returns the object
associated with it.  If it is not a valid hash number, signals an
error.

[] Environments

The Scheme environment structure is central to any discussion of
interpretation.  It is a mutable mapping from symbols to objects which
is characterized by a block structure which is inherent in the lexical
scoping of the language.

Each Scheme environment can be thought of as a chain of frames; each
frame in the chain represents a single block in the language.  For
example, the expression

(LAMBDA (X)
  (LAMBDA (Y)
    ...))

would create two frames, the outer one containing a binding for X, the
inner a binding for Y.  The body of the inner lambda expression would
see bindings for both X and Y, but the frame containing Y would be
"the" environment frame for that expression.  The frame containing X
is called the "parent" of that frame.

There is one environment frame that is distinguished from all of the
others; this is called the "global" environment frame.  The existence
of this distinguished frame is due to an efficiency hack which will
not be discussed here.  However, it is useful to know that the value
cells for this environment are located in symbols.

The global environment (as the name implies) has no parent.  It is
also possible to have other environments without parents, but these
are in general not used.

A name is said to be "unbound" in an environment if none of the frames
in the environment has an entry for that name.  Otherwise it is said
to be "bound in the frame" in which that entry occurs.  The "binding"
for such a name is the value of the entry in the innermost frame in
which it is bound.

If a name is bound, but has no value, it is called "unassigned".  Such
bindings can be created in several different ways; see the Scheme
manual for details.  Unassigned bindings are intended to be used in
conjunction with side-effects (typically FLUID-LET).

The following operations work on any environment; ENVIRONMENT should
be a chain, and NAME should be a symbol.  The binding for NAME is
found by searching through all of the frames in ENVIRONMENT.

(LEXICAL-UNBOUND? environment name)
(LEXICAL-UNASSIGNED? environment name)
These are true iff the given name is unbound or unassigned,
respectively, in the given environment.

(LEXICAL-UNREFERENCEABLE? environment name)
This is true iff the given name is either unbound or unassigned in the
given environment.

(LEXICAL-REFERENCE environment name)
Returns the value of NAME in ENVIRONMENT.  Causes an error if the
symbol is unbound or unassigned.

(LEXICAL-ASSIGNMENT environment name object)
Modifies the value of NAME in ENVIRONMENT to OBJECT.  Causes an error
if the symbol is unbound.  This is defined to return the old value of
the variable, but usually this should not be depended on.

(LOCAL-ASSIGNMENT environment name object)
This operation defines NAME to have OBJECT as its binding within the
innermost frame of ENVIRONMENT.  If NAME already has a binding in that
frame, it is modified (whether or not it is assigned); otherwise a new
binding is created with the appropriate value.

SYSTEM-GLOBAL-ENVIRONMENT
Is the (unique) global environment; it can be compared using EQ?.

The following operations are defined on non-global environments:

ENVIRONMENT-TYPE
This is the type object for non-global environment structures.

(ENVIRONMENT? object)
Is true only of non-global environment structures.

(ENVIRONMENT-PROCEDURE environment)
Returns the procedure which was called to produce ENVIRONMENT; it is
guaranteed to be a compound procedure.

(ENVIRONMENT-HAS-PARENT? environment)
Is true iff the procedure which created ENVIRONMENT was closed in a
non-global environment.

(ENVIRONMENT-PARENT environment)
Returns the environment which ENVIRONMENT-HAS-PARENT? indicates the
existence of.

(ENVIRONMENT-BINDINGS environment)
Returns an alist of all the variables bound by the given frame.
Entries in the alist are pairs, the car of which is the (symbolic)
name of the variable, and the cadr of which is the value.  If a
variable is unassigned, the cdr of the entry will be '().

(ENVIRONMENT-ARGUMENTS environment)
Returns a list of the arguments which were passed to the procedure
which created ENVIRONMENT.

[] Procedures

Procedures can be separated into three classes in Scheme: PRIMITIVE,
COMPOUND, and COMPILED.  Primitive procedures are provided by the
implementation, and are typically built into the system at a low
level; an example is the CAR operation.  Compound procedures are
so-called because they can be broken up into two parts, a LAMBDA
expression and an ENVIRONMENT.  Compiled procedures are really
compound procedures whose lambda expressions have been compiled (into
native code), and whose environments have potentially been optimized.
Due to these optimizations, many of the procedure utilities do not
work on compiled procedures.

In some sense the distinction between primitive and compound
procedures is arbitrary, since it would be perfectly possible to
implement many primitive procedures as compound procedures.  Indeed,
it is difficult to distinguish between the two classes in a system
that is compiled rather than interpreted; compound procedures that
have been compiled differ little or not at all from primitive
procedures.

However, in this document we will mostly be concerned with the
following aspects of compound procedures, which can readily be used to
discriminate between the classes:

* Compound procedures create new environment frames.  Compiled
procedures may or may not, depending on the degree of optimization
achieved in the environment structure.  Primitive procedures never
create new frames.

* Compound procedures have SCODE definitions associated with them
which can be examined.

The following are the standard operations on procedures:

PRIMITIVE-PROCEDURE-TYPE
COMPOUND-PROCEDURE-TYPE
COMPILED-PROCEDURE-TYPE
PROCEDURE-TYPE
These type objects correspond to primitive procedures, compound
procedures, compiled procedures, and all procedures, respectively.

(PRIMITIVE-PROCEDURE? object)
(COMPOUND-PROCEDURE? object)
(COMPILED-PROCEDURE? object)
(PROCEDURE? object)
These predicates are true only of the appropriate objects.

(PROCEDURE-ENVIRONMENT procedure)
Returns the definition environment of the procedure; for primitive
procedures this is the global environment.  This operation is
undefined for compiled procedures.

(PROCEDURE-LAMBDA compound-procedure)
Returns the LAMBDA object which is the definition of the given
compound procedure.

(PROCEDURE-COMPONENTS compound-procedure receiver)
Calls RECEIVER on the LAMBDA and ENVIRONMENT of the given compound
procedure.

The description of each construct in the SCODE language consists of
several parts.

* The name of the construct, and the names of each of the
subcomponents of the construct.

* A description of the actions performed when an instance of the
construct is executed by the Scheme interpreter.

* A description of the Scheme operations which create, identify, and
manipulate such constructs.  The descriptions for these operations
follow a general rule which is rarely deviated from; we will describe
only those deviations.  Here is the template for one such type:

	FOO: BAR BAZ BLETCH

	FOO is a type with component parts BAR, BAZ and BLETCH.

	Operations:

	FOO-TYPE
	is the type object for FOO objects.

	(FOO? object)
	is the type predicate for FOO objects.

	(MAKE-FOO bar baz bletch)
	makes an instance of a FOO object.

	(FOO-COMPONENTS foo receiver)

	calls RECEIVER on the components of FOO.  That is, RECEIVER is
	a procedure of three arguments which is called on the three
	respective components of FOO.  It is always true that

	(FOO-COMPONENTS foo MAKE-FOO)

	will return a new (different) object with the same components
	as FOO.

	(FOO-BAR foo)
	(FOO-BAZ foo)
	(FOO-BLETCH foo)
	return the components of FOO individually.

* There is sometimes a short note on some implementation-specifics for
the 68000 Scheme implementation.

Note:

Within this document, we often refer to the subexpressions of an SCODE
expression as being either subproblems or reductions.  Each class of
subexpression is executed identically; the distinction between the two
classes lies in the usage of the value.  The value of a "subproblem"
is an intermediate value in the computation of the value of the
overall expression.  The value of a "reduction" is the value of the
overall expression, which is said to be "reduced to" the given
subexpression.

[] VARIABLE: NAME

Execution:

The named variable is looked up in the current environment.

Operations:

VARIABLE-TYPE
(VARIABLE? object)
(MAKE-VARIABLE name)
(VARIABLE-COMPONENTS variable receiver)
(VARIABLE-NAME variable)

Implementation:

The current interpreter system uses a incremental compilation
mechanism for variable lookup which side-effects the variable object
to contain information regarding the location of that variable within
a particular class of environments.  Variables should not be shared
between arbitrary environments.

MAKE-VARIABLE returns an uncompiled variable, which should only be
used within a particular block of a particular piece of code.

[] LAMBDA: NAME, REQUIRED, OPTIONAL, REST, AUXILIARY, DECLARATIONS, BODY

Corresponds to the LAMBDA special form.

Execution:

A PROCEDURE object is created by closing the LAMBDA expression with
the execution environment.  When the procedure is applied, a new
environment frame is created and the BODY executed within it; the
value of the body is the value of the application, so the application
is "reduced to" the body.

The standard types of parameters are all accepted: REQUIRED parameters
must be present when the procedure is applied, or an error will be
signalled.  OPTIONAL parameters may or may not be present; the REST
parameter will get any remaining arguments after the REQUIRED and
OPTIONAL parameters have been satisfied; any OPTIONAL parameters that
are not supplied will be unassigned when the procedure is applied.
AUXILIARY parameters are extra slots in the environment that are
initialized to the unassigned value.

Operations:

(MAKE-LAMBDA name required optional rest auxiliary declarations body)
Creates a new lambda object as per the given arguments.  REQUIRED,
OPTIONAL, and AUXILIARY should be lists of symbols; NAME should be a
symbol; REST should be a symbol or #F; and BODY must be an SCODE
expression.

LAMBDA-TYPE
(LAMBDA? object)
(LAMBDA-COMPONENTS lambda receiver)
(LAMBDA-BODY lambda)

(LAMBDA-BOUND lambda)
Returns a list of the names of all of the variables bound by LAMBDA.

Implementation:

Lambdas are currently implemented using 3 different SCODE forms.  The
different versions support the above fields differently.  DECLARATIONS
are not supported directly by any type, they are handled by the
abstract interface above and hidden in the body of the lambda
expressions.

EXTENDED-LAMBDA is the general case.  It supports all the options
directly except declarations.

LAMBDA is the common case.  It only supports NAME and REQUIRED
directly.  It does not support OPTIONAL, REST at all, and supports
AUXILIARY indirectly.

LEXPR is very rare.  It supports NAME and REQUIRED directly, OPTIONAL,
REST, and AUXILIARY indirectly.

[] COMBINATION: OPERATOR, OPERANDS

Execution:

The OPERATOR and the OPERANDS (an ordered sequence) are executed as
subproblems in any order, and the values collected.  Then the value of
the OPERATOR is applied to the values of the OPERANDS.

Operations:

(MAKE-COMBINATION operator operands)
COMBINATION-TYPE
(COMBINATION? object)
(COMBINATION-COMPONENTS combination receiver)
(COMBINATION-OPERATOR combination)
(COMBINATION-OPERANDS combination)

Implementation:

Combinations are currently implemented using 7 different SCODE forms:

COMBINATION is the generalized form, which is implemented as a vector
of subexpressions.

COMBINATION-1 and COMBINATION-2 are short fixed length combinations
which contain 1 and 2 operands, respectively.  They were implemented
to save code space, increase speed, and reduce execution storage.

PRIMITIVE-COMBINATION-0, PRIMITIVE-COMBINATION-1,
PRIMITIVE-COMBINATION-2, and PRIMITIVE-COMBINATION-3 are fixed length
combinations whose operators are Scheme primitive procedures.  A
primitive is a self-evaluating constant and thus requires no execution
to produce its value; this results in extra evlis-type tail-recursion.

[] CONDITIONAL: PREDICATE, CONSEQUENT, ALTERNATIVE

Corresponds directly to the IF special form.

Execution:

First, the PREDICATE subproblem is executed; if the value is false,
the conditional is reduced to the ALTERNATIVE subexpression; otherwise
it is reduced to the CONSEQUENT subexpression.

Operations:

(MAKE-CONDITIONAL predicate consequent alternative)
CONDITIONAL-TYPE
(CONDITIONAL? object)
(CONDITIONAL-COMPONENTS conditional receiver)
(CONDITIONAL-PREDICATE conditional)
(CONDITIONAL-CONSEQUENT conditional)
(CONDITIONAL-ALTERNATIVE conditional)

[] SEQUENCE: ACTIONS

Corresponds directly to the SEQUENCE special form.

Execution:

ACTIONS is an ordered set of subexpressions.  Each of the
subexpressions is executed in order.  The last subexpression is a
reduction; the others are subproblems and their values are discarded.

Operations:

(MAKE-SEQUENCE actions)
SEQUENCE-TYPE
(SEQUENCE? object)
(SEQUENCE-COMPONENTS sequence receiver)
(SEQUENCE-ACTIONS sequence)

Implementation:

Currently SEQUENCE is implemented as two SCODE forms, SEQUENCE-2 and
SEQUENCE-3, which are fixed length sequences of 2 or 3 elements.
Larger sequences are constructed by chaining these forms together.
This was done because (1) sequences are usually very short in good
Scheme code, and (2) this implementation was simpler, faster, and
required less intermediate storage than a direct implementation of
arbitrary length sequences.

[] DELAY: EXPRESSION

Corresponds directly to the DELAY special form.

Execution:

A PROMISE object is created by combining EXPRESSION with the current
environment.  When this object is forced, the expression is executed
in the environment, and the answer is memoized in the PROMISE object.

Operations:

(MAKE-DELAY expression)
DELAY-TYPE
(DELAY? object)
(DELAY-COMPONENTS delay receiver)
(DELAY-EXPRESSION delay)

[] THE-ENVIRONMENT

Corresponds directly to the THE-ENVIRONMENT special form.

Execution:

The environment in which this expression is executed is returned as
the value of the execution.

Operations:

(MAKE-THE-ENVIRONMENT)
THE-ENVIRONMENT-TYPE
(THE-ENVIRONMENT? object)

[] QUOTATION: EXPRESSION

Corresponds directly to the SCODE-QUOTE special form.

Execution:

The QUOTATION expression evaluates to its EXPRESSION component.  This
is used for quoting pieces of code.

Operations:

(MAKE-QUOTATION expression)
QUOTATION-TYPE
(QUOTATION? object)
(QUOTATION-EXPRESSION quotation)

[] COMMENT: TEXT, EXPRESSION
[] DECLARATION: TEXT, EXPRESSION

(Note: Both COMMENT and DECLARATION have the same behavior; only
COMMENT is described here.  Do a string substitution to get the
documentation for DECLARATION.)

Execution:

The COMMENT is reduced to the value of EXPRESSION.  TEXT is completely
ignored; this is for use by programs that analyze code.

Operations:

(MAKE-COMMENT text expression)
COMMENT-TYPE
(COMMENT? object)
(COMMENT-COMPONENTS comment receiver)
(COMMENT-TEXT comment)
(COMMENT-EXPRESSION comment)

(SET-COMMENT-TEXT! comment text)
(SET-COMMENT-EXPRESSION! comment expression)
Modify the component parts of COMMENT.

                           Efficiency Hacks

These SCODE expressions exists primarily to save space in programs.
In some cases they also save time.

[] DISJUNCTION: PREDICATE, ALTERNATIVE

Corresponds directly to the DISJUNCTION special form (this is the OR
special form in most other LISP implementations).

Execution:

The predicate subproblem is executed; if it has a non-false value,
that value is the result, otherwise the disjunction is reduced to the
ALTERNATIVE subexpression.

Operations:

(MAKE-DISJUNCTION predicate alternative)
DISJUNCTION-TYPE
(DISJUNCTION? object)
(DISJUNCTION-COMPONENTS disjunction receiver)
(DISJUNCTION-PREDICATE disjunction)
(DISJUNCTION-ALTERNATIVE disjunction)

[] DEFINITION: NAME, VALUE

Corresponds directly to the DEFINE special form.

Execution:

The variable NAME is defined to have as its value the result of
executing the VALUE subproblem.  This definition is added to the
current environment frame, shadowing any other definitions in the
environment.  If such a variable already exists in the current frame,
it is altered.

Operations:

(MAKE-DEFINITION name value)
DEFINITION-TYPE
(DEFINITION? object)
(DEFINITION-COMPONENTS definition receiver)
(DEFINITION-NAME definition)
(DEFINITION-VALUE definition)
