-*- Dictionary: docs:internals/design -*-

Normally, slots could be stored inline in the instance, instead of indirecting to a vector. If an instance is redefined to have more slots, then we indirect to the new version of the instance. This saves space, and also saves an indirection if the compiler knows (from declarations or inferences) that the instance has a particular layout, and has not been redefined. When GC transports an indirect instance, it can remove the indirection.

The "wrapper" slots would become standard leader slots on all class objects (including structure-classes). If we want to change the "wrapper", then we invoke the instance redefinition mechanism on the class object. The old "wrapper" is in the original object, and the new wrapper is in the indirected-to object. This eliminates the need to indirect through the wrapper to get the class. The instance overhead would be the same as our current structure overhead: 2 words.

Instance objects would always need at least one slot to point to an indirect instance. Not a big constraint... In fact, we would presumably always allocate slots in 2's for dual-word alignment, so an instance would always have at least 2 slots.

built-in-wrapper-of can be implemented by using the type code to index a global vector table.

compile stuff with efficiency notes.

vector, cache, methods, dfun?

Use threaded code to get quick turnaround and reasonably good speed. Native compilation will not always be necessary. By threaded code, I mean a vector of closures with some state passed in as args (FP, PC). The function will do its bit of stuff, and then tail-call the function at the next PC with the updated state. The general interpreter would be greatly sped up by using this threaded implementation, with little increase in preprocessing time. We could also add interpreted code type checking at this time. We would also sometimes want to optimize interpreted code before conversion.

Not all of the functions need to be closures; for example, one might pick up the two stack-top values, add them and push the result. Closures would be used for things such as conditionals (to hold the branch targets). Speed can be increased by having operations that do multiple pushes, moves, pops, etc. Different closures will deal with calling with different numbers of args, avoiding apply for small numbers of args.

For debug-info, we might do something like having a parallel vector holding the source-path which generated that code (or similar). And we would need variable map info, perhaps already parsed into the debug-info format.

We can also use this sort of technique for generating method combination functions and similar stuff on the fly for the object system. Probably much of the stuff can be done with simple closures, and no vector+pc. In general, though, it seems that some things such as method combination have a combinatorial explosion if you can't generate things on the fly that take arbitrary numbers of args, have arbitrarily many clauses, etc. We still usually wouldn't have to cons up code and use the normal interpreter --- we could instead cons up our own special-purpose threaded interpreter.
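To make the threaded-code representation above concrete, here is a toy sketch in which the operand stack is just a list and the code vector is threaded along with the stack and PC state; the op-* constructors are made-up names for illustration, and a real implementation would use the actual frame and stack machinery.

  ;;; Each element of CODE is a function of (CODE STACK PC) that does its
  ;;; bit of stuff and then tail-calls the function at the next PC with
  ;;; the updated state.
  (defun run-threaded (code stack pc)
    (funcall (aref code pc) code stack pc))

  (defun op-push (constant)
    ;; Needs a closure to hold the constant being pushed.
    (lambda (code stack pc)
      (run-threaded code (cons constant stack) (1+ pc))))

  (defun op-add ()
    ;; Doesn't really need to be a closure: pick up the two stack-top
    ;; values, add them, and push the result.
    (lambda (code stack pc)
      (run-threaded code (cons (+ (first stack) (second stack)) (cddr stack))
                    (1+ pc))))

  (defun op-branch-if-zero (target)
    ;; Conditionals use the closure to hold the branch target.
    (lambda (code stack pc)
      (run-threaded code (rest stack)
                    (if (zerop (first stack)) target (1+ pc)))))

  (defun op-return ()
    (lambda (code stack pc)
      (declare (ignore code pc))
      (first stack)))

  ;; (run-threaded (vector (op-push 2) (op-push 3) (op-add) (op-return)) '() 0)
  ;;   => 5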
A technique related to threaded code that might also be useful for OO stuff would be to use "thunks". This probably pertains mainly to IV access, and perhaps explicit type testing. The idea is that for general-case code, we can always call some function found in our lexenv. If the nature of a variable changes, this function can be clobbered with one that computes the variable in a new way. The thunk can itself be a closure consed up from a general template. This is really just an easy way to produce methods with specialized IV access as well as specialized GF calls.

This could also be used in combination with "inline caching" or "method splitting", where we have some inline conditional that tests for the good case and does it, or does the thunk call in general. The inline cache could be invalidated by the presence of a valid thunk, i.e. the thunk could be initially NIL, and we only call it if it is set to a function. This internal conditionalization is mainly interesting for non-argument (or at least non-specialized) values, since for arguments we can produce totally separate good-case and general-case methods.

If we end up having to use the general-case method, then we might want to use heuristics to see if we should compile a new special-case method on the fly. For example, each general-case method could have a call counter and a previous check time, and when the counter hits 0, we see how long it took to get N calls. If this interval is too short, recompilation is called for.

This sort of profiling info can also be used for more batch-oriented operations, such as file compilation using in-core info, or compiling the in-core program into a tense deliverable. Somewhat more specific profiling would be better for the compiler's use, and wouldn't really be much more expensive to collect. We could keep per-call-site counters. This would give us direct information about which arcs in the call graph are important, instead of making us guess this from callers' call counts.

Note that there are various speed/space/latency tradeoffs here:
 -- Compile only the general case, and blow off the syntaxed form.
 -- Compile only special cases, and:
      recompile for new cases
      run new cases interpreted
      run new cases interpreted, but compile if called a lot
      require that no new cases appear, and blow off the syntaxed form
 -- Compile special cases and the general case.
 -- When compiling special cases, we can compile:
      all currently possible special cases
      only those cases that have been used
      cases that "seem" likely, etc.

The "only special cases, run new cases interpreted, compile if called a lot" option seems like it might be good for active development, since it:
 -- keeps program size down
 -- provides the advantages of having the syntaxed form
 -- gives redefinition with minimal latency (interpreted code)
 -- gives maximum speed when needed (specialized compiled code)
Basically, it gives optimum speed with good space, and only pays a latency hit when there is a speed payoff. And the syntaxed form gives good "browseability". Also, having the syntaxed form allows the possibility of compiling down to a tense deliverable form.

In delivery mode, the "only special cases, blow off syntaxed form" option is attractive, since it gives maximum speed with minimal space, and a guaranteed small latency. In delivery mode, the "only used special cases" option is a good optimization, although not necessarily correctness preserving. Note that in any case, we don't really have to compile an exponential number of multi-methods, since many elements of the cross-product will really end up with the same code. Applications that actually dick with class/method definitions at run-time will probably want to have general-case methods around.
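As a rough sketch of the call-counter heuristic and the "run new cases interpreted, but compile if called a lot" option above: the names and thresholds here are made up, and CL:COMPILE is only a stand-in for building a specialized method; the point is the shape of the policy.

  (defparameter *check-count* 50)     ; N calls between checks
  (defparameter *hot-interval* 5)     ; seconds; faster than this => compile

  (defstruct case-info
    (countdown *check-count* :type fixnum)
    (last-check (get-universal-time))
    (compiled nil :type (or null function)))

  (defun call-case (info interpreted-fn &rest args)
    (let ((compiled (case-info-compiled info)))
      (cond (compiled
             ;; This case has already been specialized; maximum speed.
             (apply compiled args))
            (t
             ;; Run the new case interpreted, but keep a call counter.  When
             ;; the counter hits 0, see how long it took to get N calls; if
             ;; that interval is too short, compile a specialized version.
             (when (<= (decf (case-info-countdown info)) 0)
               (let ((now (get-universal-time)))
                 (when (< (- now (case-info-last-check info)) *hot-interval*)
                   (setf (case-info-compiled info)
                         (compile nil interpreted-fn)))
                 (setf (case-info-last-check info) now
                       (case-info-countdown info) *check-count*)))
             (apply interpreted-fn args)))))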
For some stuff like optimizing constructors, and perhaps other MOP interactions, we would like to compile code assuming that particular applicable methods will be the ones supplied by the implementation. This is really just a form of inline method expansion. If we know the type of a value, we can inline-expand methods at compile time, but we have to be prepared to recover if new methods are defined at run-time. This requires inline tests and side-effect analysis, since a method could be defined while a specialized method is running. Note, however, that for constructors we can do gazillions of times better than PCL just by using thunks (i.e. indirection to a specialized constructor.)

Compiling the in-core program into a tense specialized form seems like it would be a win in terms of semantics, since it is known a priori that the runtime environment will be the same as the compile-time environment. And we can ignore the difficulties involved with matching source-file entities up with in-core entities (e.g. with anonymous functions) when we are determining what relationships hold in the status quo. This would be used in conjunction with an "unloading" or incremental save facility. It would also be possible to automatically determine that redefinitions are impossible due to defining operations not being referenced in the program, though I'm not sure that is really very important.

[### I suppose in practice it wouldn't be too difficult to match up anonymous functions with their in-core definitions. If the source-path of an anonymous function in IR1 uniquely defines it within the named definition it is in, then we can use the in-core anonymous function within that definition that has the same source-path. Gleh... Not nearly as clean.]

Note that "compiling down" for delivery might actually be quite a bit faster than total recompilation if the compiler makes good guesses about what specializations to compile. This could be seen as a load-time version of compile-combined-flavor-methods, etc. Basically, the compiler gets another chance to specialize things at load time, potentially considering profiling information and used specializations.

Syntaxing:

These threaded code representations are distinct from the "syntaxed" representation, which is a compact byte-coded representation of the IR1 after optimization (but before any assumptions are made about method definitions, user inline functions, etc.) This represents all of the name references for dependency control, and can be specialized and respecialized when the environment changes.

When respecializing, there are efficiency advantages to not having to IR1-convert from source and reoptimize, but it seems there are also important semantic reasons. We can't really reliably respecialize a method from its source (especially not the original source), since macros can hide the actual definition, and we can't really randomly re-evaluate top-level forms. There is also a certain niceness to knowing that syntactic information is bound in at one point. You probably could do it from some sort of macroexpanded source (i.e. the actual lambda-list), but this forfeits the other advantages.

Note also the advantages for consistency maintenance and code browsing of having a representation that declaratively represents what the method actually does. This can give you really bulletproof consistency guarantees. Think of this as a trend in the hypercode direction.
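Purely to make the syntaxed representation concrete, here is a guess at what such an object might record; the slot names and the flat dependency scan are inventions for this sketch, not a description of the actual design.

  (defstruct syntaxed-form
    name             ; name of the definition, or NIL for an anonymous function
    lambda-list      ; the actual (macroexpanded) lambda-list
    body             ; compact byte-coded IR1, before any assumptions are bound in
    references       ; all names referenced, kept for dependency control
    specializations) ; compiled specializations, keyed by the assumptions made

  (defvar *syntaxed-forms* '()
    "All syntaxed forms in the core (a real system would index this, not scan it).")

  (defun forms-to-respecialize (changed-name)
    ;; When CHANGED-NAME is redefined, these forms' specializations are stale
    ;; and must be recomputed from the byte-coded body, not from the source.
    (remove-if-not (lambda (form)
                     (member changed-name (syntaxed-form-references form)))
                   *syntaxed-forms*))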