-*- Dictionary: docs:internals/design -*-

Normally, slots could be stored inline in the instance, instead of indirecting to a vector. If an instance is redefined to have more slots, then we indirect to the new version of the instance. This saves space, and also saves an indirection if the compiler knows (from declarations or inferences) that the instance has a particular layout, and has not been redefined. When GC transports an indirect instance, it can remove the indirection.

The "wrapper" slots would become standard leader slots on all class objects (including structure-classes). If we want to change the "wrapper", then we invoke the instance redefinition mechanism on the class object. The old "wrapper" is in the original object, and the new wrapper is in the indirected-to object. This eliminates the need to indirect through the wrapper to get the class. The instance overhead would be the same as our current structure overhead: 2 words.

Instance objects would always need at least one slot to point to an indirect instance. Not a big constraint... In fact, we would presumably always allocate slots in 2's for dual-word alignment, so an instance would always have at least 2 slots.

built-in-wrapper-of can be implemented by using the type code to index a global vector table.

compile stuff with efficiency notes.

vector, cache, methods, dfun?

Use threaded code to get quick turnaround and reasonably good speed. Native compilation will not always be necessary. By threaded code, I mean a vector of closures with some state passed in as args (FP, PC). The function will do its bit of stuff, and then tail-call the function at the next PC with the updated state. The general interpreter would be greatly sped up by using this threaded implementation, with little increase in preprocessing time. We could also add interpreted code type checking at this time. We would also sometimes want to optimize interpreted code before conversion.

Not all of the functions need to be closures; for example, one might pick up the two stack-top values, add them and push the result. Closures would be used for things such as conditionals (to hold the branch targets). Speed can be increased by having operations that do multiple pushes, moves, pops, etc. Different closures will deal with calling with different numbers of args, avoiding apply for small numbers of args.

For debug-info, we might do something like having a parallel vector holding the source-path which generated that code (or similar). And we would need variable map info, perhaps already parsed into the debug-info format.

We can also use this sort of technique for generating method combination functions and similar stuff on the fly for the object system. Probably much of the stuff can be done with simple closures, and no vector+pc. In general, though, it seems that some things such as method combination have a combinatorial explosion if you can't generate things on the fly that take arbitrary numbers of args, have arbitrarily many clauses, etc. We still usually wouldn't have to cons up code and use the normal interpreter --- we could instead cons up our own special-purpose threaded interpreter.
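To make the threaded-code representation above concrete, here is a toy sketch in which the operand stack is just a list and the code vector is threaded along with the stack and PC state; the op-* constructors are made-up names for illustration, and a real implementation would use the actual frame and stack machinery.

  ;;; Each element of CODE is a function of (CODE STACK PC) that does its
  ;;; bit of stuff and then tail-calls the function at the next PC with
  ;;; the updated state.
  (defun run-threaded (code stack pc)
    (funcall (aref code pc) code stack pc))

  (defun op-push (constant)
    ;; Needs a closure to hold the constant being pushed.
    (lambda (code stack pc)
      (run-threaded code (cons constant stack) (1+ pc))))

  (defun op-add ()
    ;; Doesn't really need to be a closure: pick up the two stack-top
    ;; values, add them, and push the result.
    (lambda (code stack pc)
      (run-threaded code (cons (+ (first stack) (second stack)) (cddr stack))
                    (1+ pc))))

  (defun op-branch-if-zero (target)
    ;; Conditionals use the closure to hold the branch target.
    (lambda (code stack pc)
      (run-threaded code (rest stack)
                    (if (zerop (first stack)) target (1+ pc)))))

  (defun op-return ()
    (lambda (code stack pc)
      (declare (ignore code pc))
      (first stack)))

  ;; (run-threaded (vector (op-push 2) (op-push 3) (op-add) (op-return)) '() 0)
  ;;   => 5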
A technique related to threaded code that might also be useful for OO stuff would be to use "thunks". This probably pertains mainly to IV access, and perhaps explicit type testing. The idea is that for general-case code, we can always call some function found in our lexenv. If the nature of a variable changes, this function can be clobbered with one that computes the variable in a new way. The thunk can itself be a closure consed up from a general template. This is really just an easy way to produce methods with specialized IV access as well as specialized GF calls.

This could also be used in combination with "inline caching" or "method splitting", where we have some inline conditional that tests for the good case and does it, or does the thunk call in general. The inline cache could be invalidated by the presence of a valid thunk, i.e. the thunk could be initially NIL, and we only call it if it is set to a function. This internal conditionalization is mainly interesting for non-argument (or at least non-specialized) values, since for arguments we can produce totally separate good-case and general-case methods.

If we end up having to use the general-case method, then we might want to use heuristics to see if we should compile a new special-case method on the fly. For example, each general-case method could have a call counter and a previous check time, and when the counter hits 0, we see how long it took to get N calls. If this interval is too short, recompilation is called for.

This sort of profiling info can also be used for more batch-oriented operations, such as file compilation using in-core info, or compiling the in-core program into a tense deliverable. Somewhat more specific profiling would be better for the compiler's use, and wouldn't really be much more expensive to collect. We could keep per-call-site counters. This would give us direct information about which arcs in the call graph are important, instead of making us guess this from callers' call counts.

Note that there are various speed/space/latency tradeoffs here:
 -- Compile only the general case, and blow off the syntaxed form.
 -- Compile only special cases, and:
      recompile for new cases
      run new cases interpreted
      run new cases interpreted, but compile if called a lot
      require that no new cases appear, and blow off the syntaxed form
 -- Compile special cases and the general case.
 -- When compiling special cases, we can compile:
      all currently possible special cases
      only those cases that have been used
      cases that "seem" likely, etc.

The "only special cases, run new cases interpreted, compile if called a lot" option seems like it might be good for active development, since it:
 -- keeps program size down
 -- provides the advantages of having the syntaxed form
 -- gives redefinition with minimal latency (interpreted code)
 -- gives maximum speed when needed (specialized compiled code)
Basically, it gives optimum speed with good space, and only pays a latency hit when there is a speed payoff. And the syntaxed form gives good "browseability". Also, having the syntaxed form allows the possibility of compiling down to a tense deliverable form.

In delivery mode, the "only special cases, blow off syntaxed form" option is attractive, since it gives maximum speed with minimal space, and a guaranteed small latency. In delivery mode, the "only used special cases" option is a good optimization, although not necessarily correctness preserving. Note that in any case, we don't really have to compile an exponential number of multi-methods, since many elements of the cross-product will really end up with the same code. Applications that actually dick with class/method definitions at run-time will probably want to have general-case methods around.
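As a rough sketch of the call-counter heuristic and the "run new cases interpreted, but compile if called a lot" option above: the names and thresholds here are made up, and CL:COMPILE is only a stand-in for building a specialized method; the point is the shape of the policy.

  (defparameter *check-count* 50)     ; N calls between checks
  (defparameter *hot-interval* 5)     ; seconds; faster than this => compile

  (defstruct case-info
    (countdown *check-count* :type fixnum)
    (last-check (get-universal-time))
    (compiled nil :type (or null function)))

  (defun call-case (info interpreted-fn &rest args)
    (let ((compiled (case-info-compiled info)))
      (cond (compiled
             ;; This case has already been specialized; maximum speed.
             (apply compiled args))
            (t
             ;; Run the new case interpreted, but keep a call counter.  When
             ;; the counter hits 0, see how long it took to get N calls; if
             ;; that interval is too short, compile a specialized version.
             (when (<= (decf (case-info-countdown info)) 0)
               (let ((now (get-universal-time)))
                 (when (< (- now (case-info-last-check info)) *hot-interval*)
                   (setf (case-info-compiled info)
                         (compile nil interpreted-fn)))
                 (setf (case-info-last-check info) now
                       (case-info-countdown info) *check-count*)))
             (apply interpreted-fn args)))))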
For some stuff like optimizing constructors, and perhaps other MOP interactions, we would like to compile code assuming that particular applicable methods will be the ones supplied by the implementation. This is really just a form of inline method expansion. If we know the type of a value, we can inline-expand methods at compile time, but we have to be prepared to recover if new methods are defined at run-time. This requires inline tests and side-effect analysis, since a method could be defined while a specialized method is running. Note, however, that for constructors we can do gazillions of times better than PCL just by using thunks (i.e. indirection to a specialized constructor.)

Compiling the in-core program into a tense specialized form seems like it would be a win in terms of semantics, since it is known a priori that the runtime environment will be the same as the compile-time environment. And we can ignore the difficulties involved with matching source-file entities up with in-core entities (e.g. with anonymous functions) when we are determining what relationships hold in the status quo. This would be used in conjunction with an "unloading" or incremental save facility. It would also be possible to automatically determine that redefinitions are impossible due to defining operations not being referenced in the program, though I'm not sure that is really very important.

[### I suppose in practice it wouldn't be too difficult to match up anonymous functions with their in-core definitions. If the source-path of an anonymous function in IR1 uniquely defines it within the named definition it is in, then we can use the in-core anonymous function within that definition that has the same source-path. Gleh... Not nearly as clean.]

Note that "compiling down" for delivery might actually be quite a bit faster than total recompilation if the compiler makes good guesses about what specializations to compile. This could be seen as a load-time version of compile-combined-flavor-methods, etc. Basically, the compiler gets another chance to specialize things at load time, potentially considering profiling information and used specializations.

Syntaxing:

These threaded code representations are distinct from the "syntaxed" representation, which is a compact byte-coded representation of the IR1 after optimization (but before any assumptions are made about method definitions, user inline functions, etc.) This represents all of the name references for dependency control, and can be specialized and respecialized when the environment changes.

When respecializing, there are efficiency advantages to not having to IR1-convert from source and reoptimize, but it seems there are also important semantic reasons. We can't really reliably respecialize a method from its source (especially not the original source), since macros can hide the actual definition, and we can't really randomly re-evaluate top-level forms. There is also a certain niceness to knowing that syntactic information is bound in at one point. You probably could do it from some sort of macroexpanded source (i.e. the actual lambda-list), but this forfeits the other advantages.

Note also the advantages for consistency maintenance and code browsing of having a representation that declaratively represents what the method actually does. This can give you really bulletproof consistency guarantees. Think of this as a trend in the hypercode direction.
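Purely to make the syntaxed representation concrete, here is a guess at what such an object might record; the slot names and the flat dependency scan are inventions for this sketch, not a description of the actual design.

  (defstruct syntaxed-form
    name             ; name of the definition, or NIL for an anonymous function
    lambda-list      ; the actual (macroexpanded) lambda-list
    body             ; compact byte-coded IR1, before any assumptions are bound in
    references       ; all names referenced, kept for dependency control
    specializations) ; compiled specializations, keyed by the assumptions made

  (defvar *syntaxed-forms* '()
    "All syntaxed forms in the core (a real system would index this, not scan it).")

  (defun forms-to-respecialize (changed-name)
    ;; When CHANGED-NAME is redefined, these forms' specializations are stale
    ;; and must be recomputed from the byte-coded body, not from the source.
    (remove-if-not (lambda (form)
                     (member changed-name (syntaxed-form-references form)))
                   *syntaxed-forms*))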