-*- Dictionary: design; Package: C -*-

Contents:
    Todo:
    Design:
    Glossary:
    Phases:
        IR1 CONVERSION
            Canonical forms:
            Inline functions:
            Tail sets:
            Hairy function representation:
            IR1 representation of Catch and Unwind-Protect:
            Block compilation:
        LOCAL CALL ANALYSIS
            Entry points:
        FLOW GRAPH CANONICALIZATION
        FLOW GRAPH SIMPLIFICATION
        IR1 OPTIMIZE
            Bottom-up IR1 optimizations:
            Top-down IR1 optimizations:
        TYPE CHECK
        TYPE CONSTRAINT PROPAGATION
        ENVIRONMENT ANALYSIS
        CALL/LOOP ANALYSIS
        GLOBAL TN ASSIGNMENT
        LOCAL TN ASSIGNMENT
        IR2 CONVERSION
            Stack analysis:
            Cleanup generation:
        REACHING DEFINITIONS
        LOOP INVARIANT OPTIMIZATION
        COPY GENERATION
        LOCAL COMMON SUBEXPRESSION ELIMINATION
        PRELOAD GENERATION
        LIFETIME ANALYSIS
            Flow analysis:
            Conflict detection:
        PACKING
            Scarce SB packing:
            Load TN packing:
            Unbounded SB packing:
            Ranking:
        CONTROL OPTIMIZATION
        BRANCH DELAY
        CODE GENERATION
        ASSEMBLY
        IR1 FINALIZE
    RETARGETING
        Storage bases and classes:
        Type system parameterization:
        VOP Definition:
        Lifetime model:
        VOP Cost model:
        Implementation parameterizations:
        Special-case IR2 conversion:
    INTERPRETER INTERFACE
    IR2 CONSISTENCY CHECKING
    OBSERVED BUGS AND POSSIBLE FIXES


Todo:

Change miscop generators to emit fixups, and add fixup dumping.

Make top-level functions be XEPs, and make the dumper call them.

Eliminate named call, and add a special "symbol-function for call" operation
that is used when calling a symbol.

Do type checking on the combination function continuation.  Make the derived
type of all function refs be plain FUNCTION?

Rationalize the constants used by generators: make our own database.

Add lifetime/pack support for environment-live TNs and pre-packed save TNs.

Change GTN/IR2 conversion to make fixed-format frames.

Dump full debug info.  [Just component name at first.]

Figure out a source-map representation that can handle multiple files.

Fix up the top-level loop so that block compilation is optional, the right
package is current, etc.  Change so that each form is initially converted in
a different env?

Implement unknown values: figure out the representation, and do stack
analysis.

Implement non-local exits.  [And UWP.]

Implement mv-call.

Implement progv.

Implement closures in functional reference and function entry.

Implement type checking and testing.  [Ptype VOP annotation macro.]

Implement generators for lots of random stuff.

Assembly code changes: mostly existing code is either flushed or unchanged.
Eventually internal errors and interrupts will need to be bashed into the
"escape frame" model.  We will also need a little support for throw, unwind,
and possibly MV-Call.

Change the system so that the definition cell always holds a funcallable
function.  Initialize function cells to the undefined function.

Fix up genesis (ripping out the link-table).  Fix up the loader.

Implement compile-to-core.

Fix up GC so that it understands return PCs.

Write the new interpreter (possibly w/o interpreted function support at
first).

Write the new debugger based on an abstract interface to the low-level stuff.

Add support for unpacking to pack load TNs.

Fix up type-intersection, types-intersect.

Merge in Alien code.

Add "Arg-Documentation" or some such field to the Functional.  Have
IR1-Convert-Lambda-Body set this to some appropriate string, and have
%DEFMACRO override this.

Dump a function type for each XEP in the entry-info structure?
We probably want some way to do run-time machine-sensible query of a
function's protocol.  Since we don't need the min/max arg info at run time
anymore, we can flush that.  Probably we want to dump a list-style specifier,
since this is more tractable than dumping function-type structure (efficiency
of access isn't terribly important here).  It makes sense to dump this as a
list, since these should be meaningfully machine-grovelable (this isn't
nearly as inefficient as for arglists, since the type names will all be
present as symbols anyway; shareability is also high).

So far as dumping is concerned, we want the top-level lambda to be an entry
point, i.e. have an XEP.  There is a possible complication here, since the
top-level lambda would then have functions in it (the XEPs), which was
previously never the case.  If this is a problem, we could make the top-level
lambda be its own XEP: there is no problem, since it has no arguments.

Change the initial component hackery to handle many top-level and initial
components.  We then convert each top-level form separately, and if we are
block compiling, we make cross-component references and combine.  This has
the advantage of interleaving eval-when processing and macroexpansion with
reading.  It also makes the process of separating unconnected parts of the
flow graph less important/simpler, since usually the code within a top-level
form is connected, and separate code will start out separate.  [But consider
eval-when, progn...]  This would require some funny business with top-level
components: we would either have to merge top-level components when we merge
components under them, or we would have to allow a single non-top-level
component to have multiple top-level components.  [I think this would also
simplify the top-level loop, since the block and non-block cases would be
more similar.]

When recovering from a read error, return a proxy error form rather than the
next form or EOF?

Fix up the IR1 convert methods for catch, unwind-protect.  Cons up forms and
IR1-convert them, so that funny functions are known, etc.

Let-convert XEP calls when there aren't any stray references to the XEP that
might be converted.

Do functions really ever have no return PC or old-cont?  If so, tail call
needs to be fixed.  If not, some conditionals can be ripped out elsewhere.

Emission order consistency checking.

Sometime, jam together the lifetime post-pass and the pack pre-passes into
one loop over ir2-blocks, with multiple loops over each block.

Somehow want to realize that a wild result type is o.k. when we have a type
assertion of T.  Is this a special hack for T?  Note that our interpretation
of a non-values type assertion is that the first value (or NIL if none) must
match the assertion.  Anything is subtypep T, so we don't need to check T
assertions.

Handling of named constants is odd.  It seems that we would like to be able
to fold together named and anonymous uses of the same constant.  Does Common
Lisp allow this?  What would be the significance of entering the same
Constant structure under multiple names in *free-variables*?

With restricted temps, make the attempted pack order be the order specified
in the restriction?  Do this by making the costs 0, 1, 2...  Otherwise, there
is no point in having costs for temps, since costs are only used for
representation selection, and all usage is known to the VOP definer, who can
specify the best SCs.

Deal with joining components on local call conversion during IR1
optimization.
Probably we can't just set Component-Reanalyze, since the function(s) need to
be moved to the new component, which FIND-DFO doesn't do.  Moving functions
into a component doesn't actually require DFO recomputation anyway; only let
conversion actually fucks with the flow graph.

Can promote-to-live lose when doing full call?  (Passing around lots of wired
TNs...)

Either make the node for advanced returns be the call, or give ir2-block a
back pointer.

Should VOP have a back-pointer to the IR2 block?  Useful for consistency
checks, and might also be useful in load-tn pack.

Have variants of multiple-value return that take passing locations wired to
the beginning of the frame?  Then we wouldn't need to squeeze out intervening
crap, supposing that this was a thing we needed to do.  It would also allow
us to target stack values to useful locations, instead of having push-values
move them onto the stack top.

Currently we never emit type checks for the values of unused continuations.
Do we believe in this?

Compute closure must propagate closure from sets as well as refs?  [18 August
88: seems to be a genuine bug.]

Recompute DFO more often so that we are sure all unreachable code is flushed?
Perhaps on policy?  It would be useful to know if DFO deleted any blocks (but
I guess delete-block will be setting component-reoptimize).

When a top-level form is broken off into a lambda, the form is in a for-value
context even though the value is discarded.

Named VOP temporary mechanism.  Only wired temps?

Mechanism in VOP for automatically emitting different VOPs depending on
policy?

If defining a :conditional VOP, ensure that the appropriate codegen-info args
are defined.

In a :Conditional VOP, check that the result type is Boolean?  We are
assuming this now that we set the Predicate attribute.  Previously we could
have :Conditional templates for non-predicates: these templates just wouldn't
be used when not used as a predicate.  But this probably isn't very useful.

Macro for defining new template annotations and primitive-type attributes:
coerce-to/from-t, move, type-check, type-test, ...

Macro for defining non-VOP (composite) templates (define-template?).

In VOP*, do run-time checking that a legal number of arguments has been
supplied.

Make primitive-type return the appropriate primitive-types for all possible
types.  The main thing currently missing is float types.

Special-case Defstruct accessors and predicates in IR1 conversion, making the
calls known functions and figuring out type info.  Eventually we will
probably want to make the setf expansion for a slot accessor be something
like (%set-slot ' ) so that we don't have to actually create named setter
functions.

Fix the function type database.

Basic type inference methods.

Make sure everyone that should be marking blocks as needing to be optimized
is doing so.  This primarily concerns control optimizations, although there
may also be missing places in IR1 optimization itself.

How do we get the #' in the defun expansion to access the actual object?
Perhaps have a special special form?

DEFTYPE type stuff?

Fix type-union and type-intersection to handle numeric types correctly.
Array types are also broken:
    (types-intersect string (simple-array * (*)))    => nil
    (type-intersection string (simple-array * (*)))  => nil

Missing source transforms/canonicalization?

Change function type hackery to reflect the weak interpretation of function
declarations.  Basically we ignore function type declarations, and we revert
to plain FUNCTION when we union or intersect function types.
We infer assumed function types from declarations in the actual definition.
[### This probably isn't exactly right.  Our notion of getting argument type
assertions from a "function type" just doesn't correspond to Common Lisp
function types.  What we do is keep our function type, but have no Common
Lisp way of getting at our "function types".  The complex Common Lisp
FUNCTION type specifier will always turn into just plain FUNCTION.  We can
continue to special-case calls to functions that have a "function type" on
the call continuation, since these types can only be created by magic.]

Make the asserted type for all function continuations be FUNCTION?  Make the
derived type for functional/global-function references be FUNCTION?  The
purpose of this would be to cause type checking on the function in funcall
(or any other call where the function type isn't known a priori).  This
allows us to unbundle the type checking, so that it can be optimized away or
omitted according to policy.

Fix up the named constant handling code to always evaluate the expression at
compile time, rather than only evaluating when it is known to be constant.

Fix things up to correspond to the cleanup proposal.  The main change is
replacing the inlinep values with a general-purpose integration level.  The
globaldb compiler environment support also needs to be rethought.

Change constant stuff to eval the value expression at compile time and flush
the "unknown constant" support.

Ultimate read-loop: expand macros looking for package-frobbing forms.

Whizzy read-error recovery: we remember the starting position of the form
that we are reading.  If we get a read error, then we back up and read again
with *read-suppress* on.  We display some context around the place where the
error happened.  If we hit end of file, then display the start of the form
being read.

Make IR1 optimize cleverly use the Call attribute.  [That is, get a worst
case by combining the attributes of the actual functional args, rather than
totally punting when Call is specified.]

Change TAGBODY (and BLOCK?) to preserve the drop-thrus in the original code.

Remaining special forms: UNWIND-PROTECT, PROGV.

Handle recursive types.

Add the IGNORABLE declaration.

Make definition macros totally real by having the load-time functions deal
with clearing compiler info and similar stuff.

Make IR1 conversion of special optionals less pessimal.

User-level &more support.

IR1 values type hackery, especially in mv-bind.  Probably want a derive-type
method for Values too...

Substitute non-set let variables bound to effectless and unaffected calls of
non-set lexical variables or constants, when the variable is referenced only
once (and not inside a loop).  We need only move the combination node, since
the evaluation of such arguments is always delayed until the value is needed.
This optimization should be useful for macros and inline function calls (such
as transforms).  (A before/after sketch appears at the end of this list.)

PSETQ isn't propagating type assertions to the new-value forms.  We either
need an IR1 optimization that can discover type assertions on local call
args, or we need a special-case IR1 convert method for PSETQ.

Check that the SCs specified for a restricted temp are a subset of the SCs
allowed by the primitive type (requires meta-compile-time primitive-type
information).  Give a warning if an unbounded SC is allowed?

Factor out the non-Common-Lisp file-position hackery somehow.
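
A hypothetical before/after for the let-substitution item above: X is not
set, is bound to an effectless and unaffected call whose argument is a
non-set lexical variable, and is referenced exactly once outside any loop, so
the combination node can simply be moved to the point of use.

(let ((x (car l)))   ; L is a non-set lexical; CAR is effectless and unaffected
  (when p
    (foo x)))        ; the sole reference to X

becomes, in effect:

(when p
  (foo (car l)))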
Design:

Variable maps:

There are about five things that the debugger might want to know about a
variable:

Name
    Although a lexical variable's name is "really" a symbol (package and
    all), in practice it doesn't seem worthwhile to require all the symbols
    for local variable names to be retained.  There is much less VM and GC
    overhead for a constant string than for a symbol.  (Also it is useful to
    be able to access gensyms in the debugger, even though they are
    theoretically ineffable.)

ID
    Which variable with the specified name is this?  It is possible to have
    multiple variables with the same name in a given function.  The ID is
    something that makes Name unique, probably a small integer.  When
    variables aren't unique, we could make this be part of the name, e.g.
    "FOO#1", "FOO#2".  But there are advantages to keeping this separate,
    since in many cases lifetime information can be used to disambiguate,
    making qualification unnecessary.

Type
    When unboxed representations are in use, we must have type information to
    properly read and write a location.  We only need to know the
    primitive-type for this, which would be amenable to a space-saving
    numeric encoding.  But if we allow user modification of locations, it
    would be nice for the debugger to be able to type-check user
    modifications.  It is also a useful sanity check to be able to check that
    variables hold values of the correct type: this could help to find bugs
    in code that wasn't compiled safely.  For example, checking the type
    could be a side effect of printing the value.
    [### Or no...  What we really need to recover the representation is the
    SC.  This also already has a convenient numeric encoding.  Neither the
    primitive-type nor the actual type is enough to recover the
    representation, since they don't include representation decisions made by
    pack.  There is little point in dumping the primitive-type, since it
    contains less information than the actual type.  So we must dump the SC,
    and we can also dump the type if we believe the above argument about its
    utility.]

Location
    Simple: the SB and offset.  [Actually, we need the save location too.]

Lifetime
    In what parts of the program does this variable hold a meaningful value?
    It seems prohibitive to record precise lifetime information, both in
    space and in compiler effort, so we will have to settle for some sort of
    approximation.  The finest granularity at which it is easy to determine
    liveness is the block: we can regard the variable's lifetime as the set
    of blocks that the variable is live in.  Of course, the variable may be
    dead (and thus contain meaningless garbage) during arbitrarily large
    portions of the block.
    Note that this subsumes the notion of which function a variable belongs
    to.  A given block is only in one function, so the function is implicit.

The variable map should represent this information space-efficiently and with
adequate computational efficiency.

The SC and ID can be represented as small integers.  Although the ID can in
principle be arbitrarily large, it should be <100 in practice.  The location
can be represented by just the offset (a moderately small integer), since the
SB is implicit in the SC.

The lifetime info can be represented either as a bit-vector indexed by block
numbers, or as a list of block numbers.  Which is more compact depends both
on the size of the component and on the number of blocks the variable is live
in.  In the limit of large component size, the sparse representation will be
more compact, but it isn't clear where this crossover occurs.
Of course, it would be possible to use both representations, choosing the
more compact one on a per-variable basis.  Another interesting special case
is when the variable is live in only one block: this may be common enough to
be worth picking off, although it is probably rarer for named variables than
for TNs in general.

If we dump the type, then a normal list-style type descriptor is fine: the
space overhead is small, since the shareability is high.

We could probably save some space by cleverly representing the var-info as
parallel vectors of different types, but this would be more painful to use.
It seems better to just use a structure, encoding the unboxed fields in a
fixnum.  This way, we can pass around the structure in the debugger, perhaps
even exporting it from the low-level debugger interface.

[### We need the save location too.  This probably means that we need two
slots of bits, since we need the save offset and save SC.  Actually, we could
let the save SC be implied by the normal SC, since at least currently we
always choose the same save SC for a given SC.  But even so, we probably
can't fit all that stuff in one fixnum without squeezing a lot, so we might
as well split and record both SCs.

In a localized packing scheme, we would have to dump a different var-info
whenever either the main location or the save location changes.  As a
practical matter, the save location is less likely to change than the main
location, and should never change without the main location changing.

One can conceive of localized packing schemes that do saving as a special
case of localized packing.  If we did this, then the concept of a save
location might be eliminated, but this would require major changes in the IR2
representation for call and/or lifetime info.  Probably we will want saving
to continue to be somewhat magical.]

How about:

(defstruct var-info
  ;;
  ;; This variable's name.  (Symbol-name of the symbol.)
  (name nil :type simple-string)
  ;;
  ;; The SC, ID and offset, encoded as bit-fields.
  (bits nil :type fixnum)
  ;;
  ;; The set of blocks this variable is live in.  If a bit-vector, then it
  ;; has a 1 when indexed by the number of a block that it is live in.  If an
  ;; I-vector, then it lists the live block numbers.  If a fixnum, then that
  ;; is the number of the sole live block.
  (lifetime nil :type (or vector fixnum))
  ;;
  ;; The variable's type, represented as a list-style type descriptor.
  type)

Then the debug-info holds a simple-vector of all the var-info structures for
that component.  We might as well make it sorted alphabetically by name, so
that we can binary-search to find the variable corresponding to a particular
name.

We need to be able to translate PCs to block numbers.  This can be done by an
I-Vector in the component that contains the start location of each block.
The block number is the index at which we find the correct PC range.  This
requires that we use an emit-order block numbering distinct from the
IR2-Block-Number, but that isn't any big deal.  This seems space-expensive,
but it isn't too bad, since it would only be a fraction of the code size if
the average block length is a few words or more.

An advantage of our per-block lifetime representation is that it directly
supports keeping a variable in different locations when in different blocks,
i.e. multi-location packing.  We use a different var-info for each different
packing, since the SC and offset are potentially different.  The Name and ID
are the same, representing the fact that it is the same variable.
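
As a concreteness check, here is a minimal sketch of the bit-field encoding
suggested for the Bits slot.  The field widths (5 bits of SC, 7 bits of ID,
the rest offset) are made-up placeholders; the real widths would be chosen
once the SC numbering and maximum frame size are pinned down.

(defconstant var-info-sc-byte (byte 5 0))
(defconstant var-info-id-byte (byte 7 5))
(defconstant var-info-offset-byte (byte 14 12))

(defun encode-var-info-bits (sc id offset)
  (dpb offset var-info-offset-byte
       (dpb id var-info-id-byte
            (dpb sc var-info-sc-byte 0))))

(defun var-info-sc (info)
  (ldb var-info-sc-byte (var-info-bits info)))
(defun var-info-id (info)
  (ldb var-info-id-byte (var-info-bits info)))
(defun var-info-offset (info)
  (ldb var-info-offset-byte (var-info-bits info)))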
It is here (multi-location packing) that the ID is most significant, since
the debugger could otherwise make same-name variables unique all by itself.

Stack parsing:

There are currently three relevant context pointers:
 -- The PC.  The current PC is wired (implicit in the machine).  A saved PC
    (RETURN-PC) may be anywhere in the current frame.
 -- The current stack context (CONT).  The current CONT is wired.  A saved
    CONT (OLD-CONT) may be anywhere in the current frame.
 -- The current code object (ENV).  The current ENV is wired.  When saved,
    this is extra-difficult to locate, since it is saved by the caller, and
    is thus at an unknown offset in OLD-CONT, rather than anywhere in the
    current frame.

We must have all of these to parse the stack.

With the proposed Debug-Function, we parse the stack (starting at the top)
like this:
 1] Use ENV to locate the current Debug-Info.
 2] Use the Debug-Info and PC to determine the current Debug-Function.
 3] Use the Debug-Function to find the OLD-CONT and RETURN-PC.
 4] Find the old ENV by searching up the stack for a saved code object
    containing the RETURN-PC.
 5] Assign old ENV to ENV, OLD-CONT to CONT, RETURN-PC to PC and go to 1.

If we changed the function representation so that the code and environment
were a single object, then the location of the old ENV would be simplified.
But we still need to represent ENV as separate from PC, since interrupts and
errors can happen when the current PC isn't positioned at a valid return PC.

[### We may need to be able to tell whether a call is local or not, since a
local call doesn't have to save ENV.  I guess we can look at the ENV, and see
if the code object contains the PC: if so, we win; if not (perhaps not a code
object at all), then look farther down the stack.  Note that there wouldn't
be any problem if we had a single-object function representation, since ENV
is implicit in the RETURN-PC.]

How much do we really gain by allowing the context pointers to be in
arbitrary locations?  It seems worthwhile allowing OLD-CONT and RETURN-PC to
be in arbitrary locations in the current function, since we can then save
them in registers if we don't do any calls.  This can significantly speed up
calls to "trivial" functions, which seems worthwhile.  But when we do save
these things on the stack, there is no real advantage in using arbitrary
locations.

It seems like it might be a good idea to save OLD-CONT, RETURN-PC and ENV at
the beginning of the frame (before any stack arguments).  Then we wouldn't
have to search to locate ENV, and we would also have a hope of parsing the
stack even if it is damaged.  As long as we can locate the start of some
frame, we can trace the stack above that frame.  We can recognize a probable
frame start by scanning the stack for a code object (presumably a saved ENV).

It would also be possible to parse the stack from the bottom up, given this
information and also some special consideration in the escape frame format.
This is because the caller is responsible for SP after the call, so the
caller has to know how big its frame is.  If we are guaranteed that all stuff
on the stack is "inside" a frame, we can parse the stack from the bottom up
by starting at the stack bottom and skipping over frames using the frame size
information.  We augment each Debug-Function with either a constant frame
size (for a fixed-size frame) or a saved SP location (for frames that receive
unknown MVs).
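
A minimal sketch of that bottom-up skip, ignoring for now the problem (taken
up below) of finding each frame's Debug-Function without a return PC.  All of
the accessor names here are hypothetical, and frames are treated as stack
indices.

(defun map-frames-bottom-up (function stack-bottom stack-top)
  (do ((frame stack-bottom))
      ((>= frame stack-top))
    (funcall function frame)
    (let ((size (debug-function-frame-size (frame-debug-function frame))))
      (setq frame
            (if (integerp size)
                (+ frame size)                   ; constant-size frame
                (frame-saved-sp frame size)))))) ; frame receiving unknown MVs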
[Note that in a given component, all constant-size frames are the same size,
so were it not for variable-size frames, this information could be stored in
the Debug-Info structure.]

[### But not really, since we can't tell what function we are running in
without the return PC, and thus we can't tell the frame size.  I guess one
possibility would be to make the fixed/variable frame decision on a
per-component rather than a per-function basis.  Then in a variable-frame
component, we would always store the frame end at a fixed location at the
frame beginning.  This is a little unpleasant, though, since any use of
unknown MVs in a component would have an efficiency penalty for all calls in
that component.  This would hurt large block compilations.

Alternately, we could guarantee things about the format of the variable-size
stuff so that we could recognize and skip it.  For example, if what would be
the OLD-CONT in a real frame is guaranteed in the variable part to never be
the current frame, then we can verify that we have found the beginning of the
next frame by checking that the frame's OLD-CONT is the current frame.  If
assuming a fixed-size frame doesn't check out, then we must be in a
variable-sized frame, so we access the saved frame end to find the next
frame.

A related idea would be to make the variable part of the frame look like a
special "values" frame.  A values frame would directly incorporate the values
count, allowing the values glob to be skipped.  We could indicate a values
frame by putting some distinctive non-code-object thing in the ENV save
location.

There are probably all kinds of nasty problems with parsing the stack in the
presence of interrupts, since we could be stopped while a function call is in
progress.  [Maybe even an NLX, although we would probably want to make that
uninterruptable.]  Hopefully the debugger can handle these things by some
case analysis.

We can definitely be interrupted during UWP cleanup code, so the stack must
be left in some sort of sensible state when processing an unwind-protect.
This seems like an argument in favor of squeezing out frames as we unwind,
leaving only the state needed to continue the unwind on the stack.  Except
that it is pretty hard at run time to determine the end of the "current
frame" so as to leave the return values on top of it.  So maybe unwind should
use the unwinder's frame to keep stuff in until the values receiver gets
around to grabbing the stuff.  But then there will be all these o.k.-looking
frames on the stack that have really been unwound: the only cue that they
aren't real is that the current CONT points to the UWP frame, and the current
PC is in the function associated with that frame.  This means that parsing
the stack from the bottom must use CONT to determine when it has hit the top
of the stack.

Note that we currently have a bad problem: when the compiler can prove that a
function never returns normally, then it doesn't save the OLD-CONT and
RETURN-PC.  If something bad happened in such a function, then we wouldn't be
able to parse the stack.  This can happen fairly easily in system code such
as the top-level R-E-P loop.  There isn't any efficiency reason for not
saving the context, since such calls are dynamically rare (and the function
must eventually do a relatively expensive NLX).  The problem is that the
compiler isn't currently very good at retaining "useless" information.
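
For concreteness, here is a minimal sketch of the top-down walk given by the
numbered parse sequence above.  The accessors (code-debug-info,
debug-info-function-at-pc, the frame slot readers, and so on) are
hypothetical stand-ins for the eventual debugger internals.

(defun map-frames-top-down (function pc cont env)
  (loop
    (let* ((info (code-debug-info env))                ; 1] current Debug-Info
           (dfun (debug-info-function-at-pc info pc))  ; 2] current Debug-Function
           (old-cont (frame-old-cont cont dfun))       ; 3] OLD-CONT and
           (return-pc (frame-return-pc cont dfun)))    ;    RETURN-PC
      (funcall function dfun cont)
      (when (stack-bottom-p old-cont)
        (return))
      ;; 4] Search up the stack for a saved code object containing the
      ;;    RETURN-PC; that is the old ENV.
      (setq env (find-saved-code old-cont return-pc)
            ;; 5] Step to the caller's frame and go around again.
            cont old-cont
            pc return-pc))))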
Probably we want some fairly general mechanism for specifying that a TN
should be considered to be live for the duration of a specified environment.
It would be somewhat easier to specify that the TN is live for all time, but
this would become very space-inefficient in large block compilations.

This mechanism could be quite useful for other debugger-related things.  For
example, when debuggability is important, we could make the TNs holding
arguments live for the entire environment.  This would guarantee that a
backtrace would always get the right value (modulo setqs).

Note that in this context, "environment" means the Environment structure (one
per non-let function).  At least according to current plans, even when we do
inter-routine register allocation, the different functions will have
different environments: we just "equate" the environments.  So the number of
live per-environment TNs is bounded by the size of a "function", and doesn't
blow up in block compilation.

The implementation is simple: per-environment TNs are flagged by the
:Environment kind.  :Environment TNs are treated the same as :Normal TNs by
everyone except for lifetime/conflict analysis.  An environment's TNs are
also stashed in a list in the IR2-Environment structure.  During the conflict
analysis post-pass, we look at each block's environment, and make all the
environment's TNs always-live in that block.

We can implement the "fixed save location" concept needed for lazy frame
creation by allocating the save TNs as wired TNs at IR2 conversion time.  We
would use the new "environment lifetime" concept to specify the lifetimes of
the save locations.  There isn't any run-time overhead if we never get around
to using the save TNs.  [Pack would also have to notice TNs with
pre-allocated save TNs, packing the original TN in the stack location if its
FSC is the stack.]

We want a standard (recognizable) format for an "escape" frame.  We must make
an escape frame whenever we start running another function without the
current function getting a chance to save its registers.  This may be due
either to a truly asynchronous event such as a software interrupt, or to an
"escape" from a miscop.  An escape frame marks a brief conversion to a
callee-saves convention.

Whenever a miscop saves registers, it should make an escape frame.  This
ensures that the "current" register contents can always be located by the
debugger.  In this case, it may be desirable to be able to indicate that only
partial saving has been done.  For example, we don't want to have to save all
the FP registers just so that we can use a couple of extra general registers.

When the debugger sees an escape frame, it knows that register values are
located in the escape frame's "register save" area, rather than in the normal
save locations.

We can mark an escape frame by having the ENV save location be some
distinctive value (as proposed for values frames).  The problem with this
marking mechanism is that ENV is not in general initialized until someone
does a call out of the frame, which means that arbitrary garbage may be in
this slot in frames that are escaped from.  This means that in a bottom-up
parse, we can't tell whether a frame is truly a special frame or just an open
frame, unless we know whether the next frame is an escape frame.  We also
can't locate the next frame (to see if it is an escape frame), since ENV may
not have been saved yet, and we need ENV to compute the frame size.
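
Returning to the :Environment TN mechanism above, a minimal sketch of the
conflict-analysis post-pass step.  Do-Blocks, Block-Environment,
Environment-Info, IR2-Environment-Live-TNs and Make-TN-Always-Live are all
hypothetical names for the relevant iterator and accessors.

(defun note-environment-tns (component)
  (do-blocks (block component)
    (let ((2env (environment-info (block-environment block))))
      ;; Every TN stashed in the block's IR2-Environment is considered
      ;; always-live in that block, regardless of references.
      (dolist (tn (ir2-environment-live-tns 2env))
        (make-tn-always-live tn block)))))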
The solution to this escape-frame marking problem seems to be to require the
escape process to save bottom-up linkage information in the open frame.  In
particular, if escape saves ENV in the standard ENV save location, we can
skip over fixed-size frames.  This also eliminates the problem of open frames
possibly looking like escape or values frames, since we use the ENV location
to flag special frames.

If we allow arbitrary variable garbage at the end of the frame, we still have
a problem, since even if the SP location can be determined from the debug
info, it may not have been properly initialized.  This seems to be an
argument for requiring the variable stuff to be self-describing, i.e. the
values-frame idea.  A related possibility would be to require the MV-returner
to store the end of the values glob into the returnee's frame at a standard
offset (using OLD-CONT as a base).  This would be a standardized SP save
location that is guaranteed to be initialized.

When the Lisp-level escape routine is called, it is passed the escape frame
as OLD-CONT, and a special return routine as RETURN-PC.  Different return
routines are used, depending on the nature of the escape.  For example, in an
interrupt, a return must restore all registers, whereas in a miscop bugout we
don't want to damage the argument registers, since they may have a return
value in them.  It would also be possible for a miscop to bug out passing a
return PC that is inside the miscop, so a bugout can happen in the middle of
a miscop, rather than being required to replace the miscop.

Have a feature for allowing templates to take their operands in the standard
argument passing locations.  This would be used primarily for miscop linkage.
It would easily allow arbitrary (and variable) argument miscops.  Fixed-arg
miscops could use this new mechanism, or could continue to work as now.

We should definitely bring the system up without the link-table at first,
since we can just rip out and ignore the link-table hair.  The efficiency of
symbol-function + funcall should be good enough that the possible win of a
link-table would be <10% of total system performance.  Other optimizations
should come first, and these could also help function call.  The main example
would be greater load-time smarts in referencing the global environment.
Global constants (including functions) can be referenced at load time and
directly incorporated into the constant pool.  This could potentially be
generalized to a Scheme-like "global environment", which is a somewhat
link-table-like idea.

An easy optimization for function call is to guarantee that the symbol
definition cell always contains a callable function.  We use a special
(recognizable) "undefined function" when the symbol is undefined.  In a call
context, we can reference the cell and call the result without doing any
boundp or type check.  This would require that we keep macro and special-form
definitions in a hashtable somewhere, but that is no big deal.  We might
store a different "illegal function" function in the definition cell of such
things.  This would mean that in the case of an undefined function error, the
name wouldn't be readily available.  This should be livable in the presence
of good source-map information.  [Loop-invariant and common-subexpression
would also help repeated calls to the same function.]

The area of function-object representation needs much thought when we do our
redesign.
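
A minimal sketch of that always-callable definition cell convention.
%Undefined-Function and %Set-Definition are hypothetical names for the
trampoline and the raw cell writer; the point is just that a call site never
needs a boundp or type check.

(defun %undefined-function (&rest args)
  (declare (ignore args))
  (error "Undefined function called."))

(defun ensure-callable-definition (name)
  ;; If NAME has no real definition, install the trampoline so that a call
  ;; site can fetch the definition cell and jump through it unconditionally.
  (unless (fboundp name)
    (%set-definition name #'%undefined-function)))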
We want to be able to represent a function by a single "code pointer" to an
object that combines the constant pool and the code:

    [code object layout sketch]

fasl-file
    Returns a "fasl-file" object representing all state needed by the dumper.
    We objectify the state, since the fasdumper should be reentrant (but
    could fail to be at first).

close-fasl-file fasl-file abort-p
    Close the specified fasl-file.

fasl-dump-component component code-vector length fixups fasl-file
    Dump the code, constants, etc. for component.  Code-Vector is a vector
    holding the assembled code.  Length is the number of elements of Vector
    that are actually in use.  Fixups is a list of conses (offset . fixup)
    describing the locations and things that need to be fixed up at load
    time.  If the component is a top-level component, then the top-level
    lambda will be called after the component is loaded.

load-component component code-vector length fixups
    Like Fasl-Dump-Component, but directly installs the code in core, running
    any top-level code immediately.  (???)  But we need some way to glue
    together the components, since we don't have a fasl table.

More args need some thought.  Probably we don't need more-arg cleanups, since
%more-args will be implemented by moving the more args into TNs wired in our
frame, thus there is no stack garbage.  A more XEP will still require some
song-and-dance, but not using the cleanup mechanism.  Instead, we immediately
explicitly insert cleanup code as an MV-prog1.  Also, when there are more
args, we must spit out some funny function before the call of the more EP.
This will turn into code that saves CONT somewhere (in a wired register) so
that it can be used as the more-arg context, then moves CONT above the args
and sets SP accordingly.  [But this is a special case of function-entry magic
that we have to do at any XEP.  Also, since the header block in an
optional-dispatch XEP is never actually executed, this code must be
replicated before each EP call.  Probably we want to spit out some sort of
%function-entry marker before each EP call, rather than just one in the
header block.  And we want to suppress all IR2 conversion of the XEP's bind
node (other than maybe to emit some arg-count dispatching OP.  Even in
fixed-arg functions, we still need to set up the entry vector).]  Also, if we
require stack values to be left immediately on top of the caller's frame,
then the more-arg-entry cleanup code will be required to BLT the return
values down over the more args.  But this should be an automatic consequence
of the implicit MV-Prog1/Return.  But the actual EP may not be called for
unknown values, but, but...  Glag...  Also, we need to somehow bind a
variable to the original CONT within the XEP so that the cleanup code can
restore it.  This could be a LET.

Figure out when it is actually an optimization to move register saving to
writers.  The current heuristic of doing it whenever there is a single writer
loses in some cases (such as TAK) where the writes may be executed without
the call happening.  Probably the cleverest thing to do is to be more
conservative about the motion, trying to move to some intermediate place that
is provably good.  This could be done using dominator info.  If there is some
save that dominates other saves and is dominated by the only write, then
flush the unnecessary saves.  Note that a restore can also be flushed when
there is no reference to the register before a following restore.  Possibly
this could be integrated with the local packing algorithm.
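
Stepping back to the fasdump interface sketched above, here is a hypothetical
use showing the shape of the dump loop.  Open-Fasl-File, Assemble-Component
and Component-Fixups are assumed names for the fasl-file constructor and the
back-end hooks; only Fasl-Dump-Component and Close-Fasl-File come from the
interface above.

(defun dump-components (components path)
  (let ((file (open-fasl-file path)))
    (unwind-protect
        (dolist (component components)
          (multiple-value-bind (code-vector length)
              (assemble-component component)
            ;; Fixups is a list of (offset . fixup) conses, as described in
            ;; Fasl-Dump-Component above.
            (fasl-dump-component component code-vector length
                                 (component-fixups component)
                                 file)))
      (close-fasl-file file nil))))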
Continuing the register-saving thought: a TN could be packed on the stack
within an inner extent, and still be in a register outside.  We want to
identify single-entry extents with finer granularity than loops.  Traces
perhaps?  Anyway, this is all way down the road.

Whizzy error system interface for errors in compiled code.  Call a miscop as
now, but the miscop saves registers and bugs out to the out-of-line version
of the function.  If the function returns, then the miscop returns with the
function's value.  We target the operands to the miscop passing locations,
but postpone doing the moves into the error code.  We don't try to share
error code, since each distinct error can have different operands and needs
to jump back.  But type errors can't use this mechanism, since type checks
for open-coded functions are factored out.  The safe (miscop) versions could
choose to signal errors this way, but this is out of our hands.

Dumping:

Dump code for each component after compiling that component, but defer
dumping of other stuff.  We do the fixups on the code vectors, and accumulate
them in the table.

We have to grovel the constants for each component after compiling that
component so that we can fix up load-time constants.  Load-time constants are
values needed by the code that are computed after code generation/assembly
time.  Since the code is fixed at this point, load-time constants are always
represented as non-immediate constants in the constant pool.  A load-time
constant is distinguished by being a cons (Kind . What), instead of a
Constant leaf.  Kind is a keyword indicating how the constant is computed,
and What is some context.

Some interesting load-time constants:

(:label .