-*- Mode: Text, Spell -*-

Ideas on a replacement fasl file format.

I'm not sure if this is going to be an improvement, but I want to work
out all the details to make sure it will work before trying to decide if
it's better or not.

General idea:

Instead of using a byte language, just read and fix up a large chunk. In
other words, the format is one or more components, each consisting of:

  header
  data:
    scavenged
    unscavenged
    code-component itself
  symbol-table
  fixup-map
  permanent-constant refs

The data is just read in, and scavenged for pointers. Pointers will have
the correct lowtag, but the upper bits will be the offset into the data
section. Basically, we just search through the scavenged section and the
code-component's constants section for anything with the low bit set.

The fixup-map is a list of (offset, kind, datum) tuples. Offset is the
offset in the data section, kind is the kind of fixup, and datum depends
on the kind of fixup.

The symbol-table is a list of symbols to intern, broken down by package.
References to symbols are represented by a fixup with the datum being the
index into the symbol-table.

The permanent-constants are just a set of offsets into the data section
of constants that might be referenced later in the fasl file. After the
component is loaded, the table of permanent constants is remembered until
the load finishes. References to previous permanent constants are also
just another kind of fixup. The datum contains the component number and
the index into that component's table of permanent-constants. The
component number is relative to the current component, so that fasl files
can be trivially concatenated.

The header contains the sizes of the above sections, plus any entry
point. The entry point is just an offset into the data section of either
a list or a function. If it's a function, it is funcalled. If it is a
list, it is evaled.
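
To make the pointer scavenging step concrete, here is a rough sketch in
Python (not in the loader's eventual language). It assumes 32-bit words, a
3-bit lowtag, and a dual-word-aligned load address; none of those numbers
come from the notes above, and the name relocate is made up for the
example. It only walks a scavenged prefix of the data section and leaves
the rest alone.

```python
LOWTAG_BITS = 3                       # assumed lowtag width
LOWTAG_MASK = (1 << LOWTAG_BITS) - 1
WORD_MASK = 0xFFFFFFFF                # assumed 32-bit words


def relocate(words, base, scavenged_words):
    """Return a copy of `words` with section-relative pointers made absolute.

    Only the first `scavenged_words` words are examined; anything after
    that is treated as unscavenged data and copied through untouched.
    """
    assert base & LOWTAG_MASK == 0, "load address must be dual-word aligned"
    out = list(words)
    for i in range(scavenged_words):
        w = out[i]
        if w & 1:                     # low bit set: treat the word as a pointer
            offset = w & ~LOWTAG_MASK  # upper bits: offset into the data section
            lowtag = w & LOWTAG_MASK   # lowtag is already correct, keep it
            out[i] = ((base + offset) | lowtag) & WORD_MASK
    return out
```

For example, relocate([0x13, 0x28, 0x09, 0x11], 0x1000, 3) rewrites the
first and third words and leaves the fixnum-like 0x28 and the unscavenged
0x11 as they were. Fixups (symbols, permanent constants) would be applied
as a separate step driven by the fixup-map.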

make-load-form and load-time-value:

These are just handled by compiling the construction code into its own
component, and placing an annotation in the header that the result of the
entry point should be considered a permanent constant.

Debug info:

Debug info can either be just dumped as other constants, or stuck in the
header. Given that fasl files are always files, we could even back-patch
the headers after everything is done. Say, have an overall header at the
beginning that contains the file-wide debug info, and then each component
contains the additional debug info needed for that component. The
file-wide stuff would have to be constant length, so we could reserve
enough space for it.

Dumping details:

Recurse down through all the constants determining how much descriptor
and how much non-descriptor space is needed for each object. Make an
entry in a hash table mapping from the object to the current offset,
update the current offset with the size of the object, and then recurse.

Sort the hash table entries by offset in the scavenged space, and then
iterate across all of them, filling in the scavenged space. Then do the
same for the non-descriptor space. While this is happening, accumulate
any fixups, which are dumped later.

If we don't want to do the two-pass dumping, then circular references
have to be handled by lseeking back into the middle of the file. Given
the rarity of circular references, this might be a better idea.

Cold loading:

Instead of using genesis to build the initial core image, write the
loader in C. Then just load all the necessary fasls into the kernel core.
I guess the package system would also have to be written in C, otherwise
it would be hard to intern anything before the package system existed.

Or stick with genesis. The header is annotated with a maybe-cold-load vs
normal-load bit. Maybe-cold-load components have to be careful to never
refer to permanent constants.
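
The two-pass scheme under "Dumping details" might look roughly like the
following Python sketch. It is a toy: Python lists stand in for boxed
vectors, Python ints for fixnums, and only the descriptor (scavenged)
space is laid out. The word size, the pointer lowtag of 7, the fixnum
shift, and the stand-in header word are all assumptions for illustration,
and fixup accumulation is left out.

```python
WORD = 4             # assumed bytes per word
POINTER_LOWTAG = 7   # assumed lowtag for a pointer to a boxed object


def assign_offsets(root):
    """Pass 1: map each vector (by identity) to its byte offset."""
    offsets = {}     # id(obj) -> (offset, obj)
    next_free = 0

    def walk(obj):
        nonlocal next_free
        if not isinstance(obj, list) or id(obj) in offsets:
            return   # immediate, or already placed (shared or circular)
        offsets[id(obj)] = (next_free, obj)
        nwords = 1 + len(obj)        # header word plus elements
        nwords += nwords & 1         # pad to dual-word alignment
        next_free += nwords * WORD
        for child in obj:
            walk(child)

    walk(root)
    return offsets, next_free


def fill(offsets, size):
    """Pass 2: emit words in offset order; pointers are section-relative."""
    words = [0] * (size // WORD)
    for offset, obj in sorted(offsets.values(), key=lambda e: e[0]):
        i = offset // WORD
        words[i] = len(obj) << 2     # stand-in header, low bit kept clear
        for j, child in enumerate(obj, start=1):
            if isinstance(child, list):
                words[i + j] = offsets[id(child)][0] | POINTER_LOWTAG
            else:
                words[i + j] = child << 2   # fixnum: low two bits clear
    return words
```

Because every object gets its offset before any pointer to it is written,
a circular structure such as a vector that (indirectly) contains itself
needs no special handling here, which is the point of doing two passes
instead of seeking back into the file.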

Mumblage:

The main coding difference is where knowledge about the memory object
format is represented. Specifically, the information is moved from
genesis to the dumper.

As for efficiency, it seems that dumping shouldn't change that much, but
loading has the potential for being much faster. Also, it might be denser
(depends on the relative overheads of the fops vs the padding to
dual-word alignment).

Also, the potential for ld-style linking is much greater. It would be
possible to annotate each component with the function names it defines,
and to find the function names it references by looking at the fixups
list. From this, and permanent constant definitions/references, you could
build a dependency DAG for a set of components. From this DAG, you could
copy the necessary components and ignore the others.
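
As a rough illustration of the ld-style selection, here is a Python
sketch that picks the components reachable from a set of wanted function
names. The dictionary layout ("defines" and "references" sets per
component) and the name needed_components are invented for the example;
permanent-constant dependencies are not modelled, though they could be
added as extra edges in the same way.

```python
def needed_components(components, roots):
    """Return the names of components reachable from `roots`.

    components: dict mapping component name ->
                {"defines": set of function names,
                 "references": set of function names}
    roots:      iterable of function names that must be available.
    The result keeps the original (file) order of the components.
    """
    # Map each function name to the first component that defines it.
    definer = {}
    for name, comp in components.items():
        for fn in comp["defines"]:
            definer.setdefault(fn, name)

    keep = set()
    pending = [definer[fn] for fn in roots if fn in definer]
    while pending:
        name = pending.pop()
        if name in keep:
            continue
        keep.add(name)
        # Follow references to whichever components define them.
        for fn in components[name]["references"]:
            if fn in definer:
                pending.append(definer[fn])

    return [name for name in components if name in keep]
```

References to names that no component defines are simply skipped here; a
real linker would presumably want to report them instead.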