-*- Mode: Text, Spell -*-

Ideas on a replacement fasl file format.  I'm not sure if this is going to
be an improvement, but I want to work out all the details to make sure it
will work before trying to decide if it's better or not.

General idea: Instead of using a byte language, just read and fix up a
large chunk.

In other words, the format is one or more components, each consisting of:

	header
	data:
	    scavenged
	    unscavenged
	    code-component itself
	symbol-table
	fixup-map
	permanent-constant refs.

The data is just read in, and scavenged for pointers.  Pointers will have
the correct lowtag, but the upper bits will be the offset into the data
section.  Basically, we just search through the scavenged section and the
code-component's constants section for anything with the low bit set.
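
The scan described above can be sketched roughly as follows.  This is only
an illustration in Python, not the real loader: the 3-bit lowtag width, the
modelling of the section as a list of integer words, and the names
LOWTAG_BITS and relocate are all assumptions made for the example.

```python
LOWTAG_BITS = 3                      # assumed lowtag width
LOWTAG_MASK = (1 << LOWTAG_BITS) - 1


def relocate(words, base):
    """Return a copy of WORDS with every pointer rebased onto BASE.

    A word with the low bit set is treated as a pointer whose upper
    bits hold an offset into the data section; everything else is
    left alone.
    """
    assert base & LOWTAG_MASK == 0, "base must leave the lowtag bits clear"
    out = []
    for w in words:
        if w & 1:
            offset = w & ~LOWTAG_MASK
            lowtag = w & LOWTAG_MASK
            out.append((base + offset) | lowtag)
        else:
            out.append(w)
    return out


# One non-pointer word followed by two pointers (offsets 0x10 and 0x0).
section = [0x28, 0x10 | 0x3, 0x0 | 0x7]
loaded = relocate(section, 0x1000)
```

Because the lowtag is already correct in the file, the loader only has to
add the load address to the upper bits.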

The fixup-map is a list of <offset,kind,datum> tuples.  Offset is the
offset in the data section, kind is the kind of fixup, and datum depends on
the kind of fixup.

The symbol-table is a list of symbols to intern, broken down by package.
References to symbols are represented by a fixup with the datum being the
index into the symbol-table.
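
A possible shape for symbol fixups, sketched in Python.  Everything here is
hypothetical: the SYMBOL kind name, the flattening of the per-package table
into one indexable list, the use of "PACKAGE::NAME" strings as stand-ins
for interned symbols, and offsets that count words rather than bytes.

```python
SYMBOL = "symbol"   # hypothetical fixup kind


def intern_all(symbol_table):
    """Flatten [(package, [names...]), ...] into one indexable list."""
    interned = []
    for package, names in symbol_table:
        for name in names:
            interned.append(f"{package}::{name}")   # stand-in for INTERN
    return interned


def apply_fixups(data, fixups, symbols):
    """Patch DATA in place from a list of (offset, kind, datum) tuples."""
    for offset, kind, datum in fixups:
        if kind == SYMBOL:
            data[offset] = symbols[datum]
        else:
            raise ValueError(f"unknown fixup kind: {kind!r}")


symbol_table = [("LISP", ["CAR", "CDR"]), ("USER", ["FOO"])]
data = [0, 0, 0]
fixups = [(0, SYMBOL, 2), (2, SYMBOL, 0)]
apply_fixups(data, fixups, intern_all(symbol_table))
```

Other fixup kinds would be further branches on kind, each interpreting the
datum in its own way.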

The permanent-constants are just a set of offsets into the data section of
constants that might be referenced later in the fasl file.  After the
component is loaded, the table of permanent constants is remembered until
the load finishes.  References to previous permanent constants are also
just another kind of fixup.  The datum contains the component number and
the index into that component's table of permanent-constants.  The component
number is relative to the current component, so that fasl files can be
trivially concatenated.
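
To illustrate why relative component numbers make concatenation trivial,
here is a small Python sketch.  It assumes the relative number counts
backwards from the current component (1 means the immediately preceding
one); that convention and the name resolve_permanent are made up for the
example.

```python
def resolve_permanent(tables, current, datum):
    """Look up a permanent constant from an earlier component.

    TABLES holds one permanent-constant table per component loaded so
    far, CURRENT is the index of the component being loaded, and DATUM
    is (relative component number, index into that table).
    """
    back, index = datum
    target = current - back
    if back < 1 or target < 0:
        raise ValueError("reference does not name an earlier component")
    return tables[target][index]


file_a = [["a0", "a1"], ["b0"]]
file_b = [["x0"], ["y0", "y1"]]

# Component 1 of file B refers to entry 0 of the component just before it.
alone = resolve_permanent(file_b, 1, (1, 0))

# After concatenation the same component is number 3, but the relative
# reference still lands on the same table.
joined = resolve_permanent(file_a + file_b, 3, (1, 0))
```

With absolute component numbers, every reference in the second file would
have to be rewritten when the files were joined.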

The header contains the sizes of the above sections, plus any entry point.
The entry point is just an offset into the data section of either a list
or a function.  If it's a function, it is funcalled.  If it is a list, it
is evaled.
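
One conceivable header layout, sketched with Python's struct module.  The
field order, the 32-bit little-endian encoding, the split of the data size
into three fields, and the NO_ENTRY sentinel are all assumptions; nothing
above fixes them.

```python
import struct

# Seven unsigned 32-bit little-endian fields (assumed encoding).
HEADER = struct.Struct("<7I")
NO_ENTRY = 0xFFFFFFFF   # hypothetical "no entry point" marker

FIELDS = ("scavenged", "unscavenged", "code",
          "symbols", "fixups", "permanent", "entry")


def pack_header(**values):
    """Encode the section sizes and entry-point offset as bytes."""
    return HEADER.pack(*(values[name] for name in FIELDS))


def unpack_header(raw):
    """Decode a header back into a dict keyed by field name."""
    return dict(zip(FIELDS, HEADER.unpack(raw)))


example = dict(scavenged=64, unscavenged=16, code=128,
               symbols=24, fixups=36, permanent=8, entry=NO_ENTRY)
raw = pack_header(**example)
```

A fixed-size header like this lets the loader read every section with a
single sized read each, without scanning for delimiters.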


make-load-form and load-time-value:

These are just handled by compiling the construction code into its own
component, and placing an annotation in the header that the results of the
entry point should be considered a permanent constant.


Debug info:

Debug info can either be just dumped as other constants, or stuck in the
header.  Given that fasl-files are always files, we could even back-patch
the headers after everything is done.  Say, have an overall header at the
beginning that contains the file-wide debug info, and then each component
contains the additional debug info needed for that component.  The
file-wide stuff would have to be constant length, so we could reserve
enough space for it.


Dumping details:

Recurse down through all the constants determining how much descriptor and
how much non-descriptor space is needed for each object.  Make an entry in
a hash table mapping from the object to the current offset, update the
current offset with the size of the object, and then recurse.

Sort the hash table entries by offset in the scavenged space, and then
iterate across all of them, filling in the scavenged space.  Then do the
same for the non-descriptor space.  While this is happening, accumulate any
fixups, which are dumped later.
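
The two passes might look roughly like this in Python.  The model is
deliberately crude and entirely hypothetical: lists stand in for
descriptor objects, bytes for non-descriptor objects, sizes are element
counts, and a reference is emitted as a (space, offset) pair rather than a
tagged word.  Fixup accumulation is left out.

```python
def assign_offsets(root):
    """Pass 1: map each object to (space, offset, object)."""
    table = {}
    next_free = {"scavenged": 0, "unscavenged": 0}

    def walk(obj):
        if id(obj) in table:
            return                       # already placed (handles cycles)
        space = "scavenged" if isinstance(obj, list) else "unscavenged"
        # Enter the object before recursing, as described above.
        table[id(obj)] = (space, next_free[space], obj)
        next_free[space] += len(obj)
        if isinstance(obj, list):
            for child in obj:
                if isinstance(child, (list, bytes)):
                    walk(child)

    walk(root)
    return table


def fill_scavenged(table):
    """Pass 2: emit the scavenged space in offset order."""
    entries = sorted((e for e in table.values() if e[0] == "scavenged"),
                     key=lambda e: e[1])
    out = []
    for _, offset, obj in entries:
        assert offset == len(out)
        for child in obj:
            if isinstance(child, (list, bytes)):
                space, child_offset, _ = table[id(child)]
                out.append((space, child_offset))
            else:
                out.append(child)        # immediate value
    return out


# A circular structure: a refers to b, and b refers back to a.
a = [1, None]
b = [a, b"xy"]
a[1] = b
image = fill_scavenged(assign_offsets(a))
```

Because every object has an offset before anything is written, the
back-reference from b to a needs no special handling.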


If we don't want to do the two-pass dumping, then circular references have
to be handled by lseeking back into the middle of the file.  Given the
rarity of circular references, the single-pass approach might be the better
idea.


Cold loading:

Instead of using genesis to build the initial core image, write the loader
in C.  Then just load all the necessary fasls into the kernel core.  I
guess the package system would also have to be written in C, otherwise it
would be hard to intern anything before the package system existed.

Or stick with genesis.  The header is annotated with a maybe-cold-load vs
normal-load bit.  Maybe-cold-load components have to be careful to never
refer to permanent constants.


Mumblage:

The main coding difference is where knowledge about the memory object
format is represented.  Specifically, the information is moved from Genesis
to the dumper.

As for efficiency, it seems that dumping shouldn't change that much, but
loading has the potential for being much faster.  Also, it might be denser
[depends on the relative overheads of the fops vs the padding to dual-word
alignments.]

Also, the potential for ld-style linking is much greater.  It would be
possible to annotate each component with the function names it defines, and
to find the function names it references by looking at the fixups list.
From this, and permanent constant definitions/references, you could build a
dependency dag for a set of components.  From this dag, you could copy the
necessary components and ignore the others.
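
A rough Python sketch of that selection step.  The per-component
"defines" and "references" sets are hypothetical annotations, and edges
from permanent-constant references are omitted for brevity; they would be
followed in the same way.

```python
def needed_components(components, roots):
    """Return the set of component ids reachable from ROOTS.

    COMPONENTS maps a component id to a dict holding the set of
    function names it defines and the set of names it references.
    """
    definer = {name: cid
               for cid, comp in components.items()
               for name in comp["defines"]}
    needed = set()
    stack = list(roots)
    while stack:
        cid = stack.pop()
        if cid in needed:
            continue
        needed.add(cid)
        for name in components[cid]["references"]:
            dep = definer.get(name)
            if dep is not None:          # undefined names live elsewhere
                stack.append(dep)
    return needed


components = {
    0: {"defines": {"f"}, "references": {"g"}},
    1: {"defines": {"g"}, "references": set()},
    2: {"defines": {"h"}, "references": {"f"}},
}
keep = needed_components(components, [0])
```

Here component 2 is dropped because nothing reachable from component 0
refers to h.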