Program Structure
=================

Goals:
1. fast
2. make the index quite small -- at most 70% of the size of the text
3. allow the text to be stored externally, compressed or not
4. cope with non-ascii (e.g. word processor) data files
5. allow files to be added to the index at any time, without requiring
   all of the indexed files to be on-line and without rebuilding the
   entire index.

Methods:

LQ-Text keeps several lists:
* a list of all of the words that have been seen.  There's no point
  listing words like `the' or `and', or very short words like `lq', of
  course.
  A word is a letter followed by letters, digits or (in some cases)
  underscores or colons.
  This definition is designed to allow LQ-Text to index programming
  languages as well as WP files, to help with maintaining source trees.

  Current format: dbm, so that only 2 disk accesses are generally
  needed to retrieve a given word.

  Stores the word number for each word.

* a list of all the words that have been seen, giving for each word a
  list of all the files containing the word and the relevant offsets
  into each file.
  The numbers are stored in a compressed format, to conserve space.
  There is an overflow file, "data".

* a list of all the files that have been seen, together with the date on
  which they were last indexed.

  Current format: dbm, so that only 2 disk accesses are generally
  needed to retrieve a given filename.

  Stores (File, File-Number ["FID"], date)
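
The three lists above can be sketched in a few lines of Python.  This is
only an illustration of the idea, not the LQ-Text code: the stop list,
the minimum word length of 3, the folding to lower case, and every name
used (Index, add_file, words_with_offsets) are assumptions made for the
example.  Underscores and colons are always accepted here, although the
notes say they are allowed only in some cases, and plain dictionaries
stand in for the dbm files.

```python
import re
from collections import defaultdict

# Assumed for illustration only: the notes give no actual stop list or
# minimum word length.
STOP_WORDS = {"the", "and"}
MIN_WORD_LENGTH = 3

# A word is a letter followed by letters, digits, underscores or colons.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9_:]*")


def words_with_offsets(text):
    """Yield (word, offset) for every word worth indexing."""
    for match in WORD_RE.finditer(text):
        word = match.group().lower()  # case folding is an assumption
        if len(word) >= MIN_WORD_LENGTH and word not in STOP_WORDS:
            yield word, match.start()


class Index:
    """In-memory stand-in for the three lists (hypothetical layout)."""

    def __init__(self):
        self.word_numbers = {}             # word -> word number
        self.postings = defaultdict(list)  # word number -> [(FID, offset)]
        self.files = {}                    # filename -> (FID, date indexed)

    def add_file(self, filename, text, date):
        """Index one file; the other indexed files are not needed."""
        if filename in self.files:
            fid = self.files[filename][0]
        else:
            fid = len(self.files) + 1
        self.files[filename] = (fid, date)
        for word, offset in words_with_offsets(text):
            number = self.word_numbers.setdefault(
                word, len(self.word_numbers) + 1)
            self.postings[number].append((fid, offset))
        return fid


index = Index()
index.add_file("notes.txt", "The lq index and the word_list", "1989-01-01")
print(index.word_numbers)
print(dict(index.postings))
```

Note that this sketch does not remove old entries when a file is indexed
a second time; a real index would have to deal with that.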

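The notes say only that the numbers are stored in a compressed format,
not which one.  One common technique, shown here purely as an example
and not as the format LQ-Text actually uses, is to store the gaps
between sorted offsets with a variable number of bytes, so that small
numbers take a single byte.  The function names are hypothetical.

```python
def encode_number(n):
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def encode_offsets(offsets):
    """Encode sorted offsets as gaps from the previous offset."""
    out = bytearray()
    previous = 0
    for offset in offsets:
        out += encode_number(offset - previous)
        previous = offset
    return bytes(out)


def decode_offsets(data):
    """Reverse encode_offsets, returning the original offsets."""
    offsets = []
    value = shift = previous = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            previous += value
            offsets.append(previous)
            value = shift = 0
    return offsets


packed = encode_offsets([7, 21, 300])
print(len(packed), decode_offsets(packed))
```

Here three offsets fit in four bytes, because only the gap of 279
between 21 and 300 needs a second byte.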