InvFP Indexer

This application builds an InvFP index for a collection of documents whose terms have associated properties (for example, part-of-speech or named-entity tags).

To use it, follow the general steps for running a Lemur application.

The parameters are:

index: name of the index to create (do not include the file extension).
memory: size (in bytes) of the InvFPPushIndex cache (default = 96000000).
stopwords: name of file containing the stopword list.
acronyms: name of file containing the acronym list.
countStopWords: if true, stopwords are counted in the document length.
docFormat:
- "brill" for documents with Brill's part-of-speech tags. This is the default. Documents still need DOC separators between them, as with Lemur's WebParser.
- "identifinder" for documents with Identifinder's named-entity tags. Documents still need DOC separators between them, as with Lemur's WebParser.
stemmer:
- "porter" Porter stemmer.
- "krovetz" Krovetz stemmer; requires an additional parameter:
  1. KstemmerDir: Path to directory of data files used by Krovetz's stemmer.
- "arabic" Arabic stemmer; requires additional parameters:
  1. arabicStemDir: Path to directory of data files used by the Arabic stemmers.
  2. arabicStemFunc: Which stemming algorithm to apply, one of:
    - arabic_stop : arabic_stop
    - arabic_norm2 : table normalization
    - arabic_norm2_stop : table normalization with stopping
    - arabic_light10 : light9 plus ll prefix
    - arabic_light10_stop : light10 and remove stop words
dataFiles: name of file containing list of datafiles to index.
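The dependencies between the parameters above (a stemmer choice that pulls in extra parameters, fixed value sets for docFormat and arabicStemFunc, defaults for memory and docFormat) can be checked before starting a long indexing run. The C++ sketch below is not part of Lemur. It assumes a parameter file holds one `key = value;` entry per line, treats index and dataFiles as required and a missing stemmer as "no stemming" (both assumptions; this page does not mark parameters as required), and uses hypothetical helper names (parseParams, applyDefaults, checkParams).

```cpp
// Hypothetical pre-flight check for InvFP indexer parameters (not Lemur code).
// ASSUMPTION: the parameter file holds one `key = value;` entry per line.
#include <cassert>   // for assert() in callers' checks
#include <istream>
#include <map>
#include <set>
#include <sstream>   // lets callers parse from an std::istringstream
#include <string>
#include <vector>

using Params = std::map<std::string, std::string>;

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
  const char* ws = " \t\r\n";
  const auto b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  const auto e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Read `key = value;` lines; lines without '=' are ignored.
Params parseParams(std::istream& in) {
  Params p;
  std::string line;
  while (std::getline(in, line)) {
    const auto eq = line.find('=');
    if (eq == std::string::npos) continue;
    std::string value = trim(line.substr(eq + 1));
    if (!value.empty() && value.back() == ';') value.pop_back();
    p[trim(line.substr(0, eq))] = trim(value);
  }
  return p;
}

// Insert the documented defaults only where no value was given.
void applyDefaults(Params& p) {
  p.emplace("memory", "96000000");
  p.emplace("docFormat", "brill");
}

// Return a list of problems; an empty list means the parameters look usable.
std::vector<std::string> checkParams(const Params& p) {
  std::vector<std::string> errors;
  auto get = [&](const std::string& k) -> std::string {
    auto it = p.find(k);
    return it == p.end() ? std::string() : it->second;
  };
  auto has = [&](const std::string& k) { return !get(k).empty(); };

  for (const char* k : {"index", "dataFiles"})
    if (!has(k)) errors.push_back(std::string("missing ") + k);

  const std::string fmt = get("docFormat");
  if (!fmt.empty() && fmt != "brill" && fmt != "identifinder")
    errors.push_back("unknown docFormat: " + fmt);

  const std::string mem = get("memory");
  if (!mem.empty() && mem.find_first_not_of("0123456789") != std::string::npos)
    errors.push_back("memory must be a byte count: " + mem);

  const std::string stem = get("stemmer");
  if (stem == "krovetz") {
    if (!has("KstemmerDir"))
      errors.push_back("stemmer=krovetz requires KstemmerDir");
  } else if (stem == "arabic") {
    if (!has("arabicStemDir"))
      errors.push_back("stemmer=arabic requires arabicStemDir");
    static const std::set<std::string> funcs = {
        "arabic_stop", "arabic_norm2", "arabic_norm2_stop",
        "arabic_light10", "arabic_light10_stop"};
    if (funcs.count(get("arabicStemFunc")) == 0)
      errors.push_back("unknown arabicStemFunc: " + get("arabicStemFunc"));
  } else if (!stem.empty() && stem != "porter") {
    errors.push_back("unknown stemmer: " + stem);
  }
  return errors;
}
```

For example, parsing `index = myindex;`, `dataFiles = filelist;` and `stemmer = krovetz;` with no KstemmerDir entry makes checkParams report the single problem `stemmer=krovetz requires KstemmerDir`. Collecting every problem in one pass, rather than stopping at the first, is a design choice so that a whole parameter file can be fixed at once.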

Generated on Tue Nov 25 11:27:25 2003 for Lemur Toolkit by doxygen 1.2.18