Keyfile positional incremental Indexer
This application builds a Keyfile positional index for a collection of documents.
To use it, follow the general steps for running a Lemur application.
The parameters are:
- index: name of the index table-of-contents file, without the .ifp extension.
- memory: memory (in bytes) for the index cache (default: 96000000).
- stopwords: name of the file containing the stopword list.
- acronyms: name of the file containing the acronym list.
- countStopWords: if true, count stopwords in the document length.
- docFormat:
  - "trec" for standard TREC-formatted documents
  - "web" for web TREC-formatted documents
  - "chinese" for segmented Chinese text (TREC format, GB encoding)
  - "chinesechar" for unsegmented Chinese text (TREC format, GB encoding)
  - "arabic" for Arabic text (TREC format, Windows CP1256 encoding)
- stemmer:
  - "porter": Porter stemmer.
  - "krovetz": Krovetz stemmer; requires an additional parameter:
    - KstemmerDir: path to the directory of data files used by the Krovetz stemmer.
  - "arabic": Arabic stemmer; requires additional parameters:
    - arabicStemDir: path to the directory of data files used by the Arabic stemmers.
    - arabicStemFunc: which stemming algorithm to apply, one of:
      - arabic_stop: Arabic stopword removal
      - arabic_norm2: table normalization
      - arabic_norm2_stop: table normalization with stopword removal
      - arabic_light10: light9 plus the "ll" prefix
      - arabic_light10_stop: light10 with stopword removal
- dataFiles: name of the file containing the list of data files to index.
Generated on Fri Feb 6 07:12:10 2004 for LEMUR by Doxygen 1.2.16