This application (BuildBasicIndex.cpp) builds a BasicIndex, a basic index for a collection of documents.
To use it, follow the general steps of running a lemur application and set the following variables in the parameter file:
inputFile: the path to the source file.
outputPrefix: a prefix name for your index.
maxDocuments: the maximum number of documents to index (default: 1000000).
maxMemory: the maximum amount of memory to use for indexing (default: 0x8000000, or 128MB).
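The maxMemory values are easier to check when the hexadecimal byte counts are converted to megabytes. The following is a minimal sketch of that arithmetic using only the C++ standard library. The helper names parseMemoryBytes and toMegabytes are hypothetical and are not part of Lemur, and the sketch assumes that 1MB means 1024 * 1024 bytes, which is consistent with the stated default of 0x8000000 being 128MB.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Lemur): parse a maxMemory value such as
// "0x8000000" or "134217728" into a byte count. Passing base 0 lets
// std::stoull accept both hexadecimal (0x prefix) and decimal notation.
std::uint64_t parseMemoryBytes(const std::string& text) {
    assert(!text.empty());
    return std::stoull(text, nullptr, 0);
}

// Convert a byte count to whole megabytes, assuming 1MB = 1024 * 1024 bytes.
std::uint64_t toMegabytes(std::uint64_t bytes) {
    return bytes / (1024ULL * 1024ULL);
}
```

Under these assumptions, toMegabytes(parseMemoryBytes("0x8000000")) gives 128, matching the documented default.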
In general, outputPrefix should be an absolute path, unless you always open the index from the directory in which the index is stored.
A "table-of-contents" (TOC) file named outputPrefix.bsc will be written to the directory where the index is stored.
The following is an example of use:
    % cat buildparam
    inputFile = /usr0/mydata/source;
    outputPrefix = /usr0/mydata/index;
    maxDocuments = 200000;
    maxMemory = 0x10000000;
    % BuildBasicIndex buildparam

The TOC file is /usr0/mydata/index.bsc.
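If the parameter file is generated by a program rather than written by hand, its contents and the resulting TOC file name can be derived from the same four values. The following is a sketch only: makeBuildParams and tocFileName are hypothetical helpers, not Lemur functions, and the "name = value;" layout is copied from the example above rather than from a formal grammar.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Lemur): format the four parameters as
// "name = value;" lines, following the layout of the example above.
// maxMemory is kept as a string so hexadecimal notation is preserved.
std::string makeBuildParams(const std::string& inputFile,
                            const std::string& outputPrefix,
                            long maxDocuments,
                            const std::string& maxMemory) {
    assert(!inputFile.empty() && !outputPrefix.empty());
    return "inputFile = " + inputFile + ";\n" +
           "outputPrefix = " + outputPrefix + ";\n" +
           "maxDocuments = " + std::to_string(maxDocuments) + ";\n" +
           "maxMemory = " + maxMemory + ";\n";
}

// Hypothetical helper: the TOC file name is the output prefix plus ".bsc".
std::string tocFileName(const std::string& outputPrefix) {
    return outputPrefix + ".bsc";
}
```

With the values from the example, tocFileName("/usr0/mydata/index") yields /usr0/mydata/index.bsc. The string returned by makeBuildParams can be written to a file such as buildparam before running BuildBasicIndex.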
See also the testing scripts in test_basic_index.sh and the parameter file build_param in the directory data/basicparam.