Lemur Modules and Applications (Version 3.0)

Parsing and Pre-processing

ParseToFile Parses documents compatible with Parser objects and writes output compatible with BasicDocStream

ParseQuery Takes a document in NIST's Web or TREC format and creates queries

ParseInQueryOp Parses a file containing structured queries into BasicDocStream format

Building/Adding to an index

BuildInvertedIndex Builds an inverted index and document index, with or without positions

BuildKeyfileIncIndex Builds or adds to a keyfile inverted index with positions

BuildDocMgr Builds an inverted index and document index, with or without positions

BuildPropIndex Builds a positional index that can associate properties with terms, such as part of speech and named entity tags

PassageIndexer Builds a positional passage index that segments documents into passages of a specified size

IncIndexer Adds documents into an existing InvFPIndex, or creates a new one

IncPassageIndexer Adds passages into an existing passage index
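
As a rough illustration of what the indexers above maintain, the following Python sketch builds an in-memory inverted index that maps each term to the documents containing it, storing either term positions or plain term counts, together with a minimal document index of document lengths. It does not reproduce Lemur's on-disk formats (keyfile, InvFPIndex, or passage indexes); the function name `add_documents` and its `positions` flag are hypothetical, and "adding to an existing index" is modeled simply as calling the function again on the same structures.

```python
from collections import defaultdict


def add_documents(index, doc_lengths, docs, positions=True):
    """Add docs ({doc_id: [tokens]}) to an in-memory inverted index.

    index maps term -> {doc_id: [positions]} when positions=True,
    or term -> {doc_id: term_count} when positions=False.
    doc_lengths is a minimal document index: doc_id -> number of tokens.
    (Hypothetical helper for illustration, not a Lemur API.)
    """
    for doc_id, tokens in docs.items():
        if doc_id in doc_lengths:
            raise ValueError(f"document {doc_id!r} is already indexed")
        doc_lengths[doc_id] = len(tokens)
        for pos, term in enumerate(tokens):
            postings = index[term]
            if positions:
                # positional posting list: every offset where the term occurs
                postings.setdefault(doc_id, []).append(pos)
            else:
                # non-positional posting: just the within-document count
                postings[doc_id] = postings.get(doc_id, 0) + 1


# Build a new index, then add to it incrementally.
index, doc_lengths = defaultdict(dict), {}
add_documents(index, doc_lengths, {"d1": ["lemur", "index", "lemur"], "d2": ["index"]})
add_documents(index, doc_lengths, {"d3": ["lemur"]})
print(index["lemur"])  # {'d1': [0, 2], 'd3': [0]}
```

A single index should be built consistently with or without positions; mixing the two modes in one structure is not handled by this sketch.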

General Retrieval and Evaluation

RetEval Runs retrieval experiments (with/without feedback) to evaluate different retrieval models, such as simple TFIDF, Okapi, and KL-divergence

RelFBEval Runs retrieval experiments with relevance feedback

QueryModelEval Loads an expanded query model (e.g., one computed by GenerateQueryModel), and evaluates it with the KL-divergence retrieval model

TwoStageRetEval Runs retrieval experiments, using the two-stage smoothing method for the initial retrieval and the KL-divergence model for feedback

GenL2Norm Generates a support file for retrieval using cosine similarity

QueryClarity Computes clarity scores for a query model

GenerateSmoothSupport Generates two support files that speed up retrieval with the language modeling approach

GenerateQueryModel Computes an expanded query model based on feedback documents and the original query model for the KL-divergence retrieval method

EstimateDirPrior Uses the leave-one-out method to estimate an optimal setting for the Dirichlet prior smoothing parameter

ireval.pl A Perl script that does TREC-style retrieval evaluation
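
Several of the tools above (RetEval, QueryModelEval, EstimateDirPrior) revolve around language-model retrieval with Dirichlet prior smoothing. The Python sketch below scores documents by query likelihood under a Dirichlet-smoothed document model, p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu); ranking by negative KL divergence between a maximum-likelihood query model and this document model yields the same ordering. It is a textbook illustration, not Lemur's implementation: the names `collection_model`, `dirichlet_score`, and `rank`, the default mu, and the choice to skip query terms absent from the collection are all assumptions.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood collection language model p(w|C)."""
    counts = Counter()
    for tokens in docs.values():
        counts.update(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def dirichlet_score(query, doc, p_coll, mu=1000.0):
    """log p(query | doc) with Dirichlet prior smoothing:
    p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu)."""
    tf = Counter(doc)
    score = 0.0
    for w, qtf in Counter(query).items():
        pc = p_coll.get(w, 0.0)
        if pc == 0.0:
            continue  # assumption: ignore query terms absent from the collection
        score += qtf * math.log((tf[w] + mu * pc) / (len(doc) + mu))
    return score


def rank(query, docs, mu=1000.0):
    """Return doc ids sorted by descending score (ties broken by doc id)."""
    p_coll = collection_model(docs)
    scored = [(dirichlet_score(query, toks, p_coll, mu), doc_id)
              for doc_id, toks in docs.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda x: (-x[0], x[1]))]


docs = {"d1": ["lemur", "toolkit", "lemur"],
        "d2": ["toolkit", "index"],
        "d3": ["index", "query"]}
print(rank(["lemur", "toolkit"], docs, mu=2.0))  # ['d1', 'd2', 'd3']
```

A tiny mu is used here only because the toy documents are a few words long; with real collections the prior is typically much larger, which is the setting EstimateDirPrior is meant to help choose.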

User Interfaces

Lemur CGI CGI code for using Lemur indexes on the web

Retrieval GUI GUI written in Java/Swing for searching Lemur indexes

Distributed IR and Query-based Sampling

CollSelIndex Builds a collection selection database using either document frequency or collection term frequency for the database's term frequency count

DistRetEval Performs distributed retrieval using a resource selection index and the individual indexes

QryBasedSample Performs query-based sampling on text databases
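
One way to picture the database CollSelIndex builds is to treat each collection as a single large pseudo-document whose count for a term is either the term's document frequency within that collection or its total (collection) term frequency. The Python sketch below shows that choice on in-memory token lists; the function name `build_collsel_db` and the data layout are hypothetical, and the sketch does not reproduce Lemur's index format or its resource-ranking formula.

```python
from collections import Counter


def build_collsel_db(collections, use_df=True):
    """collections: {name: [doc_tokens, ...]} (hypothetical layout).

    Returns {name: Counter(term -> count)} where count is the number of
    documents containing the term (use_df=True) or the term's total number
    of occurrences in the collection (use_df=False).
    """
    db = {}
    for name, docs in collections.items():
        counts = Counter()
        for tokens in docs:
            # set(tokens) counts each term once per document -> document frequency
            counts.update(set(tokens) if use_df else tokens)
        db[name] = counts
    return db


collections = {"A": [["x", "x", "y"], ["x"]], "B": [["y", "y"]]}
print(build_collsel_db(collections)["A"]["x"])                # 2 (document frequency)
print(build_collsel_db(collections, use_df=False)["A"]["x"])  # 3 (collection term frequency)
```

Document frequency dampens the effect of a few documents that repeat a term many times, while collection term frequency preserves it; which works better for resource selection is an empirical question.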

Structured Query Language

ParseInQueryOp Parses a file containing structured queries into BasicDocStream format

StructQueryEval Runs retrieval experiments to evaluate the performance of the structured query model using the InQuery retrieval method
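
The InQuery retrieval method combines evidence through belief operators. The Python sketch below evaluates a nested structured query against per-term belief values using the classic InQuery combination rules (#and as a product, #or as one minus the product of complements, #not as a complement, #sum as a mean, #wsum as a weighted mean, #max as a maximum). The tuple-based query representation, the function name `evaluate`, and the 0.4 default belief for terms without a supplied belief are assumptions for illustration; computing the term beliefs themselves from an index is out of scope here.

```python
import math


def evaluate(node, beliefs, default=0.4):
    """Evaluate a structured query given as nested tuples, e.g.
    ("#and", "a", ("#or", "b", "c")) or ("#wsum", (2.0, "a"), (1.0, "b")).

    beliefs maps term -> belief in [0, 1]; missing terms get `default`
    (assumed InQuery-style default belief).
    """
    if isinstance(node, str):
        return beliefs.get(node, default)
    op, *args = node
    if op == "#wsum":
        # weighted mean of child beliefs; args are (weight, child) pairs
        total = sum(w for w, _ in args)
        return sum(w * evaluate(child, beliefs, default) for w, child in args) / total
    vals = [evaluate(child, beliefs, default) for child in args]
    if op == "#and":
        return math.prod(vals)
    if op == "#or":
        return 1.0 - math.prod(1.0 - v for v in vals)
    if op == "#not":
        return 1.0 - vals[0]
    if op == "#sum":
        return sum(vals) / len(vals)
    if op == "#max":
        return max(vals)
    raise ValueError(f"unsupported operator: {op}")


query = ("#and", "lemur", ("#or", "toolkit", "index"))
beliefs = {"lemur": 0.5, "toolkit": 0.8}
print(evaluate(query, beliefs))  # approximately 0.44
```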

Summarization

BasicSummApp Demonstrates a simple summarizer

MMRSummApp A more complex summarizer that compares passages with one another
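
The MMR in MMRSummApp refers to Maximal Marginal Relevance, which favors passages that are relevant to a query but not redundant with passages already chosen: at each step it selects the passage p maximizing lam * sim(p, q) - (1 - lam) * max over selected s of sim(p, s). The Python sketch below implements that greedy rule with bag-of-words cosine similarity; the similarity measure, the default lam = 0.7, and the function names are assumptions, not details taken from Lemur's summarizer.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists as term-count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(passages, query, k, lam=0.7):
    """Greedily pick up to k passage indices by Maximal Marginal Relevance."""
    selected = []
    candidates = list(range(len(passages)))
    while candidates and len(selected) < k:
        def mmr(i):
            # redundancy = similarity to the closest already-selected passage
            redundancy = max((cosine(passages[i], passages[j]) for j in selected),
                             default=0.0)
            return lam * cosine(passages[i], query) - (1 - lam) * redundancy
        best = max(candidates, key=mmr)  # ties go to the earliest index
        selected.append(best)
        candidates.remove(best)
    return selected


passages = [["lemur", "index"], ["lemur", "index"], ["lemur", "query"]]
print(mmr_select(passages, ["lemur", "index", "query"], k=2))  # [0, 2]
```

All three passages are equally similar to the query here, so the second pick is decided by redundancy: the exact duplicate of passage 0 is penalized and passage 2 is chosen instead.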

Document Clustering

Cluster Performs the basic online clustering task over documents in an index. Can be used for TDT topic detection.

OfflineCluster Demonstrates the basic offline clustering task. Provides k-means and bisecting k-means partitional clustering.

PLSA Performs Probabilistic Latent Semantic Analysis (PLSA) on a collection, building three probability tables.
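
For readers unfamiliar with the partitional methods OfflineCluster provides, the Python sketch below shows plain k-means and bisecting k-means (repeatedly splitting the largest cluster in two with 2-means) on dense vectors with squared Euclidean distance. Lemur clusters documents by their term vectors and may use different similarity, initialization, and split-selection rules; the farthest-point initialization, the largest-cluster split rule, and all function names here are assumptions chosen to keep the example short and deterministic.

```python
def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=100):
    """Return (assignments, centroids). Farthest-point initialization is an
    assumption made here so the result is deterministic."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = _mean(members)
    return assign, centroids


def bisecting_kmeans(points, k):
    """Split the largest cluster with 2-means until there are k clusters.
    Returns a list of clusters, each a list of point indices."""
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        largest = clusters[0]
        if len(largest) < 2:
            break  # nothing left to split
        assign, _ = kmeans([points[i] for i in largest], 2)
        left = [i for i, a in zip(largest, assign) if a == 0]
        right = [i for i, a in zip(largest, assign) if a == 1]
        if not left or not right:
            break  # all points identical; cannot split further
        clusters[0:1] = [left, right]
    return clusters


points = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]
print(bisecting_kmeans(points, 3))  # [[0, 1], [2, 3], [4, 5]]
```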

The Lemur Project
Last modified: Wed Jul 7 14:45:13 EDT 2004
