Lemur Modules and Applications (Version 3.0)


  1. Parsing and Pre-processing

    ParseToFile Parses documents compatible with Parser objects and writes output compatible with BasicDocStream
    ParseQuery Takes a document in NIST's Web or TREC format and creates queries
    ParseInQueryOp Parses a file containing structured queries into BasicDocStream format


  2. Building/Adding to an index

    BuildInvertedIndex Builds an inverted index and document index, with or without positions
    BuildKeyfileIncIndex Builds or adds to a keyfile inverted index with positions
    BuildDocMgr Builds a document manager for a document collection
    BuildPropIndex Builds a positional index that can associate properties with terms, such as part of speech and named entity tags
    PassageIndexer Builds a positional passage index that segments documents into passage sizes
    IncIndexer Adds documents into an existing InvFPIndex, or creates a new one
    IncPassageIndexer Adds passages into an existing passage index


  3. General Retrieval and Evaluation

    RetEval Runs retrieval experiments (with/without feedback) to evaluate different retrieval models, such as simple TFIDF, Okapi, and KL-divergence
    RelFBEval Runs retrieval experiments with relevance feedback
    QueryModelEval Loads an expanded query model (e.g., one computed by GenerateQueryModel), and evaluates it with the KL-divergence retrieval model
    TwoStageRetEval Runs retrieval experiments, using the two-stage smoothing method for the initial retrieval and the KL-divergence model for feedback
    GenL2Norm Generates a support file for retrieval using cosine similarity
    QueryClarity Computes clarity scores for a query model
    GenerateSmoothSupport Generates two support files for retrieval using the language modeling approach to speed up the retrieval process
    GenerateQueryModel Computes an expanded query model based on feedback documents and the original query model for the KL-divergence retrieval method
    EstimateDirPrior Uses the leave-one-out method to estimate an optimal setting for the Dirichlet prior smoothing parameter
    ireval.pl A Perl script that does TREC-style retrieval evaluation
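
As a rough illustration of the Dirichlet prior smoothing that EstimateDirPrior tunes, the sketch below scores a document by smoothed query likelihood, using p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu). It is not Lemur code: the function name `dirichlet_score`, its parameters, and the choice to skip query terms unseen in the collection are all assumptions made for this example.

```python
import math
from collections import Counter


def dirichlet_score(query, doc, collection_counts, collection_len, mu=1000.0):
    """Log query likelihood of `doc` under Dirichlet prior smoothing.

    Hypothetical helper for illustration only; not part of the Lemur API.
    query, doc: lists of tokens
    collection_counts: dict mapping term -> count in the whole collection
    collection_len: total number of tokens in the collection
    mu: the Dirichlet prior parameter
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_coll = collection_counts.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # assumption: terms unseen in the collection are skipped
        p_doc = (doc_counts[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_doc)
    return score
```

A larger mu pulls every document model toward the collection model, so the setting matters most for short documents.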


  4. User Interfaces

    Lemur CGI CGI code for using Lemur indexes on the web
    Retrieval GUI GUI written in Java/Swing for searching Lemur indexes


  5. Distributed IR and Query-based Sampling

    CollSelIndex Builds a collection selection database using either document frequency or collection term frequency for the database's term frequency count
    DistRetEval Does distributed retrieval, using a resource selection index and individual indexes
    QryBasedSample Performs query-based sampling on text databases


  6. Structured Query Language

    ParseInQueryOp Parses a file containing structured queries into BasicDocStream format
    StructQueryEval Runs retrieval experiments to evaluate the performance of the structured query model using the InQuery retrieval method


  7. Summarization

    BasicSummApp Demonstrates a simple summarizer
    MMRSummApp A more complex summarizer which does comparisons between passages
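
Assuming the "MMR" in MMRSummApp stands for Maximal Marginal Relevance, the passage comparison it performs can be sketched as a greedy selection that trades relevance against redundancy with already chosen passages. This is a minimal sketch, not the application's code: `mmr_select`, the `lam` weight, and the `similarity` callback are hypothetical names introduced here.

```python
def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedily pick up to k passage ids by Maximal Marginal Relevance.

    Hypothetical helper for illustration only; not part of the Lemur API.
    relevance: dict mapping passage id -> relevance score
    similarity: function (id, id) -> similarity between two passages
    lam: weight on relevance; (1 - lam) weights the redundancy penalty
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def mmr(candidate):
            # Redundancy is the highest similarity to any passage chosen so far.
            redundancy = max(
                (similarity(candidate, s) for s in selected), default=0.0
            )
            return lam * relevance[candidate] - (1.0 - lam) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam` set to 1.0 the selection reduces to plain ranking by relevance; lower values favor passages that differ from those already in the summary.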


  8. Document Clustering

    Cluster Performs the basic online clustering task over documents in an index. Can be used for TDT topic detection.
    OfflineCluster Demonstrates the basic offline clustering task. Provides k-means and bisecting k-means partitional clustering.
    PLSA Performs Probabilistic Latent Semantic Analysis (PLSA) on a collection, building three probability tables.
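
The online clustering task named above can be illustrated with a single-pass scheme: each incoming document joins the most similar existing cluster if the similarity clears a threshold, and otherwise starts a new cluster. The sketch below is one common way to do this and is not taken from the Cluster application; `online_cluster`, `cosine`, the raw term-count vectors, and the default threshold are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def online_cluster(docs, threshold=0.3):
    """Assign each document (a list of tokens) to a cluster in one pass.

    Hypothetical helper for illustration only; not part of the Lemur API.
    Returns a list of cluster labels, one per document, in input order.
    """
    centroids = []  # summed term vectors; cosine ignores the scale
    labels = []
    for tokens in docs:
        vec = Counter(tokens)
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # start a new cluster
            labels.append(len(centroids) - 1)
    return labels
```

Because documents are processed in arrival order and never reassigned, the result depends on that order, which is the usual trade-off of online clustering compared with offline methods such as k-means.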

The Lemur Project
Last modified: Wed Jul 7 14:45:13 EDT 2004