
Retrieval Evaluation Application

This application (RetEval.cpp) runs retrieval experiments (with/without feedback) to evaluate different retrieval models as well as different parameter settings for those models.

Scoring is done either over a working set of documents (essentially re-ranking) or over the whole collection, as indicated by the parameter "useWorkingSet". When "useWorkingSet" has a non-zero integer value, scoring is restricted to the working set specified in the file given by "workSetFile". The file should have three columns: the query id, the document id, and a numerical value, which is ignored. The third column exists so that any retrieval result in the simple (i.e., non-TREC) format generated by Lemur can be used directly as a "workSetFile" for re-ranking. It could also be used to provide a prior probability value for each document, which may be useful for some algorithms. By default, scoring is on the whole collection.
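The three-column layout of a working set file can be read with a few lines of code. The following is a minimal sketch, not Lemur's own reader: the function name `read_working_set` is hypothetical, and it assumes the columns are separated by whitespace, which the description above does not state explicitly.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def read_working_set(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group (document id, value) pairs by query id.

    Assumes whitespace-separated columns: query id, document id, numeric value.
    """
    working_set: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 columns, got {len(fields)}"
            )
        query_id, doc_id, value = fields
        # The third column is kept only so a caller could treat it as a prior;
        # plain re-ranking would ignore it.
        working_set[query_id].append((doc_id, float(value)))
    return dict(working_set)


# Made-up sample rows for illustration only.
sample = [
    "q1 docA -4.25",
    "q1 docB -5.0",
    "q2 docC -3.5",
]
working_set = read_working_set(sample)
```

Keeping the third column, rather than discarding it, leaves room for the prior-probability use mentioned above without changing the reader.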

The application currently supports three retrieval models:

  1. The popular TFIDF retrieval model

  2. The Okapi BM25 retrieval function

  3. The KL-divergence language model based retrieval method

The model is selected with the parameter retModel (0 for TFIDF, 1 for Okapi, and 2 for KL). The feedback implementation for the Okapi BM25 retrieval function is suspected to contain a bug, because its performance is not as expected.
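A wrapper script that drives experiments may want to validate retModel before launching a run. The sketch below encodes the three values listed above; the names `RetModel` and `parse_ret_model` are hypothetical and are not part of Lemur.

```python
from enum import IntEnum


class RetModel(IntEnum):
    """Values of the retModel parameter as listed in this page."""
    TFIDF = 0
    OKAPI = 1
    KL = 2


def parse_ret_model(raw: str) -> RetModel:
    """Convert a retModel string to a RetModel, rejecting unknown values."""
    try:
        return RetModel(int(raw))
    except ValueError as exc:
        # Raised both for non-integer text and for integers outside 0..2.
        raise ValueError(f"retModel must be 0, 1, or 2, got {raw!r}") from exc


selected = parse_ret_model("2")
```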

Other common parameters (for all retrieval methods) are:

  1. index: The complete name of the index table-of-content file for the database index.

  2. textQuerySet: the query text stream

  3. resultFile: the result file

  4. resultCount: the number of documents to return as result for each query

  5. feedbackDocCount: the number of documents to use for pseudo-feedback (0 means no feedback)

  6. feedbackTermCount: the number of terms to add to a query when doing feedback. Note that in the KL-divergence approach, the actual number of terms is also affected by two other parameters. (See below.)
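The common parameters above can be collected into a parameter file for a run. The sketch below is illustrative only: it assumes a `name = value;` line syntax, which this page does not specify, and every path and count is a placeholder rather than a recommended setting. The helper `render_parameters` is hypothetical.

```python
from typing import Mapping, Union

ParamValue = Union[str, int]


def render_parameters(params: Mapping[str, ParamValue]) -> str:
    """Render parameters one per line as `name = value;` (assumed syntax)."""
    return "".join(f"{name} = {value};\n" for name, value in params.items())


# Placeholder values; none of these paths or counts come from the documentation.
params = {
    "index": "/path/to/index-toc-file",
    "retModel": 2,  # KL-divergence language model
    "textQuerySet": "/path/to/queries.txt",
    "resultFile": "/path/to/results.txt",
    "resultCount": 1000,
    "feedbackDocCount": 0,  # 0 means no feedback
    "feedbackTermCount": 20,
}
text = render_parameters(params)
```

Writing `text` to a file gives one parameter per line, in the order the dictionary was built.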

Model-specific parameters are:


Generated at Fri Jul 26 18:23:05 2002 for LEMUR by doxygen1.2.4 written by Dimitri van Heesch, © 1997-2000