Relevance Feedback Evaluation Application

This application (RelFBEval.cpp) runs retrieval experiments with relevance feedback. Different retrieval models can be used, each with its own parameter settings. Although the program is designed for relevance feedback, it can easily be used for pseudo feedback: set the parameter feedbackDocuments to a result file, so that every entry in the result file is interpreted as a relevant document.

Two important notes:

Scoring is done either over a working set of documents (essentially re-ranking) or over the whole collection, as indicated by the parameter "useWorkingSet". When "useWorkingSet" has a non-zero (integer) value, scoring is restricted to the working set specified in the file given by "workSetFile". By default, scoring is on the whole collection.

The working-set file should have three columns: the query id, the document id, and a numerical value, which is ignored. The third column exists so that any retrieval result in the simple (i.e., non-TREC) format generated by Lemur can be used directly as a "workSetFile" for re-ranking. The third column could also be used to supply a prior probability value for each document, which could be useful for some algorithms.
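To make the three-column working-set layout concrete, here is a minimal Python sketch that reads such a file and groups document ids by query id. The function name `read_working_set` and the sample ids are hypothetical and not part of Lemur. The sketch also assumes the columns are separated by whitespace, which the text above does not state explicitly.

```python
from collections import defaultdict
import io


def read_working_set(stream):
    """Group document ids by query id from a three-column working-set file.

    Assumption: columns are whitespace-separated (query id, document id,
    numeric value). The third column is only checked to be numeric and is
    then discarded, mirroring the "ignored" behaviour described in the text.
    """
    working_set = defaultdict(list)
    for line_no, line in enumerate(stream, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, got {len(fields)}"
            )
        query_id, doc_id, value = fields
        float(value)  # raises ValueError if the third column is not numeric
        working_set[query_id].append(doc_id)
    return dict(working_set)


# Hypothetical sample in the simple (non-TREC) result format.
sample = io.StringIO(
    "q1 docA 12.5\n"
    "q1 docB 9.1\n"
    "q2 docC 3.0\n"
)
working_set = read_working_set(sample)
```

Because the third column is discarded, a result file from an earlier retrieval run and a hand-written list with dummy values are treated identically by this sketch.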

The application currently supports three different retrieval models:

  1. The popular TFIDF retrieval model

  2. The Okapi BM25 retrieval function

  3. The KL-divergence language model based retrieval method

The parameter that selects the model is retModel (0 for TFIDF, 1 for Okapi, and 2 for KL). Note that the feedback implementation for the Okapi BM25 retrieval function is suspected to contain a bug, because its performance is not as expected.

Other common parameters (for all retrieval methods) are:

  1. index: the complete name of the index table-of-contents file for the database index.

  2. textQuerySet: the query text stream

  3. resultFile: the result file

  4. resultCount: the number of documents to return as result for each query

  5. feedbackDocuments: the file of feedback documents to be used for feedback. In the case of pseudo feedback, this can be a result file generated from an initial retrieval process. In the case of relevance feedback, this is usually a three-column relevance judgment file. Note that this means you cannot use a TREC-style judgment file directly; you must remove its second column to convert it to the three-column format.

  6. feedbackDocCount: the number of documents to use for feedback (a negative value means that all judged documents are used). The documents in the feedbackDocuments file are sorted in decreasing order of the numerical value in the third column, and the top documents are then used for feedback.

  7. feedbackTermCount: the number of terms to add to a query when doing feedback. Note that in the KL-divergence approach, the actual number of terms is also affected by two other parameters (see below).
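The descriptions of feedbackDocuments and feedbackDocCount can be illustrated with a short Python sketch. It is not Lemur code: the function names `trec_to_three_column` and `select_feedback_docs` are hypothetical, as are the sample ids. The sketch assumes a TREC-style judgment line has four whitespace-separated columns (query id, iteration field, document id, judgment value). It further assumes that sorting and truncation are applied per query, which the description above does not state explicitly.

```python
def trec_to_three_column(lines):
    """Drop the second column of four-column TREC-style judgment lines."""
    converted = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        query_id, _iteration, doc_id, value = fields
        converted.append(f"{query_id} {doc_id} {value}")
    return converted


def select_feedback_docs(lines, feedback_doc_count):
    """Choose feedback documents from three-column lines.

    Entries are sorted by the third column in decreasing order and the top
    feedback_doc_count are kept; a negative count keeps every entry.
    Assumption: the selection is made separately for each query id.
    """
    by_query = {}
    for line in lines:
        query_id, doc_id, value = line.split()
        by_query.setdefault(query_id, []).append((float(value), doc_id))

    selected = {}
    for query_id, entries in by_query.items():
        # Stable sort: ties keep their original file order.
        entries.sort(key=lambda entry: entry[0], reverse=True)
        if feedback_doc_count >= 0:
            entries = entries[:feedback_doc_count]
        selected[query_id] = [doc_id for _, doc_id in entries]
    return selected


# Hypothetical TREC-style judgments: query id, iteration, document id, value.
qrels = ["q1 0 docA 1", "q1 0 docB 2", "q2 0 docC 1"]
three_col = trec_to_three_column(qrels)
top_one = select_feedback_docs(three_col, 1)
all_docs = select_feedback_docs(three_col, -1)
```

The same selection function applies to pseudo feedback, where the third column of a result file holds retrieval scores, so a positive feedbackDocCount keeps the highest-scoring documents.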

Model-specific parameters are:


Generated at Fri Jul 26 18:23:05 2002 for LEMUR by doxygen 1.2.4 written by Dimitri van Heesch, © 1997-2000