
Query Model Generation Application

This application (GenerateQueryModel.cpp) computes an expanded query model from the original query model and a set of feedback documents, for use with the KL-divergence retrieval method. It can be regarded as performing feedback in the language modeling approach to retrieval.

Parameters:

  1. index: The complete name of the index table-of-content file for the database index.

  2. smoothSupportFile: The name of the smoothing support file (e.g., one generated by GenerateSmoothSupport).

  3. textQuerySet: the original query text stream

  4. resultFile: the result file to be used for feedback

  5. TRECResultFormat: whether the result file is in the TREC format (i.e., six-column) or in a simple three-column format <queryID, docID, score>. Integer value: zero for the non-TREC format, non-zero for the TREC format. Default: 1 (i.e., TREC format)

  6. expandedQuery: the file to store the expanded query model

  7. feedbackDocCount: the number of documents to use for pseudo-feedback (0 means no feedback)

  8. queryUpdateMethod: feedback method (0, 1, 2 for mixture model, divergence minimization, and Markov chain respectively).

  9. Method-specific feedback parameters:

    For all interpolation-based approaches (i.e., the new query model is an interpolation of the original model with a (feedback) model computed based on the feedback documents), the following four parameters apply:

    1. feedbackCoefficient: the coefficient of the feedback model for interpolation. The value is in [0,1], with 0 meaning using only the original model (thus no updating/feedback) and 1 meaning using only the feedback model (thus ignoring the original model).

    2. feedbackTermCount: Truncate the feedback model to no more than a given number of words/terms.

    3. feedbackProbThresh: Truncate the feedback model to include only words with a probability higher than this threshold. Default value: 0.001.

    4. feedbackProbSumThresh: Truncate the feedback model until the sum of the probability of the included words reaches this threshold. Default value: 1.

    The parameters feedbackTermCount, feedbackProbThresh, and feedbackProbSumThresh work conjunctively to control the truncation, i.e., the truncated model must satisfy all three constraints.

    All three feedback methods also recognize the parameter feedbackMixtureNoise (default value: 0.5), but with different interpretations.

    In addition, the collection mixture model also recognizes the parameter emIterations, which is the maximum number of iterations the EM algorithm will run. Default: 50. (The EM algorithm can terminate earlier if the log-likelihood converges quickly, where convergence is measured by a hard-coded criterion. See the source code in SimpleKLRetMethod.cpp for details.)
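
To make the role of emIterations concrete, here is a sketch of EM estimation for a two-component mixture model. It rests on assumptions and is not taken from SimpleKLRetMethod.cpp. It assumes that, for the mixture model method, feedbackMixtureNoise is the probability that a word in the feedback documents is generated by the collection (background) model rather than by the feedback topic model. The function name estimateFeedbackModel, the types, and the convergence tolerance tol are hypothetical. The actual stopping criterion is hard-coded in the source and may differ.

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical types: term -> count in the feedback documents,
// and term -> probability.
using Counts = std::map<std::string, double>;
using Model = std::map<std::string, double>;

// Assumed generative story: each word occurrence in the feedback documents
// comes from the collection model with probability `noise`, otherwise from
// the unknown topic model theta, which EM estimates.
// Precondition: `collection` has a non-zero probability for every counted
// term (collection.at throws std::out_of_range if a term is missing).
Model estimateFeedbackModel(const Counts& counts, const Model& collection,
                            double noise, int maxIterations,
                            double tol = 1e-6) {
    Model theta;
    double total = 0.0;
    for (const auto& kv : counts) total += kv.second;
    if (total <= 0.0) return theta;

    // Initialize theta with the relative frequencies in the feedback docs.
    for (const auto& kv : counts) theta[kv.first] = kv.second / total;

    double prevLogLik = 0.0;
    for (int it = 0; it < maxIterations; ++it) {
        Model next;
        double norm = 0.0;
        double logLik = 0.0;
        for (const auto& kv : counts) {
            const double topic = (1.0 - noise) * theta[kv.first];
            const double mix = topic + noise * collection.at(kv.first);
            logLik += kv.second * std::log(mix);
            // E-step: expected count of this term attributed to the topic.
            const double expected = kv.second * topic / mix;
            next[kv.first] = expected;
            norm += expected;
        }
        if (norm <= 0.0) break;  // nothing attributed to the topic model
        // M-step: renormalize the expected counts into probabilities.
        for (auto& kv : next) kv.second /= norm;
        theta = next;
        // Stop early once the log-likelihood has stopped changing.
        if (it > 0 && std::fabs(logLik - prevLogLik) < tol) break;
        prevLogLik = logLik;
    }
    return theta;
}
```

Under this reading, a larger noise value attributes more of the common words to the collection model, so the estimated model concentrates on terms that are distinctive of the feedback documents. Passing the emIterations value as maxIterations bounds the number of EM passes.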


Generated at Fri Jul 26 18:25:57 2002 for LEMUR by doxygen1.2.4 written by Dimitri van Heesch, © 1997-2000