This application (QueryClarity.cpp) computes clarity scores for an expanded query model based on pseudo-feedback documents. It retrieves those documents using the relevant SimpleKLRetMethod parameters. Clarity scores for each query as a whole, and for each individual term within each query, are written to the file specified by the parameter "expandedQuery".
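As a rough illustration of the per-query and per-term scores, the sketch below assumes the usual definition of clarity as the KL divergence (in bits) between the query model and the collection model. The type TermDist and the functions termClarity and queryClarity are hypothetical names used only for this example; they are not part of the Lemur API, and the actual computation in QueryClarity.cpp may differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Hypothetical type: a language model as a map from term to probability.
using TermDist = std::map<std::string, double>;

// Contribution of one term to the clarity score:
//   P(w|Q) * log2( P(w|Q) / P(w|C) )
// Terms with zero probability in either model contribute nothing here
// (an assumption of this sketch).
double termClarity(double queryProb, double collProb) {
  assert(queryProb >= 0.0 && collProb >= 0.0);
  if (queryProb <= 0.0 || collProb <= 0.0) return 0.0;
  return queryProb * std::log2(queryProb / collProb);
}

// Clarity of the whole query: the sum of the per-term contributions.
// If perTerm is non-null, each term's contribution is recorded in it.
double queryClarity(const TermDist& queryModel, const TermDist& collModel,
                    TermDist* perTerm = nullptr) {
  double total = 0.0;
  for (const auto& entry : queryModel) {
    auto it = collModel.find(entry.first);
    const double collProb = (it == collModel.end()) ? 0.0 : it->second;
    const double c = termClarity(entry.second, collProb);
    if (perTerm != nullptr) (*perTerm)[entry.first] = c;
    total += c;
  }
  return total;
}
```

A term that is much more likely under the query model than under the collection model contributes a large positive value; a term that is equally likely under both contributes zero.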
Parameters:
index: The complete name of the index table-of-content file for the database index.
smoothSupportFile: The name of the smoothing support file (e.g., one generated by GenerateSmoothSupport).
textQuery: the original query text stream
expandedQuery: the file to store the query clarity scores.
feedbackDocCount: the number of documents to use for pseudo-feedback. If not specified or 0, the value defaults to 500.
queryUpdateMethod: the feedback method, one of:
    mixture or mix or 0 for mixture.
    divmin or div or 1 for divergence minimization.
    markovchain or mc or 2 for Markov chain.
    relevancemodel1 or rm1 or 3 for relevance model 1.
    relevancemodel2 or rm2 or 4 for relevance model 2.
For all interpolation-based approaches (i.e., the new query model is an interpolation of the original model with a feedback model computed from the feedback documents), the following four parameters apply:
feedbackCoefficient: the coefficient of the feedback model for interpolation. The value is in [0,1], with 0 meaning that only the original model is used (thus no updating/feedback) and 1 meaning that only the feedback model is used (thus ignoring the original model).
feedbackTermCount: Truncate the feedback model to no more than a given number of words/terms.
feedbackProbThresh: Truncate the feedback model to include only words with a probability higher than this threshold. Default value: 0.001.
feedbackProbSumThresh: Truncate the feedback model once the sum of the probabilities of the included words reaches this threshold. Default value: 1.
feedbackTermCount, feedbackProbThresh, and feedbackProbSumThresh work conjunctively to control the truncation, i.e., the truncated model must satisfy all three constraints.
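The interaction of the four parameters can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in SimpleKLRetMethod.cpp: the names FeedbackParams, truncateFeedback, and interpolate are hypothetical, terms are assumed to be considered in order of decreasing probability, and the truncated feedback model is not renormalized.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using TermDist = std::map<std::string, double>;
using TermProb = std::pair<std::string, double>;

// Hypothetical container for the four parameters described above.
struct FeedbackParams {
  double coefficient;    // feedbackCoefficient, in [0,1]
  int termCount;         // feedbackTermCount
  double probThresh;     // feedbackProbThresh (documented default 0.001)
  double probSumThresh;  // feedbackProbSumThresh (documented default 1)
};

// Keep the highest-probability feedback terms while all three limits hold.
std::vector<TermProb> truncateFeedback(const TermDist& feedback,
                                       const FeedbackParams& p) {
  std::vector<TermProb> sorted(feedback.begin(), feedback.end());
  std::stable_sort(sorted.begin(), sorted.end(),
                   [](const TermProb& a, const TermProb& b) {
                     return a.second > b.second;
                   });
  std::vector<TermProb> kept;
  double probSum = 0.0;
  for (const TermProb& tp : sorted) {
    if (static_cast<int>(kept.size()) >= p.termCount) break;  // count limit
    if (tp.second <= p.probThresh) break;    // must be higher than threshold
    if (probSum >= p.probSumThresh) break;   // probability mass limit reached
    kept.push_back(tp);
    probSum += tp.second;
  }
  return kept;
}

// updated(w) = (1 - c) * original(w) + c * feedback(w), over the kept terms.
TermDist interpolate(const TermDist& original,
                     const std::vector<TermProb>& keptFeedback, double c) {
  assert(c >= 0.0 && c <= 1.0);
  TermDist updated;
  for (const auto& e : original) updated[e.first] = (1.0 - c) * e.second;
  for (const TermProb& tp : keptFeedback) updated[tp.first] += c * tp.second;
  return updated;
}
```

With a coefficient of 0 the result is the original model unchanged; with 1, only the truncated feedback terms carry weight.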
The mixture, divergence minimization, and Markov chain feedback methods also recognize the parameter feedbackMixtureNoise (default value: 0.5), but with different interpretations:
    For the collection mixture model, feedbackMixtureNoise is the collection model selection probability in the mixture model. That is, with this probability, a word is picked according to the collection language model when a feedback document is "generated".
    For divergence minimization, feedbackMixtureNoise is the weight of the divergence from the collection language model. (The higher it is, the farther the estimated model is from the collection model.)
    For the Markov chain method, feedbackMixtureNoise is the probability of not stopping, i.e., 1 - alpha, where alpha is the stopping probability while walking through the chain.
In addition, the collection mixture model also recognizes the parameter emIterations, which is the maximum number of iterations the EM algorithm will run. Default: 50. (The EM algorithm can terminate earlier if the log-likelihood converges quickly, where convergence is measured by a hard-coded criterion. See the source code in SimpleKLRetMethod.cpp for details.)
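The roles of emIterations and feedbackMixtureNoise in the collection mixture model can be illustrated with a generic EM loop for a two-component mixture. This is a sketch under assumptions, not the Lemur implementation: the names EmResult and estimateMixtureFeedback are hypothetical, the tolerance tol merely stands in for the hard-coded criterion in SimpleKLRetMethod.cpp, and the sketch requires a noise value strictly between 0 and 1, positive collection probabilities, and at least one nonzero count.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct EmResult {
  std::vector<double> feedbackModel;  // estimated P(w | feedback model)
  int iterationsRun;                  // number of EM iterations performed
};

// counts[i]   : count of term i in the feedback documents
// collProb[i] : P(term i | collection), assumed > 0
// noise       : feedbackMixtureNoise, chance a word comes from the collection
// maxIter     : emIterations
// tol         : hypothetical tolerance on the change in log-likelihood
EmResult estimateMixtureFeedback(const std::vector<double>& counts,
                                 const std::vector<double>& collProb,
                                 double noise, int maxIter,
                                 double tol = 1e-6) {
  assert(counts.size() == collProb.size() && !counts.empty());
  assert(noise > 0.0 && noise < 1.0);
  const std::size_t n = counts.size();
  std::vector<double> theta(n, 1.0 / static_cast<double>(n));  // uniform start
  std::vector<double> expected(n, 0.0);
  double prevLogLik = 0.0;
  int iter = 0;
  while (iter < maxIter) {
    double logLik = 0.0;
    double norm = 0.0;
    // E-step: expected number of occurrences drawn from the feedback model.
    for (std::size_t i = 0; i < n; ++i) {
      const double fromFeedback = (1.0 - noise) * theta[i];
      const double mix = fromFeedback + noise * collProb[i];
      logLik += counts[i] * std::log(mix);
      expected[i] = counts[i] * fromFeedback / mix;
      norm += expected[i];
    }
    // M-step: renormalize the expected counts into a distribution.
    for (std::size_t i = 0; i < n; ++i) theta[i] = expected[i] / norm;
    ++iter;
    // Stop early once the log-likelihood has stopped changing.
    if (iter > 1 && std::fabs(logLik - prevLogLik) < tol) break;
    prevLogLik = logLik;
  }
  return {theta, iter};
}
```

The loop never runs more than maxIter iterations, and it exits earlier when successive log-likelihood values differ by less than tol.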