
ParseInQueryOp

This application (ParseInquery.cpp) parses a file containing structured queries into BasicDocStream format.

The parameters are:

  1. stopwords: name of file containing the stopword list.
  2. acronyms: name of file containing the acronym list.
  3. docFormat:
  4. stemmer: name of the stemmer to apply to query terms.
  5. outputFile: name of the output file.

The structured query operators are:

   Sum Operator:  sum(T1 ... Tn)

     The terms or nodes contained in the sum operator are treated as
     having equal influence on the final result.  The belief values
     provided by the arguments of the sum are averaged to produce the
     belief value of the sum node.
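A minimal sketch of this averaging, assuming each argument has already been evaluated to a belief value in [0, 1] for the current document. The function name `sumBelief` is hypothetical and is not part of the Lemur API.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not Lemur's API): the belief of a sum node is the
// unweighted mean of its arguments' beliefs.
double sumBelief(const std::vector<double>& beliefs) {
    assert(!beliefs.empty());  // sum() needs at least one argument
    double total = std::accumulate(beliefs.begin(), beliefs.end(), 0.0);
    return total / static_cast<double>(beliefs.size());
}
```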

   Weighted Sum Operator:  wsum(W1 T1 ... Wn Tn)

     The terms or nodes contained in the wsum operator contribute
     unequally to the final result according to the weight associated
     with each (Wx).  Unlike the original InQuery operator, there is no
     initial weight, Ws, for scaling the belief value of the sum.
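The text does not say how the weights are normalized. The sketch below assumes the usual InQuery convention of dividing the weighted total by the sum of the weights, with the Ws factor simply dropped; `wsumBelief` is a hypothetical name, not Lemur's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: weighted average of argument beliefs. Assumes the
// weights are normalized by their sum (InQuery-style #wsum without Ws).
double wsumBelief(const std::vector<double>& weights,
                  const std::vector<double>& beliefs) {
    assert(weights.size() == beliefs.size() && !weights.empty());
    double weighted = 0.0, totalWeight = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        weighted += weights[i] * beliefs[i];
        totalWeight += weights[i];
    }
    assert(totalWeight > 0.0);
    return weighted / totalWeight;
}
```

Under this assumption, the weights 1, 2, 1, 1 in the example query at the end of this page would give the second sum twice the influence of each of the other three arguments.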

   Ordered Distance Operator:  #N(T1 ... Tn)  or  odN(T1 ... Tn)

     The terms within an ODN operator must appear in the given order and
     within N words of each other in the text for the operator to
     contribute to the document's belief value.  The "#N" form is an
     abbreviation of odN, so #3(health care) is equivalent to
     od3(health care).
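One way to test an ordered window, sketched under the assumption that "within N words" means each term occurs after the previous one and at most N word positions later (so #1 requires adjacent words, as in #1(parallel processors)). The helpers `extendOrdered` and `countOrderedMatches` are hypothetical. The search backtracks because always taking the nearest next occurrence can miss a valid match; that is fine for the short argument lists typical of queries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// positions[i] lists the word offsets of term Ti in one document.
// Hypothetical reading of odN: each term follows the previous one and
// lies at most n words after it (n == 1 means adjacent words).
static bool extendOrdered(const std::vector<std::vector<int>>& positions,
                          std::size_t term, int prev, int n) {
    if (term == positions.size()) return true;  // every term placed
    for (int pos : positions[term])
        if (pos > prev && pos - prev <= n &&
            extendOrdered(positions, term + 1, pos, n))
            return true;  // tries all candidates, not just the nearest
    return false;
}

// Counts occurrences of T1 that begin a complete ordered match.
int countOrderedMatches(const std::vector<std::vector<int>>& positions, int n) {
    assert(n > 0);
    if (positions.empty()) return 0;
    int matches = 0;
    for (int start : positions[0])
        if (extendOrdered(positions, 1, start, n)) ++matches;
    return matches;
}
```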

   Un-ordered Window Operator:  uwN(T1 ... Tn)

     The terms contained in a UWN operator must be found, in any order,
     within a window of N words for this operator to contribute to the
     belief value of the document.
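A sketch of the unordered test, assuming "a window of N words" means the span from the first to the last matched word covers at most N positions. It sorts all occurrences and slides a window over them (the standard minimum covering window technique); `unorderedWindowMatch` is a hypothetical name, not Lemur's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// positions[i] lists the word offsets of term Ti in one document.
// Returns true if some span of at most n words contains every term,
// in any order (a hypothetical reading of uwN).
bool unorderedWindowMatch(const std::vector<std::vector<int>>& positions, int n) {
    assert(n > 0);
    if (positions.empty()) return false;
    std::vector<std::pair<int, std::size_t>> hits;  // (offset, term index)
    for (std::size_t t = 0; t < positions.size(); ++t)
        for (int pos : positions[t]) hits.emplace_back(pos, t);
    std::sort(hits.begin(), hits.end());

    // Grow the window to the right; while it covers every term, check its
    // span and shrink it from the left.
    std::vector<int> inWindow(positions.size(), 0);
    std::size_t covered = 0, left = 0;
    for (std::size_t right = 0; right < hits.size(); ++right) {
        if (inWindow[hits[right].second]++ == 0) ++covered;
        while (covered == positions.size()) {
            if (hits[right].first - hits[left].first + 1 <= n) return true;
            if (--inWindow[hits[left].second] == 0) --covered;
            ++left;
        }
    }
    return false;
}
```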

   Phrase Operator:  phrase(T1 ... Tn)

     The operator is treated as an ordered distance operator of 3
     (od3).

   Passage Operator:  passageN(T1 ... Tn)

     The passage operator looks for the terms or nodes it contains
     within a passage window of N words.  The document is rated by the
     score of its best passage.
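The text does not say how an individual passage is scored, so the sketch below substitutes a simple stand-in score (the fraction of distinct query terms present) and slides an N-word window one word at a time. Only the "rate the document by its best passage" step comes from the description above; `bestPassageScore` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: slide an n-word window over the document and rate
// the document by its best passage. The per-passage score (fraction of
// distinct query terms present) is a stand-in, not Lemur's scoring.
double bestPassageScore(const std::vector<std::string>& doc,
                        const std::set<std::string>& terms, std::size_t n) {
    assert(n > 0 && !terms.empty());
    std::size_t lastStart = doc.size() > n ? doc.size() - n : 0;
    double best = 0.0;
    for (std::size_t start = 0; start <= lastStart; ++start) {
        std::size_t end = std::min(doc.size(), start + n);
        std::set<std::string> found;
        for (std::size_t i = start; i < end; ++i)
            if (terms.count(doc[i]) > 0) found.insert(doc[i]);
        double score = static_cast<double>(found.size()) /
                       static_cast<double>(terms.size());
        best = std::max(best, score);
    }
    return best;
}
```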

   Synonym Operator:  syn(T1 ... Tn)

     The terms of the operator are treated as instances of the same
     term.
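Treating the arguments as one term can be sketched by merging their occurrence lists within a document, so that frequency and position information is computed over the combined list. This is an illustration under that assumption; `mergeSynonyms` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: syn() treats its arguments as one term, so their
// word-position lists for a document are merged into a single sorted list.
// A position shared by two arguments is kept once.
std::vector<int> mergeSynonyms(const std::vector<std::vector<int>>& positions) {
    std::vector<int> merged;
    for (const auto& list : positions)
        merged.insert(merged.end(), list.begin(), list.end());
    std::sort(merged.begin(), merged.end());
    merged.erase(std::unique(merged.begin(), merged.end()), merged.end());
    return merged;  // merged.size() is the combined term frequency
}
```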

   And Operator:  and(T1 ... Tn)

     The more terms contained in the AND operator which are found in a
     document, the higher the belief value of that document.
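The description only says that matching more terms raises the belief. InQuery's inference network traditionally realizes this by multiplying the argument beliefs, with absent terms still carrying a nonzero default belief. Assuming Lemur follows that convention, a sketch is below; `andBelief` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: InQuery-style #and, the product of argument beliefs
// (an assumption; the text only states that more matches raise the belief).
double andBelief(const std::vector<double>& beliefs) {
    assert(!beliefs.empty());
    double product = 1.0;
    for (double b : beliefs) product *= b;
    return product;
}
```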

   Boolean And Operator:  band(T1 ... Tn)

     All of the terms within a BAND operator must be found in a document
     in order for this operator to contribute to the belief value of
     that document.
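The Boolean gate can be sketched as a check on per-argument term frequencies in one document. How the belief is computed once the gate passes is not described here, so only the gate is shown; `bandMatches` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: band() as a Boolean gate. termFrequencies[i] is how
// often argument Ti occurs in the document; every one must be nonzero.
bool bandMatches(const std::vector<int>& termFrequencies) {
    return !termFrequencies.empty() &&
           std::all_of(termFrequencies.begin(), termFrequencies.end(),
                       [](int tf) { return tf > 0; });
}
```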

   Boolean And Not Operator:  bandnot(T N)

     Searches for documents matching the first argument but not the second.
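Viewed as sets of matching document ids, this is a set difference. The sketch below assumes each argument has already been reduced to such a set; `bandNot` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

// Hypothetical helper: bandnot(T N) over sets of document ids. Keeps the
// documents matched by the first argument that the second does not match.
std::set<int> bandNot(const std::set<int>& first, const std::set<int>& second) {
    std::set<int> result;
    std::set_difference(first.begin(), first.end(),
                        second.begin(), second.end(),
                        std::inserter(result, result.end()));
    return result;
}
```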
     
   Or Operator:  or(T1 ... Tn)

     At least one of the terms within the OR operator must be found in a
     document for that document to get credit for this operator.
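For the belief value itself, InQuery's inference network conventionally uses 1 minus the product of (1 - b) over the arguments, which is high as soon as any one argument is likely. Assuming Lemur keeps that convention, a sketch follows; `orBelief` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: InQuery-style #or, 1 - prod(1 - b_i). The result is
// high as soon as any single argument has a high belief.
double orBelief(const std::vector<double>& beliefs) {
    assert(!beliefs.empty());
    double allMiss = 1.0;  // product of (1 - b_i) over all arguments
    for (double b : beliefs) allMiss *= 1.0 - b;
    return 1.0 - allMiss;
}
```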


   Maximum Operator:  max(T1 ... Tn)

     The maximum belief value of all the terms or nodes contained in the
     MAX operator is taken to be the belief value of this operator.
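A direct sketch, assuming the argument beliefs for the current document are already available; `maxBelief` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: the belief of a max node is its largest argument belief.
double maxBelief(const std::vector<double>& beliefs) {
    assert(!beliefs.empty());  // max_element would return end() otherwise
    return *std::max_element(beliefs.begin(), beliefs.end());
}
```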

   Filter Require Operator: filreq(arg1 arg2)

     Use the documents (belief list) returned by the first argument if
     and only if the second argument would return documents.  The value
     of the second argument does not affect the belief values of the
     first argument; it only determines whether they are returned.
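A sketch that reads the rule per document: a document keeps the belief from the first argument only if the second argument also returns it. The prose can also be read as gating the whole list at once, so treat this as one plausible interpretation. `filterRequire` is a hypothetical name, and results are modeled as a map from document id to belief.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical helper, per-document reading of filreq(arg1 arg2):
// arg1Beliefs maps document id -> belief from arg1; arg2Matches holds the
// ids arg2 returns. arg2 only decides membership, never the belief.
std::map<int, double> filterRequire(const std::map<int, double>& arg1Beliefs,
                                    const std::set<int>& arg2Matches) {
    std::map<int, double> kept;
    for (const auto& entry : arg1Beliefs)
        if (arg2Matches.count(entry.first) > 0) kept.insert(entry);
    return kept;
}
```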

   Filter Reject Operator: filrej(arg1 arg2)

     Use the documents returned by the first argument if and only if
     there were no documents returned by the second argument.  The value
     of the second argument does not affect the belief values of the
     first argument; it only determines whether they are returned.
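The mirror image of filreq, again read per document as an assumption: a document keeps its first-argument belief only if the second argument does not return it. `filterReject` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical helper, per-document reading of filrej(arg1 arg2):
// documents returned by arg2 are dropped; arg1's beliefs are unchanged.
std::map<int, double> filterReject(const std::map<int, double>& arg1Beliefs,
                                   const std::set<int>& arg2Matches) {
    std::map<int, double> kept;
    for (const auto& entry : arg1Beliefs)
        if (arg2Matches.count(entry.first) == 0) kept.insert(entry);
    return kept;
}
```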

   Negation Operator:  not(T1)

     The term or node contained in this operator is negated so that
     documents which do not contain it are rewarded. 
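In InQuery-style inference networks negation is conventionally 1 - b. Assuming beliefs in [0, 1], a sketch is below; `notBelief` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper: InQuery-style negation flips a belief in [0, 1],
// so a term that is unlikely to be present yields a high belief.
double notBelief(double belief) {
    assert(belief >= 0.0 && belief <= 1.0);
    return 1.0 - belief;
}
```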

The input query file is of the form:

qN = queryNode ;
where N is the query id and queryNode is one of the query operators above. The query may span multiple lines and must be terminated with a semicolon. The body of the query must not contain a semicolon, because one would terminate the query prematurely.
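A minimal sketch of splitting a query file in this layout into (N, queryNode) pairs. It does not parse the operators themselves, and `trimSpace` and `readQueries` are hypothetical names rather than functions from ParseInquery.cpp.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Strips leading and trailing whitespace (spaces, tabs, newlines).
std::string trimSpace(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Hypothetical reader for the "qN = queryNode ;" layout: returns
// (N, queryNode) pairs. Each query ends at the next ';', so it may span
// several lines but its body cannot itself contain a semicolon.
std::vector<std::pair<std::string, std::string>> readQueries(const std::string& text) {
    std::vector<std::pair<std::string, std::string>> queries;
    std::size_t start = 0;
    for (std::size_t semi; (semi = text.find(';', start)) != std::string::npos;
         start = semi + 1) {
        std::string chunk = text.substr(start, semi - start);
        std::size_t eq = chunk.find('=');
        if (eq == std::string::npos) continue;  // no "qN =" label; skip
        std::string label = trimSpace(chunk.substr(0, eq));
        if (label.size() < 2 || label[0] != 'q') continue;  // expect qN
        queries.emplace_back(label.substr(1), trimSpace(chunk.substr(eq + 1)));
    }
    return queries;  // anything after the last ';' is ignored
}
```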

An example query:

q18=wsum(1 sum(Languages and compilers for #1(parallel processors))
 2 sum(highly horizontal microcoded machines)
 1 code 1 compaction
);


Generated on Wed Nov 3 13:00:03 2004 for Lemur Toolkit by doxygen1.2.18