Structured Query Retrieval


Contents

  1. Overview
  2. Applications
  3. Structured Query Language

1. Overview

The Lemur structured query language is a reimplementation of the InQuery structured query language developed at the CIIR. Among other things, it supports proximity operators (ordered and unordered windows) in queries.

Feedback is implemented as a #wsum of the original query combined with terms selected using the Rocchio implementation of the Lemur TFIDF retrieval method. The expanded query has the form:

#wsum( (1-a) <original query>
      a*w1  t1
      a*w2  t2
      ...
      a*wN  tN
      )


where a is the value of the parameter feedbackPosCoeff, t1 ... tN are the selected feedback terms, and w1 ... wN are their weights.
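
As an illustration only, the expansion can be sketched in Python. The function name expand_query, its arguments, and the example terms and weights are hypothetical and are not part of Lemur; the sketch only assembles the #wsum text from an original query, a list of (term, weight) pairs, and the value of feedbackPosCoeff.

```python
def expand_query(original_query, feedback_terms, pos_coeff):
    """Build the text of a #wsum feedback query (illustrative helper).

    original_query: structured query text, without the "#qN =" prefix
    feedback_terms: list of (term, weight) pairs, e.g. from term selection
    pos_coeff:      the value of feedbackPosCoeff ("a" in the form above)
    """
    # The original query keeps a weight of (1 - a).
    lines = ["#wsum( {:g} {}".format(1.0 - pos_coeff, original_query)]
    # Each selected term ti gets the weight a * wi.
    for term, weight in feedback_terms:
        lines.append("      {:g} {}".format(pos_coeff * weight, term))
    lines.append("      )")
    return "\n".join(lines)


# Made-up terms and weights, for illustration only.
expanded = expand_query(
    "#sum(health care)",
    [("insurance", 2.0), ("hospital", 1.0)],
    pos_coeff=0.5,
)
print(expanded)
```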

2. Applications

ParseInQuery - Parses structured queries into basic document format.

StructQueryEval - Retrieval evaluation for structured queries.

3. Structured Query Language

The structured query operators are:

   Sum Operator:   #sum (T1 ... Tn )

     The terms or nodes contained in the sum operator are treated as
     having equal influence on the final result.  The belief values
     provided by the arguments of the sum are averaged to produce the
     belief value of the #sum node.

   Weighted Sum Operator:  #wsum (W1 T1 ... Wn Tn)

     The terms or nodes contained in the wsum operator contribute
     unequally to the final result according to the weight associated
     with each (Wx).  Note that this is a change from the InQuery
     operator, as there is no initial weight, Ws, for scaling the belief
     value of the sum.

   Ordered Distance Operator:  #N (T1 ... Tn)  or #odN (T1 ... Tn)

     The terms within an ODN operator must appear in the given order
     and within N words of each other in the text to contribute to the
     document's belief value.  The "#N" version is an abbreviation of
     #odN; thus #3(health care) is equivalent to #od3(health care).

   Unordered Window Operator:  #uwN(T1 ... Tn)

     The terms contained in a UWN operator must be found in any order
     within a window of N words in order for this operator to contribute
     to the belief value of the document.

   Phrase Operator:  #phrase(T1 ... Tn)

     The operator is treated as an ordered distance operator of 3
     (#od3). Note that this is a simplification of the more complicated
     heuristic used by InQuery.

   Passage Operator:  #passageN(T1 ... Tn)

     The passage operator looks for the terms or nodes within the
     operator in a passage window of N words.  The document is rated
     based upon the score of its best passage.

   Synonym Operator:  #syn(T1 ... Tn)

     The terms of the operator are treated as instances of the same
     term.

   And Operator:  #and(T1 ... Tn)

     The more terms contained in the AND operator which are found in a
     document, the higher the belief value of that document.

   Boolean And Operator:  #band(T1 ... Tn)

     All of the terms within a BAND operator must be found in a document
     in order for this operator to contribute to the belief value of
     that document.

   Boolean And Not Operator:  #bandnot (arg1 arg2)

     Search for documents matching the first argument but not the second.
     
   Or Operator:  #or(T1 ... Tn)

     At least one of the terms within the OR operator must be found in a
     document for that document to get credit for this operator.


   Maximum Operator:  #max(T1 ... Tn)

     The maximum belief value of all the terms or nodes contained in the
     MAX operator is taken to be the belief value of this operator.

   Filter Require Operator: #filreq(arg1 arg2)

     Use the documents returned (belief list) by the first argument if
     and only if the second argument would return documents.  The value
     of the second argument does not affect the belief values of the
     first argument, only whether they will be returned or not.

   Filter Reject Operator: #filrej(arg1 arg2)

     Use the documents returned by the first argument if and only if
     there were no documents returned by the second argument.  The value
     of the second argument does not affect the belief values of the
     first argument, only whether they will be returned or not.

   Negation Operator:  #not(arg1)

     The term or node contained in this operator is negated so that
     documents which do not contain it are rewarded.  

   Property Operator:  #prop(arg1 arg2)

     Return documents where arg1 is a property of arg2.
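
The way #sum, #wsum and #max combine belief values can be sketched in Python as follows. The averaging in #sum and the maximum in #max follow the descriptions of those operators. Dividing the weighted sum by the total weight is an assumption made for this sketch, so that equal weights reduce to #sum; this document does not state how Lemur normalizes #wsum. The function names and the belief values are hypothetical.

```python
def sum_belief(beliefs):
    """#sum: average the belief values of the arguments."""
    return sum(beliefs) / len(beliefs)


def wsum_belief(weighted_beliefs):
    """#wsum: weighted combination of (weight, belief) pairs.

    Assumption: the result is normalized by the total weight.
    """
    total_weight = sum(weight for weight, _ in weighted_beliefs)
    return sum(weight * belief for weight, belief in weighted_beliefs) / total_weight


def max_belief(beliefs):
    """#max: the largest belief value among the arguments."""
    return max(beliefs)


# Made-up belief values for three arguments of one document.
beliefs = [0.4, 0.6, 0.8]
avg = sum_belief(beliefs)
weighted = wsum_belief([(1, 0.4), (1, 0.6), (2, 0.8)])
best = max_belief(beliefs)
print(avg, weighted, best)
```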

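The two proximity operators can be illustrated with a small Python sketch that works on word positions. It is not the Lemur implementation. It assumes that #odN requires each term to follow the previous one by at most N positions (so that #1 matches adjacent words), and that #uwN requires all terms to fall inside a span of N consecutive positions. The function names and the sample text are hypothetical.

```python
from itertools import product


def term_positions(tokens, terms):
    """Return one list of word positions per query term, in query order."""
    return [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]


def ordered_window_match(positions, n):
    """#odN sketch: terms in query order, each at most n words after the last."""
    def extend(prev, idx):
        if idx == len(positions):
            return True
        return any(prev < p <= prev + n and extend(p, idx + 1)
                   for p in positions[idx])
    return any(extend(p, 1) for p in positions[0])


def unordered_window_match(positions, n):
    """#uwN sketch: one occurrence of every term inside a span of n words."""
    return any(max(combo) - min(combo) < n for combo in product(*positions))


tokens = "health care reform and care for health".split()

# "health" directly followed by "care" satisfies #1(health care).
print(ordered_window_match(term_positions(tokens, ["health", "care"]), 1))
# The reversed order fails for #1 but still fits an unordered window of 2.
print(ordered_window_match(term_positions(tokens, ["care", "health"]), 1))
print(unordered_window_match(term_positions(tokens, ["care", "health"]), 2))
```
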
The input query file is of the form:

#qN = queryNode ;

where N is the query id and queryNode is a query built from the operators above. The query may span multiple lines and must be terminated with a semicolon. The body of the query must not contain a semicolon, as that would prematurely terminate the query.

An example query:

#q18=#wsum(1 #sum(Languages and compilers for #1(parallel processors))
 2 #sum(highly horizontal microcoded machines)
 1 code 1 compaction
);
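
For illustration, a query file in this form could be split into individual queries with a few lines of Python. This is a sketch under the rules stated above (one "#qN = queryNode" entry per semicolon, no semicolon inside a query body) and is not how ParseInQuery works internally; the function name parse_query_file is hypothetical.

```python
import re

# "#q", the query id, "=", then the query body (which may span lines).
QUERY_RE = re.compile(r"#q(\S+?)\s*=\s*(.*)", re.DOTALL)


def parse_query_file(text):
    """Map each query id to its query text, with whitespace collapsed."""
    queries = {}
    # A semicolon always ends a query, so splitting on it is sufficient.
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = QUERY_RE.fullmatch(chunk)
        if match is None:
            raise ValueError("not a '#qN = queryNode' entry: %r" % chunk)
        query_id, body = match.groups()
        queries[query_id] = " ".join(body.split())
    return queries


sample = """#q18=#wsum(1 #sum(Languages and compilers for #1(parallel processors))
 2 #sum(highly horizontal microcoded machines)
 1 code 1 compaction
);
"""
parsed = parse_query_file(sample)
print(parsed["18"])
```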