This application (ParseInquery.cpp) parses a file containing structured queries into BasicDocStream format.
The parameters are:
stopwords: name of file containing the stopword list.
acronyms: name of file containing the acronym list.
docFormat:
stemmer:
KstemmerDir: Path to directory of data files used by Krovetz's stemmer.
arabicStemDir: Path to directory of data files used by the Arabic stemmers.
arabicStemFunc: Which stemming algorithm to apply, one of:
outputFile: name of the output file.
The structured query operators are:
Sum Operator: #sum(T1 ... Tn)
The terms or nodes contained in the sum operator are treated as
having equal influence on the final result. The belief values
provided by the arguments of the sum are averaged to produce the
belief value of the sum node.
Weighted Sum Operator: #wsum(W1 T1 ... Wn Tn)
The terms or nodes contained in the wsum operator contribute
unequally to the final result according to the weight associated
with each (Wx). Note that this is a change from the InQuery
operator, as there is no initial weight, Ws, for scaling the belief
value of the sum.
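The averaging behind the sum and wsum operators can be sketched as follows. This is a minimal illustration rather than the toolkit's own code: the function names are hypothetical, and dividing the weighted sum by the total weight is an assumption taken from the usual definition of a weighted average.

```python
def sum_belief(beliefs):
    """Average the argument beliefs so that each has equal influence."""
    if not beliefs:
        raise ValueError("sum needs at least one argument")
    return sum(beliefs) / len(beliefs)


def wsum_belief(weighted_beliefs):
    """Combine (weight, belief) pairs into one belief value.

    Assumption: the result is normalized by the total weight, so no
    separate scaling weight Ws is needed.
    """
    total_weight = sum(weight for weight, _ in weighted_beliefs)
    if total_weight <= 0:
        raise ValueError("wsum needs a positive total weight")
    return sum(weight * belief for weight, belief in weighted_beliefs) / total_weight
```

With equal weights, wsum_belief gives the same value as sum_belief, which is one way to check the two against each other.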
Ordered Distance Operator: #N(T1 ... Tn) or #odN(T1 ... Tn)
The terms within an ODN operator must be found within N words of
each other in the text in order to contribute to the document's
belief value. The "#N" version is an abbreviation of #odN, thus
#3(health care) is equivalent to #od3(health care).
Un-ordered Window Operator: #uwN(T1 ... Tn)
The terms contained in a UWN operator must be found in any order
within a window of N words in order for this operator to contribute
to the belief value of the document.
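The difference between the ordered distance and unordered window operators can be sketched over word positions. The code below is an illustration under stated assumptions, not the toolkit's matcher: the helper names are hypothetical, "within N words of each other" is read as each term appearing at most N positions after the previous one, and "a window of N words" is read as a span covering at most N positions.

```python
from itertools import product


def index_positions(text):
    """Map each lowercased token to the list of its word positions."""
    positions = {}
    for i, token in enumerate(text.lower().split()):
        positions.setdefault(token, []).append(i)
    return positions


def ordered_match(positions, terms, n):
    """True if the terms occur in order, each at most n words after the previous."""
    if not terms:
        return False

    def extend(index, prev):
        # Try to place terms[index] after position prev, then recurse.
        if index == len(terms):
            return True
        return any(
            prev < p <= prev + n and extend(index + 1, p)
            for p in positions.get(terms[index], [])
        )

    return any(extend(1, p) for p in positions.get(terms[0], []))


def unordered_match(positions, terms, n):
    """True if one occurrence of every term fits inside a span of n words."""
    if not terms:
        return False
    occurrence_lists = [positions.get(term, []) for term in terms]
    # Brute force over one occurrence per term; fine for short examples.
    return any(
        max(combo) - min(combo) < n
        for combo in product(*occurrence_lists)
    )
```

For the text "affordable health care for rural health workers", the terms "care health" fail the ordered test with N = 2 but pass the unordered one, because order only matters for the first operator.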
Phrase Operator: #phrase(T1 ... Tn)
The operator is treated as an ordered distance operator of 3
(#od3).
Passage Operator: #passageN(T1 ... Tn)
The passage operator looks for the terms or nodes within the
operator in a passage window of N words. The document is rated
based upon the score of its best passage.
Synonym Operator: #syn(T1 ... Tn)
The terms of the operator are treated as instances of the same
term.
And Operator: #and(T1 ... Tn)
The more terms contained in the AND operator which are found in a
document, the higher the belief value of that document.
Boolean And Operator: #band(T1 ... Tn)
All of the terms within a BAND operator must be found in a document
in order for this operator to contribute to the belief value of
that document.
Boolean And Not Operator: #bandnot(T N)
Searches for documents matching the first argument but not the second.
Or Operator: #or(T1 ... Tn)
At least one of the terms within the OR operator must be found in a
document for that document to get credit for this operator.
Maximum Operator: #max(T1 ... Tn)
The maximum belief value of all the terms or nodes contained in the
MAX operator is taken to be the belief value of this operator.
Filter Require Operator: #filreq(arg1 arg2)
Use the documents returned by the first argument (its belief list)
if and only if the second argument would return documents. The value
of the second argument does not affect the belief values of the
first argument; only whether they will be returned or not.
Filter Reject Operator: #filrej(arg1 arg2)
Use the documents returned by the first argument if and only if
there were no documents returned by the second argument. The value
of the second argument does not affect the belief values of the
first argument; only whether they will be returned or not.
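The two filter operators can be sketched over belief lists represented as dictionaries from document id to belief value. This is a minimal illustration with hypothetical function names. It takes the descriptions literally, so the test is only whether the second argument returned anything at all; that reading is an assumption, and an implementation could instead apply the filter per document.

```python
def filreq(first, second):
    """Return the first belief list only if the second returned documents.

    The beliefs of the first argument are passed through unchanged.
    """
    return dict(first) if second else {}


def filrej(first, second):
    """Return the first belief list only if the second returned nothing."""
    return dict(first) if not second else {}
```

In both cases the second belief list acts purely as a gate: its values never enter the result.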
Negation Operator: #not(T1)
The term or node contained in this operator is negated so that
documents which do not contain it are rewarded.
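For the and, or, not, and max operators, one way to turn the descriptions above into numbers is sketched below. Only the max rule is stated in the text. The product, complement-of-product, and complement formulas are the ones commonly given for inference-network retrieval and are assumptions here, as are the function names.

```python
from math import prod


def and_belief(beliefs):
    """Assumed rule: product of the argument beliefs."""
    return prod(beliefs)


def or_belief(beliefs):
    """Assumed rule: 1 minus the product of the complements."""
    return 1.0 - prod(1.0 - b for b in beliefs)


def not_belief(belief):
    """Assumed rule: complement of the argument belief."""
    return 1.0 - belief


def max_belief(beliefs):
    """Maximum of the argument beliefs, as the text describes."""
    return max(beliefs)
```

Under these assumed rules, raising the belief of any one argument never lowers the and, or, or max result, which matches the qualitative descriptions.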
The input query file is of the form:
#qN = queryNode ;
where N is the query id and queryNode is one of the aforementioned query operators. The query may span multiple lines and must be terminated with a semicolon. The body of the query must not contain a semicolon, as that will prematurely terminate the query.
An example query:
#q18=#wsum(1 #sum(Languages and compilers for #1(parallel processors)) 2 #sum(highly horizontal microcoded machines) 1 code 1 compaction );
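The file format above can be read with a few lines of code. The sketch below is not the parser in ParseInquery.cpp. It is a hypothetical helper that only splits a file into query ids and query bodies, assuming that every semicolon terminates a query and that a leading # before q is optional.

```python
import re

# "qN = queryNode", with an optional leading "#" and arbitrary whitespace.
QUERY_RE = re.compile(r"\s*#?q(\S+?)\s*=\s*(.*)", re.DOTALL)


def split_queries(text):
    """Return (query_id, query_node) pairs from semicolon-terminated queries."""
    queries = []
    # Text after the last semicolon is unterminated and is ignored.
    for chunk in text.split(";")[:-1]:
        match = QUERY_RE.fullmatch(chunk)
        if match is None:
            raise ValueError(f"not of the form qN = queryNode: {chunk!r}")
        query_id, node = match.groups()
        # Collapse line breaks so multi-line queries become one line.
        queries.append((query_id, " ".join(node.split())))
    return queries
```

Because the split happens on every semicolon, a semicolon inside a query body would cut the query short, which is the behavior the format description warns about.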