Lemur 2.0 release notes
- New applications:
- Other major additions:
applications:
- Added working-set support to QueryModelEval, as was already available in RetEval.
- Added Arabic and Chinese language support to all parsing and indexing applications. See the utility section below.
distrib:
- Added SingleRegrMergeMethod, the regression merge method for distributed results merging. This merge method requires a single model for all individual databases.
retrieval:
- Added an InQuery-style structured query language, including proximity operators.
- All TextQueryRetrieval methods now accept the parameter cacheDocReps = {0,1}. The default value, 1, caches document representations (reps) across calls to score the collection. This option trades space for speed, using space roughly equal to the size of the collection times the size of a document rep.
- Added CosSimRetMethod, a cosine similarity retrieval method.
- Added InQueryRetMethod, a retrieval method for the InQuery structured query language.
- Added Lavrenko's relevance models as query update methods for the SimpleKLRetMethod.
- Modified SimpleKLRetMethod to return true KL values. As a result, the scores it returns are now negative.
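Two of the scoring functions named in this section can be sketched briefly. This is not Lemur code: the function names and the dense-vector representation are hypothetical, and the KL score is assumed to be the negated KL divergence between a query model and a smoothed document model. Because KL divergence is never negative, a score defined this way is never positive, which would be consistent with the note above about negative values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two term-weight vectors of equal length.
// Returns 0 when either vector has zero norm.
double cosineSimilarity(const std::vector<double>& query,
                        const std::vector<double>& doc) {
    assert(query.size() == doc.size());
    double dot = 0.0, queryNorm = 0.0, docNorm = 0.0;
    for (std::size_t i = 0; i < query.size(); ++i) {
        dot += query[i] * doc[i];
        queryNorm += query[i] * query[i];
        docNorm += doc[i] * doc[i];
    }
    if (queryNorm == 0.0 || docNorm == 0.0) return 0.0;
    return dot / (std::sqrt(queryNorm) * std::sqrt(docNorm));
}

// Negated KL divergence: -KL(q || d) = sum over w of q(w) * log(d(w) / q(w)).
// Assumption: d(w) > 0 wherever q(w) > 0, i.e. the document model is smoothed.
double negativeKL(const std::vector<double>& queryModel,
                  const std::vector<double>& docModel) {
    assert(queryModel.size() == docModel.size());
    double score = 0.0;
    for (std::size_t i = 0; i < queryModel.size(); ++i) {
        if (queryModel[i] > 0.0) {
            score += queryModel[i] * std::log(docModel[i] / queryModel[i]);
        }
    }
    return score;
}
```

With both functions, a higher score means a better match. The negated KL score reaches its maximum of 0 only when the two models are identical.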
index:
- Added passage indexing of documents, with overlapping passages.
- Added incremental indexing for InvPushIndex, InvFPPushIndex, and PassageIndexer. New documents can be added to an existing index using the IncIndexer application.
- Reimplemented InvFPTermList::countTerms to improve speed.
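As an illustration of passage indexing with overlapping passages, the sketch below splits a document into fixed-size token windows. The function name, the fixed window size, and the half-window overlap are all assumptions made for this example; these notes do not say how PassageIndexer chooses its passage boundaries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns [begin, end) token ranges for passages of passageSize tokens.
// Each passage starts passageSize / 2 tokens after the previous one, so
// neighbouring passages share about half of their tokens (assumed overlap).
std::vector<std::pair<std::size_t, std::size_t>>
overlappingPassages(std::size_t docLength, std::size_t passageSize) {
    assert(passageSize >= 2);  // guarantees a step of at least one token
    std::vector<std::pair<std::size_t, std::size_t>> passages;
    const std::size_t step = passageSize / 2;
    for (std::size_t begin = 0; begin < docLength; begin += step) {
        const std::size_t end = std::min(begin + passageSize, docLength);
        passages.emplace_back(begin, end);
        if (end == docLength) break;  // this passage already reaches the end
    }
    return passages;
}
```

For a 10-token document and a passage size of 4, this yields the ranges [0,4), [2,6), [4,8), and [6,10).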
utility:
- Changed Stemmer::stemWord to return char *.
- Added the Krovetz stemmer.
- Added the Larkey Arabic stemmer for text encoded in the Windows CP1256 encoding.
- Added support for parsing and indexing Arabic documents in the Windows CP1256 encoding.
- Added support for parsing and indexing Chinese documents in the GB2312 encoding. Includes support for both single characters and segmented words.
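The single-character mode for Chinese text can be pictured with the sketch below, which splits GB2312 bytes into one token per character. It is not Lemur's parser: the function name is hypothetical, and it assumes the usual byte layout of GB2312 text, where a byte with the high bit set begins a two-byte character and any other byte is a single ASCII character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits GB2312-encoded bytes into single-character tokens.
// A byte >= 0x80 is treated as the lead byte of a two-byte character;
// a dangling lead byte at the end of the input becomes its own token.
std::vector<std::string> splitGB2312Chars(const std::string& text) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        const bool twoByte = lead >= 0x80 && i + 1 < text.size();
        const std::size_t length = twoByte ? 2 : 1;
        tokens.push_back(text.substr(i, length));
        i += length;
    }
    assert(i == text.size());  // every input byte was consumed exactly once
    return tokens;
}
```

For example, the five bytes 61 D6 D0 CE C4 (the letter "a" followed by two Chinese characters) produce three tokens. Segmented-word support would need a word segmenter on top of this and is not sketched here.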
- Bugs Fixed:
- Problem: MMRSummApp fails to check for the query-based sampling parameter.
Solution: Add a check for the parameter.
- Problem: InvIndex::fullToc performs one extra iteration at end of file.
Solution: Add an explicit test for the number of elements read to exit the loop.
- Problem: All flex-generated parsers (WebParser, TrecParser, ...) trigger an optimization bug in GCC 3.2 with respect to order of execution.
Solution: Rewrite the body of a for loop to ensure that the order of execution is preserved.
- Problem: param_init in parameters.c leaks memory.
Solution: Remove the leaking allocation.
- Problem: InvFPTermList::termInfoListSeq leaks a dynamically allocated vector.
Solution: Replace with a statically allocated vector.
- Problem: InvPushIndex leaks memory.
Solution: Free allocated memory in the destructor.
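The extra-iteration problem reported for InvIndex::fullToc is a common pattern when a read loop tests only for end of file: the end-of-file flag is set by the read that fails, so the loop body runs once more with no new data. The sketch below shows the general remedy of exiting on the number of bytes actually read. It is a generic illustration with hypothetical names, not the fullToc code, which these notes do not show.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Counts the complete fixed-size records in a stream. The loop exits on a
// short read instead of testing eof() before reading, so a failed final
// read is never counted as a record.
std::size_t countCompleteRecords(std::istream& in, std::size_t recordSize) {
    assert(recordSize > 0);
    std::vector<char> record(recordSize);
    const std::streamsize wanted = static_cast<std::streamsize>(recordSize);
    std::size_t count = 0;
    while (true) {
        in.read(record.data(), wanted);
        if (in.gcount() != wanted) break;  // explicit test on bytes read
        ++count;
    }
    return count;
}

// Convenience wrapper for in-memory data.
std::size_t countCompleteRecords(const std::string& bytes,
                                 std::size_t recordSize) {
    std::istringstream in(bytes);
    return countCompleteRecords(in, recordSize);
}
```

A 10-byte input with 4-byte records yields two complete records; the trailing 2 bytes are a short read and end the loop.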
Last modified: Mon Sep 30 16:13:38 EDT 2002