
BasicIndex Class Reference

Basic Indexer (with arbitrary compressor).

#include <BasicIndex.hpp>



Public Methods

 BasicIndex ()
 Constructor (used when opening an existing index).

 BasicIndex (Compress *pc)
 Constructor (used when building an index); pc is the compressor to use.

virtual ~BasicIndex ()
virtual bool open (const char *indexName)
 Open a previously created index; returns true if it was opened successfully.

void build (DocStream *collectionStream, const char *file, const char *outputPrefix, int totalDocs=0x1000000, int maxMemory=0x4000000, int minimumCount=1, int maxVocSize=2000000)
Spelling and index conversion
virtual int term (const char *word)
 Convert a term spelling to a termID.

virtual const char* term (int termID)
 Convert a termID to its spelling.

virtual int document (const char *docIDStr)
 Convert a document ID string to a docID.

virtual const char* document (int docID)
 Convert a docID to its spelling.

virtual const char* termLexiconID ()
 Return the term lexicon ID.
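The conversion methods above map each spelling to an integer ID and back. A minimal sketch of such a bidirectional lexicon follows; the Lexicon class, the dense 1-based IDs, and the use of 0 for an unknown spelling are all assumptions for illustration, not BasicIndex's implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical bidirectional lexicon (not Lemur code).
// Assumption: IDs are dense, 1-based, assigned in insertion order; 0 = unknown.
class Lexicon {
public:
  // Return the ID for `spelling`, assigning a new one if it is unseen.
  int add(const std::string& spelling) {
    auto it = ids_.find(spelling);
    if (it != ids_.end()) return it->second;
    spellings_.push_back(spelling);
    int id = static_cast<int>(spellings_.size());  // 1-based
    ids_.emplace(spelling, id);
    return id;
  }

  // Spelling -> ID; returns 0 for a spelling that was never added.
  int id(const std::string& spelling) const {
    auto it = ids_.find(spelling);
    return it == ids_.end() ? 0 : it->second;
  }

  // ID -> spelling; returns nullptr for an out-of-range ID.
  // The pointer is only valid until the next add(), which may reallocate.
  const char* spelling(int id) const {
    if (id < 1 || id > static_cast<int>(spellings_.size())) return nullptr;
    return spellings_[id - 1].c_str();
  }

  int size() const { return static_cast<int>(spellings_.size()); }

private:
  std::unordered_map<std::string, int> ids_;
  std::vector<std::string> spellings_;
};
```

Keeping a vector for ID -> spelling makes that direction an O(1) array lookup, while the hash map serves spelling -> ID.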

Summary counts
virtual int docCount ()
 Total number of documents in the collection.

virtual int termCountUnique ()
 Total number of unique terms in the collection.

virtual int termCount (int termID) const
 Total number of occurrences of a term in the collection.

virtual int termCount () const
 Total number of term occurrences (all terms) in the collection.

virtual float docLengthAvg ()
 Average document length.

virtual int docCount (int termID)
 Number of documents that contain a given term.

virtual int docLength (int docID) const
 Total number of term occurrences in a document (its length).
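The summary counts are related: termCount() is the sum of all document lengths, termCount(termID) counts every occurrence of a term, while docCount(termID) counts each containing document only once. The hypothetical CollectionStats struct below computes these by brute force from raw token lists, assuming 0-based docIDs and assuming the average length is total occurrences divided by the number of documents; it is not how BasicIndex stores or computes them.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy collection: each document is a sequence of termIDs.
using Doc = std::vector<int>;

struct CollectionStats {
  std::vector<Doc> docs;  // docID = position in this vector (assumption)

  int docCount() const { return static_cast<int>(docs.size()); }

  int docLength(int d) const { return static_cast<int>(docs.at(d).size()); }

  // All term occurrences: the sum of the document lengths.
  int termCount() const {
    int total = 0;
    for (const Doc& doc : docs) total += static_cast<int>(doc.size());
    return total;
  }

  // Occurrences of term t across the whole collection.
  int termCount(int t) const {
    int n = 0;
    for (const Doc& doc : docs)
      for (int x : doc) n += (x == t);
    return n;
  }

  // Documents that contain t at least once.
  int docCount(int t) const {
    int n = 0;
    for (const Doc& doc : docs)
      for (int x : doc)
        if (x == t) { ++n; break; }
    return n;
  }

  int termCountUnique() const {
    std::set<int> seen;
    for (const Doc& doc : docs) seen.insert(doc.begin(), doc.end());
    return static_cast<int>(seen.size());
  }

  // Assumed definition: total occurrences / number of documents.
  float docLengthAvg() const {
    return docs.empty() ? 0.0f : static_cast<float>(termCount()) / docCount();
  }
};
```

For documents {1, 2, 2}, {2, 3} and {1}, term 2 occurs three times but appears in only two documents.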

Index entry access
virtual DocInfoList* docInfoList (int termID)
 Document entries for a term (from the term->doc index); the caller must release the returned list.
See also:
DocList.


virtual TermInfoList* termInfoList (int docID)
 Term entries for a document (from the doc->term index); the caller must release the returned list.
See also:
TermList.
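Because docInfoList() and termInfoList() hand ownership of the returned list to the caller, a convenient pattern is to wrap the raw pointer in std::unique_ptr immediately so it is freed on every return path. The ToyIndex and ToyDocInfoList types below are hypothetical stand-ins (the real Lemur classes have different members), and the pattern assumes the list was allocated with plain new.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; only the ownership contract is illustrated.
struct ToyDocEntry { int docID; int termCount; };

class ToyDocInfoList {
public:
  explicit ToyDocInfoList(std::vector<ToyDocEntry> e) : entries_(std::move(e)) {}
  const std::vector<ToyDocEntry>& entries() const { return entries_; }
private:
  std::vector<ToyDocEntry> entries_;
};

class ToyIndex {
public:
  // Returns a heap-allocated list that the caller owns and must delete.
  ToyDocInfoList* docInfoList(int termID) const {
    std::vector<ToyDocEntry> hits;
    if (termID == 7) hits = {{1, 2}, {4, 1}};  // toy postings for illustration
    return new ToyDocInfoList(std::move(hits));
  }
};

// Caller side: take ownership at once so the list is released automatically.
int totalOccurrences(const ToyIndex& index, int termID) {
  std::unique_ptr<ToyDocInfoList> list(index.docInfoList(termID));
  assert(list != nullptr);
  int total = 0;
  for (const ToyDocEntry& e : list->entries()) total += e.termCount;
  return total;  // unique_ptr deletes the list here
}
```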



Detailed Description

Basic Indexer (with arbitrary compressor).

BasicIndex is a basic implementation of Index. It creates and manages two indices (term->doc and doc->term) as well as a term lexicon and a document ID lexicon. The application can supply any compressor through the BasicIndex(Compress *pc) constructor before calling build(). See Index for an example of use.
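The two indices are transposes of each other: a doc->term entry lists (termID, count) pairs for one document, and inverting those lists yields term->doc entries of (docID, count) pairs. The sketch below shows that inversion in memory with std::map, assuming each entry carries an occurrence count; the function names are hypothetical, and BasicIndex's own build process (memory limits, compression) is not shown.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// (termID, count) pairs for one document: a doc->term entry.
using TermCounts = std::vector<std::pair<int, int>>;
// (docID, count) pairs for one term: a term->doc entry.
using DocCounts = std::vector<std::pair<int, int>>;

// Build doc->term entries from token streams (docID = position in `docs`).
std::map<int, TermCounts> buildDocTermIndex(const std::vector<std::vector<int>>& docs) {
  std::map<int, TermCounts> fwd;
  for (int d = 0; d < static_cast<int>(docs.size()); ++d) {
    std::map<int, int> counts;  // ordered, so each list is sorted by termID
    for (int t : docs[d]) ++counts[t];
    fwd[d] = TermCounts(counts.begin(), counts.end());
  }
  return fwd;
}

// Invert doc->term into term->doc. std::map visits docIDs in increasing
// order, so each resulting posting list is sorted by docID.
std::map<int, DocCounts> invert(const std::map<int, TermCounts>& fwd) {
  std::map<int, DocCounts> inv;
  for (const auto& docEntry : fwd) {
    const int doc = docEntry.first;
    for (const auto& termEntry : docEntry.second)
      inv[termEntry.first].emplace_back(doc, termEntry.second);
  }
  return inv;
}
```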


Constructor & Destructor Documentation

BasicIndex::BasicIndex ( )
 

Constructor (used when opening an existing index).

BasicIndex::BasicIndex ( Compress * pc )
 

Constructor (used when building an index); pc is the compressor to use.

BasicIndex::~BasicIndex ( ) [virtual]
 


Member Function Documentation

void BasicIndex::build ( DocStream * collectionStream,
const char * file,
const char * outputPrefix,
int totalDocs = 0x1000000,
int maxMemory = 0x4000000,
int minimumCount = 1,
int maxVocSize = 2000000 )
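Two of the default arguments of build() are written in hexadecimal. The compile-time checks below spell out their decimal values; reading maxMemory as a byte count (which would make the default 64 MiB) is an assumption, since the units are not documented on this page.

```cpp
#include <cassert>
#include <cstdint>

// Decimal values of build()'s hexadecimal defaults.
constexpr std::int64_t kTotalDocsDefault = 0x1000000;  // 2^24
constexpr std::int64_t kMaxMemoryDefault = 0x4000000;  // 2^26

static_assert(kTotalDocsDefault == 16777216, "0x1000000 is 2^24");
static_assert(kMaxMemoryDefault == 67108864, "0x4000000 is 2^26");
// Only meaningful if maxMemory is measured in bytes (assumption).
static_assert(kMaxMemoryDefault / (1024 * 1024) == 64, "2^26 bytes = 64 MiB");
```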
 

int BasicIndex::docCount ( int t ) [virtual]
 

Number of documents that contain a given term.

Reimplemented from Index.
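A common downstream use of this count is inverse document frequency, which combines docCount() (N) with docCount(termID) (df). BasicIndex itself does not document computing IDF; the helper below is a hypothetical sketch of the plain log(N / df) form, and other IDF variants exist.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of BasicIndex: plain IDF = log(N / df).
// numDocs would come from docCount(), docFreq from docCount(termID).
double idf(int numDocs, int docFreq) {
  assert(numDocs > 0 && docFreq > 0 && docFreq <= numDocs);
  return std::log(static_cast<double>(numDocs) / docFreq);
}
```

A term found in every document gets an IDF of 0, so it carries no weight under this form.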

int BasicIndex::docCount ( ) [inline, virtual]
 

Total number of documents in the collection.

Reimplemented from Index.

DocInfoList * BasicIndex::docInfoList ( int termID ) [virtual]
 

Document entries for a term (from the term->doc index); the caller must release the returned list.

See also:
DocList.

Reimplemented from Index.

int BasicIndex::docLength ( int docID ) const [inline, virtual]
 

Total number of term occurrences in a document (its length).

Reimplemented from Index.

float BasicIndex::docLengthAvg ( ) [inline, virtual]
 

Average document length.

Reimplemented from Index.

const char * BasicIndex::document ( int docID ) [inline, virtual]
 

Convert a docID to its spelling.

Reimplemented from Index.

int BasicIndex::document ( const char * docIDStr ) [inline, virtual]
 

Convert a document ID string to a docID.

Reimplemented from Index.

bool BasicIndex::open ( const char * fn ) [virtual]
 

Open a previously created index; returns true if it was opened successfully.

Reimplemented from Index.

const char * BasicIndex::term ( int termID ) [inline, virtual]
 

Convert a termID to its spelling.

Reimplemented from Index.

int BasicIndex::term ( const char * word ) [inline, virtual]
 

Convert a term spelling to a termID.

Reimplemented from Index.

int BasicIndex::termCount ( ) const [inline, virtual]
 

Total number of term occurrences (all terms) in the collection.

Reimplemented from Index.

int BasicIndex::termCount ( int termID ) const [inline, virtual]
 

Total number of occurrences of a term in the collection.

Reimplemented from Index.

int BasicIndex::termCountUnique ( ) [inline, virtual]
 

Total number of unique terms in the collection.

Reimplemented from Index.

TermInfoList * BasicIndex::termInfoList ( int docID ) [virtual]
 

Term entries for a document (from the doc->term index); the caller must release the returned list.

See also:
TermList.

Reimplemented from Index.

const char * BasicIndex::termLexiconID ( ) [inline, virtual]
 

Return the term lexicon ID.

Reimplemented from Index.


Generated at Fri Jul 26 18:22:42 2002 for LEMUR by doxygen 1.2.4 written by Dimitri van Heesch, © 1997-2000