
InvPushIndex Class Reference

#include <InvPushIndex.hpp>

Inherits PushIndex. Inherited by InvFPPushIndex.

Public Methods

 InvPushIndex (char *prefix="DefaultIndex",int cachesize=128000000,long maxfilesize=2100000000,DOCID_T startdocid=1)
 ~InvPushIndex ()
void setName (char *prefix)
 Sets the name for this index. The name is used as the prefix for all files related to this index.

bool beginDoc (DocumentProps *dp)
 Signals the beginning of a new document. Returns true if initialization was successful.

bool addTerm (Term &t)
 Adds a term to the current document. Returns true if the term was added successfully.

void endDoc (DocumentProps *dp)
 Signals the end of the current document.

virtual void endDoc (DocumentProps *dp,const char *mgr)
 Signals the end of the current document and associates it with the given document manager. This does not change the document manager that was previously set.

void endCollection (CollectionProps *cp)
 Signals the end of this collection. Properties passed at the beginning of a collection should be handled by the constructor.

void setDocManager (const char *mgrID)
 Sets the document manager to use for subsequent documents.

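The public methods above form a push protocol: a caller opens a document with beginDoc(), feeds it terms with addTerm(), closes it with endDoc(), and finishes with endCollection(). The sketch below illustrates that call order with a hypothetical in-memory class, ToyPushIndex. It is not the LEMUR implementation: DocumentProps and Term are replaced by plain strings, and postings stay in memory instead of being flushed to disk.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the documented push protocol (NOT the LEMUR class).
// Documents and terms are plain strings; nothing is written to disk.
class ToyPushIndex {
public:
  explicit ToyPushIndex(int startdocid = 1) : nextdocid_(startdocid) {}

  // Opens a new document; fails if the previous one was never closed.
  bool beginDoc(const std::string& externalId) {
    if (inDoc_) return false;
    inDoc_ = true;
    docIDs_.push_back(externalId);  // external id stored in internal docid order
    curdoc_ = nextdocid_++;
    docterms_.clear();
    return true;
  }

  // Counts one occurrence of a term; fails outside beginDoc()/endDoc().
  bool addTerm(const std::string& term) {
    if (!inDoc_) return false;
    ++docterms_[term];
    return true;
  }

  // Closes the document and appends (docid, freq) to each term's postings.
  void endDoc() {
    if (!inDoc_) return;
    for (const auto& tf : docterms_)
      postings_[tf.first].emplace_back(curdoc_, tf.second);
    inDoc_ = false;
  }

  // Nothing to flush in this toy; a real index would write its files here.
  void endCollection() { assert(!inDoc_ && "endDoc() must close the last document"); }

  const std::vector<std::pair<int, int>>& postings(const std::string& term) const {
    static const std::vector<std::pair<int, int>> empty;
    auto it = postings_.find(term);
    return it == postings_.end() ? empty : it->second;
  }

private:
  bool inDoc_ = false;
  int nextdocid_;
  int curdoc_ = 0;
  std::vector<std::string> docIDs_;
  std::map<std::string, int> docterms_;  // term -> freq in the current document
  std::map<std::string, std::vector<std::pair<int, int>>> postings_;
};
```

A caller runs one beginDoc()/addTerm()/endDoc() cycle per document and calls endCollection() once at the end.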

Protected Methods

void writeTOC (int numinv)
void writeDocIDs ()
void writeCache ()
void lastWriteCache ()
void writeDTIDs ()
void writeDocMgrIDs ()
int docMgrID (const char *mgr)
 Returns the internal id of the given document manager. If mgr is not already registered, it is added first.

virtual void doendDoc (DocumentProps *dp,int mgrid)

Protected Attributes

long maxfile
 The maximum size a file written by this index may reach.

MemCache* cache
 The main memory handler for building.

vector<char*> docIDs
 List of external docids in internal docid order.

vector<char*> termIDs
 List of terms in termid order.

vector<char*> tempfiles
 List of temporary files written when flushing the cache.

vector<char*> dtfiles
 List of dt index files.

vector<char*> docmgrs

FILE* writetlookup
 Filestream for writing the lookup table to the docterm db.

ofstream writetlist
 Filestream for writing the list of located terms for each document.

int tcount
 Count of total terms.

int tidcount
 Count of unique terms.

int dtidcount
 Count of unique terms in the current document.

char* name
 The prefix name.

int namelen
 The length of the name (cached to avoid repeated calls to strlen).

TABLE_T wordtable
 Table of all terms and their doclists.

map<int, int> termlist
 Map of terms to their frequencies.

int* membuf
 Memory to use for cache and buffers.

int membufsize

int curdocmgr

Constructor & Destructor Documentation

InvPushIndex::InvPushIndex ( char * prefix = "DefaultIndex",
int cachesize = 128000000,
long maxfilesize = 2100000000,
DOCID_T startdocid = 1 )
 

InvPushIndex::~InvPushIndex ( )
 


Member Function Documentation

bool InvPushIndex::addTerm ( Term & t ) [virtual]
 

Adds a term to the current document. Returns true if the term was added successfully.

Reimplemented from PushIndex.

Reimplemented in InvFPPushIndex.

bool InvPushIndex::beginDoc ( DocumentProps * dp ) [virtual]
 

Signals the beginning of a new document. Returns true if initialization was successful.

Reimplemented from PushIndex.

int InvPushIndex::docMgrID ( const char * mgr ) [protected]
 

Returns the internal id of the given document manager. If mgr is not already registered, it is added first.

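The "look up, or register if new" behavior described for docMgrID() can be sketched as follows. This is a hypothetical illustration: the class name DocMgrRegistry, the use of std::string instead of the char* entries in docmgrs, and the linear search are all assumptions, not the LEMUR code. The internal id is simply the manager's position in the list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical registry mirroring the documented docMgrID() contract:
// return the internal id of mgr, registering mgr first if it is unknown.
class DocMgrRegistry {
public:
  int docMgrID(const char* mgr) {
    assert(mgr != nullptr);
    for (std::size_t i = 0; i < docmgrs_.size(); ++i)
      if (docmgrs_[i] == mgr) return static_cast<int>(i);  // already registered
    docmgrs_.emplace_back(mgr);                             // new manager
    return static_cast<int>(docmgrs_.size()) - 1;
  }

  std::size_t size() const { return docmgrs_.size(); }

private:
  std::vector<std::string> docmgrs_;  // internal id == position in this list
};
```

Repeated calls with the same name return the same id, so callers can pass manager names freely without tracking ids themselves. The names "mgrA" and "mgrB" used when exercising it are placeholders.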
void InvPushIndex::doendDoc ( DocumentProps * dp,
int mgrid ) [protected, virtual]
 

Reimplemented in InvFPPushIndex.

void InvPushIndex::endCollection ( CollectionProps * cp ) [virtual]
 

Signals the end of this collection. Properties passed at the beginning of a collection should be handled by the constructor.

Reimplemented from PushIndex.

Reimplemented in InvFPPushIndex.

void InvPushIndex::endDoc ( DocumentProps * dp,
const char * mgr ) [virtual]
 

Signals the end of the current document and associates it with the given document manager. This does not change the document manager that was previously set.

void InvPushIndex::endDoc ( DocumentProps * dp ) [virtual]
 

Signals the end of the current document.

Reimplemented from PushIndex.

void InvPushIndex::lastWriteCache ( ) [protected]
 

void InvPushIndex::setDocManager ( const char * mgrID ) [virtual]
 

Sets the document manager to use for subsequent documents.

Reimplemented from PushIndex.

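The two endDoc() overloads interact with setDocManager() as documented: the one-argument form uses the most recently set manager, while the two-argument form applies its manager to that document only. The hypothetical MgrTracker class below sketches those semantics. DocumentProps is omitted, each ended document is recorded by the manager id it received, and the requirement that a manager be set before endDoc() is an assumption of this toy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: a default manager set once, plus per-document overrides
// that leave the default untouched.
class MgrTracker {
public:
  void setDocManager(const char* mgrID) { curdocmgr_ = idFor(mgrID); }

  // endDoc(dp): use the manager set most recently (this toy requires one).
  void endDoc() {
    assert(curdocmgr_ >= 0 && "setDocManager() must be called first");
    doendDoc(curdocmgr_);
  }

  // endDoc(dp, mgr): use mgr for this document only; curdocmgr_ is not changed.
  void endDoc(const char* mgr) { doendDoc(idFor(mgr)); }

  const std::vector<int>& docMgrs() const { return perdoc_; }

private:
  int idFor(const std::string& mgr) {
    for (std::size_t i = 0; i < names_.size(); ++i)
      if (names_[i] == mgr) return static_cast<int>(i);
    names_.push_back(mgr);
    return static_cast<int>(names_.size()) - 1;
  }
  void doendDoc(int mgrid) { perdoc_.push_back(mgrid); }

  int curdocmgr_ = -1;             // -1: no manager set yet
  std::vector<std::string> names_;  // registered manager names
  std::vector<int> perdoc_;         // manager id recorded for each ended document
};
```

Ending documents with the default, then an override, then the default again yields ids 0, 1, 0: the override did not replace the default.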
void InvPushIndex::setName ( char * prefix )
 

Sets the name for this index. The name is used as the prefix for all files related to this index.

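Since name is a char* and namelen caches its length, setName() plausibly keeps a private copy of the prefix and reuses it for every file name. The hypothetical NamedIndex class below sketches that idea. The class name, the const char* parameter (the documented signature takes char*), and the ".toc"/".tmp0" suffixes are illustrative assumptions, not LEMUR's actual file layout.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical sketch: store a private copy of the prefix and its length, then
// derive every file name from it.
class NamedIndex {
public:
  NamedIndex() { setName("DefaultIndex"); }
  ~NamedIndex() { delete[] name_; }
  NamedIndex(const NamedIndex&) = delete;
  NamedIndex& operator=(const NamedIndex&) = delete;

  void setName(const char* prefix) {
    assert(prefix != nullptr);
    int len = static_cast<int>(std::strlen(prefix));  // computed once, then cached
    char* copy = new char[len + 1];
    std::memcpy(copy, prefix, len + 1);               // includes the trailing '\0'
    delete[] name_;  // freed after copying, so prefix may alias the old buffer
    name_ = copy;
    namelen_ = len;
  }

  // Builds a file name as prefix + suffix without calling strlen again.
  std::string fileName(const char* suffix) const {
    return std::string(name_, namelen_) + suffix;
  }

private:
  char* name_ = nullptr;
  int namelen_ = 0;
};
```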
void InvPushIndex::writeCache ( ) [protected]
 

void InvPushIndex::writeDTIDs ( ) [protected]
 

void InvPushIndex::writeDocIDs ( ) [protected]
 

void InvPushIndex::writeDocMgrIDs ( ) [protected]
 

void InvPushIndex::writeTOC ( int numinv ) [protected]
 

Reimplemented in InvFPPushIndex.


Member Data Documentation

MemCache * InvPushIndex::cache [protected]
 

The main memory handler for building.

int InvPushIndex::curdocmgr [protected]
 

vector< char *> InvPushIndex::docIDs [protected]
 

List of external docids in internal docid order.

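One way to keep external docids in internal docid order, given that the constructor's startdocid sets the first internal id, is to hand out ids sequentially and store each external id at position d - startdocid. The DocIdMap class below is a hypothetical sketch of that bookkeeping; the offset scheme and the use of int instead of DOCID_T are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: internal ids start at startdocid and increase by one;
// the external id for internal id d is stored at index d - startdocid.
class DocIdMap {
public:
  explicit DocIdMap(int startdocid = 1) : start_(startdocid) {}

  // Registers the next document and returns its internal id.
  int add(const std::string& externalId) {
    docIDs_.push_back(externalId);
    return start_ + static_cast<int>(docIDs_.size()) - 1;
  }

  // Maps an internal id back to its external id (throws if out of range).
  const std::string& external(int internalId) const {
    assert(internalId >= start_);
    return docIDs_.at(static_cast<std::size_t>(internalId - start_));
  }

private:
  int start_;
  std::vector<std::string> docIDs_;  // external ids in internal docid order
};
```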
vector< char *> InvPushIndex::docmgrs [protected]
 


vector< char *> InvPushIndex::dtfiles [protected]
 

List of dt index files.

int InvPushIndex::dtidcount [protected]
 

Count of unique terms in the current document.

long InvPushIndex::maxfile [protected]
 

The maximum size a file written by this index may reach.
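The constructor's default maxfilesize of 2100000000 is just below 2^31 - 1 = 2147483647, plausibly so that offsets fit in a signed 32-bit value; the source does not state the reason. The hypothetical RollingFiles class below sketches how such a cap could be enforced: records go to the current file until the next one would push it past maxfile, and then a new file is started. Only sizes are tracked; no files are written, and the rollover policy is an assumption.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Arithmetic check: the default cap is below the largest signed 32-bit value.
static_assert(2100000000L < INT_MAX, "default maxfilesize fits in 32 bits");

// Hypothetical size-capped writer: tracks how many bytes each file holds and
// starts a new file when the next record would exceed maxfile.
class RollingFiles {
public:
  explicit RollingFiles(long maxfile) : maxfile_(maxfile) { sizes_.push_back(0); }

  // Returns the index of the file a record of `bytes` bytes lands in.
  int append(long bytes) {
    assert(bytes <= maxfile_);  // a single record must fit in an empty file
    if (sizes_.back() + bytes > maxfile_) sizes_.push_back(0);  // roll over
    sizes_.back() += bytes;
    return static_cast<int>(sizes_.size()) - 1;
  }

  std::size_t fileCount() const { return sizes_.size(); }

private:
  long maxfile_;
  std::vector<long> sizes_;  // bytes held by each file so far
};
```

With a cap of 100, records of 60 and 40 bytes share the first file, and a further 1-byte record opens a second one.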
int * InvPushIndex::membuf [protected]
 

Memory to use for cache and buffers.

int InvPushIndex::membufsize [protected]
 


char * InvPushIndex::name [protected]
 

The prefix name.

int InvPushIndex::namelen [protected]
 

The length of the name (cached to avoid repeated calls to strlen).

int InvPushIndex::tcount [protected]
 

Count of total terms.

vector< char *> InvPushIndex::tempfiles [protected]
 

List of temporary files written when flushing the cache.

vector< char *> InvPushIndex::termIDs [protected]
 

List of terms in termid order.

map< int,int > InvPushIndex::termlist [protected]
 

Map of terms to their frequencies.

Reimplemented in InvFPPushIndex.

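A map<int, int> of termid to frequency is a natural way to count terms within one document, and the number of unique terms in that document then equals the map's size. The hypothetical DocCounter struct below sketches this together with tcount and dtidcount. Treating tcount as a running total across the whole collection, and resetting the map at each beginDoc(), are assumptions of this sketch.

```cpp
#include <cassert>
#include <map>

// Hypothetical per-document counter using the member names described above.
struct DocCounter {
  std::map<int, int> termlist;  // termid -> frequency in the current document
  int tcount = 0;               // total terms seen (assumed: whole collection)
  int dtidcount = 0;            // unique terms in the current document

  void beginDoc() {
    termlist.clear();
    dtidcount = 0;
  }

  void addTerm(int termid) {
    ++tcount;
    if (++termlist[termid] == 1) ++dtidcount;  // first occurrence in this doc
    assert(dtidcount == static_cast<int>(termlist.size()));
  }
};
```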
int InvPushIndex::tidcount [protected]
 

Count of unique terms.

TABLE_T InvPushIndex::wordtable [protected]
 

Table of all terms and their doclists.

ofstream InvPushIndex::writetlist [protected]
 

Filestream for writing the list of located terms for each document.

FILE * InvPushIndex::writetlookup [protected]
 

Filestream for writing the lookup table to the docterm db.

The documentation for this class was generated from the following files:
Generated at Fri Jul 26 18:27:04 2002 for LEMUR by doxygen 1.2.4 written by Dimitri van Heesch, © 1997-2000