
Index Class Reference

Abstract Class for indexed document collection.

#include <Index.hpp>

Inheritance diagram for Index (image not reproduced); classes shown: BasicIndex, IndexWithCat, InvFPIndex, BasicIndexWithCat.

Public Methods

virtual ~Index ()
Open index
virtual bool open (const char *indexName)=0
 Open a previously created Index; returns true if it was opened successfully. indexName should be the full name of the table-of-content file for the index, e.g., "index.bsc" for an index built with the basic indexer.

Spelling and index conversion
virtual int term (const char *word)=0
 Convert a term spelling to a termID. Returns 0 if the word is out of vocabulary; valid termIDs start at 1.

virtual const char* term (int termID)=0
 Convert a valid termID to its spelling.

virtual int document (const char *docIDStr)=0
 Convert a document ID spelling to a docID. Returns 0 if the spelling is not known to the index; valid docIDs start at 1.

virtual const char* document (int docID)=0
 Convert a valid docID to its spelling.

virtual const char* termLexiconID ()
 Return a string ID for the term lexicon (usually the file name of the lexicon).
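
The conversion methods above use 0 as the out-of-vocabulary value, so callers normally test the returned ID before using it. The following is a minimal sketch of that check. ToyLexicon and knownTermIDs are hypothetical names introduced only for illustration; ToyLexicon is an in-memory stand-in that mirrors the documented term() contract and is not part of LEMUR. The sketch assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a concrete Index subclass; it only
// mirrors the documented term() contract (IDs start at 1, 0 means OOV).
class ToyLexicon {
public:
  explicit ToyLexicon(const std::vector<std::string>& words) : words_(words) {}

  // Spelling -> termID; returns 0 when the word is out of vocabulary.
  int term(const char* word) const {
    for (std::size_t i = 0; i < words_.size(); ++i) {
      if (words_[i] == word) return static_cast<int>(i) + 1;
    }
    return 0;
  }

  // termID -> spelling; only valid for 1 <= termID <= vocabulary size.
  const char* term(int termID) const {
    assert(termID >= 1 && termID <= static_cast<int>(words_.size()));
    return words_[termID - 1].c_str();
  }

private:
  std::vector<std::string> words_;
};

// Maps query words to termIDs, skipping out-of-vocabulary words (ID 0).
std::vector<int> knownTermIDs(const ToyLexicon& lex,
                              const std::vector<std::string>& query) {
  std::vector<int> ids;
  for (const std::string& w : query) {
    int id = lex.term(w.c_str());
    if (id != 0) ids.push_back(id);
  }
  return ids;
}
```

With a real index, the same test on the value returned by term(const char *) or document(const char *) decides whether the ID may be passed on to the count and entry-access methods.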

Summary counts
virtual int docCount ()=0
 Total count (i.e., number) of documents in collection.

virtual int termCountUnique ()=0
 Total count of unique terms in collection, i.e., the term vocabulary size.

virtual int termCount (int termID)const=0
 Total count of occurrences of a term in the collection.

virtual int termCount ()const=0
 Total count of all term occurrences in the collection.

virtual float docLengthAvg ()=0
 Average document length.

virtual int docCount (int termID)=0
 Number of documents that contain a given term.

virtual int docLength (int docID)const=0
 Total count of terms in a document, i.e., the document length.
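
As an illustration of how the summary counts are typically combined, the sketch below derives two common retrieval statistics from them: an inverse document frequency, log(N / df), and a maximum-likelihood collection probability, cf(t) / total term count. These formulas are standard illustrations and are not something the Index class itself provides. CountsView, ToyCounts, idf and collectionProb are hypothetical names; CountsView repeats only the count signatures listed above so that the example compiles without LEMUR. The sketch assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical interface repeating only the summary-count signatures listed
// above; in real code these calls would go to a concrete Index subclass.
class CountsView {
public:
  virtual ~CountsView() {}
  virtual int docCount() = 0;
  virtual int docCount(int termID) = 0;
  virtual int termCount() const = 0;
  virtual int termCount(int termID) const = 0;
};

// Toy implementation backed by two arrays indexed by termID (slot 0 unused).
class ToyCounts : public CountsView {
public:
  ToyCounts(int numDocs, const std::vector<int>& docFreq,
            const std::vector<int>& collFreq)
      : numDocs_(numDocs), docFreq_(docFreq), collFreq_(collFreq) {}

  int docCount() override { return numDocs_; }
  int docCount(int termID) override { return docFreq_[termID]; }
  int termCount() const override {
    int total = 0;
    for (int c : collFreq_) total += c;
    return total;
  }
  int termCount(int termID) const override { return collFreq_[termID]; }

private:
  int numDocs_;
  std::vector<int> docFreq_;
  std::vector<int> collFreq_;
};

// Inverse document frequency: log(N / df). Requires df > 0.
double idf(CountsView& idx, int termID) {
  int df = idx.docCount(termID);
  assert(df > 0);
  return std::log(static_cast<double>(idx.docCount()) / df);
}

// Maximum-likelihood collection probability: cf(t) / total term count.
double collectionProb(const CountsView& idx, int termID) {
  return static_cast<double>(idx.termCount(termID)) / idx.termCount();
}
```

idf takes a non-const reference because docCount() is not declared const in the listing above, whereas the termCount() overloads are.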

Index entry access
virtual DocInfoList* docInfoList (int termID)=0
 Returns a new instance of DocInfoList that represents the doc entries in a term index. The caller owns the instance and must delete it later.
See also:
DocInfoList.


virtual TermInfoList* termInfoList (int docID)=0
 Returns a new instance of TermInfoList that represents the word entries in a document index. The caller owns the instance and must delete it later.
See also:
TermInfoList.



Detailed Description

Abstract Class for indexed document collection.

This is an abstract class that provides a uniform interface for access to an indexed document collection. The following is an example of using it.



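// Index is abstract: bind myIndex to an object of a concrete subclass (e.g., BasicIndex).
Index &myIndex = ...;

// open() returns true if the index was opened successfully
bool opened = myIndex.open("index-file");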


int t1;
... 

// now fetch doc info list for term t1
// this returns a dynamic instance, so you'll need to delete it
DocInfoList *docList = myIndex.docInfoList(t1);

docList->startIteration();

DocInfo *entry;
while (docList->hasMore()) {
  entry = docList->nextEntry(); 
  // this returns a pointer to *static* memory, so don't delete entry!
  
  cout << "entry doc id: "<< entry->docID() <<endl;
  cout << "entry term count: "<< entry->termCount() << endl;
}

delete docList;
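
Because docInfoList() hands ownership of the returned list to the caller, the delete at the end of the example is easy to miss on an early return. One possible way to guard against that, assuming a C++11 compiler, is to wrap the returned pointer in std::unique_ptr. The sketch below is self-contained: DocInfo and DocInfoList here are simplified stand-ins modelled only on the calls used in the example above, and makeDocInfoList and totalTermCount are hypothetical helpers, not LEMUR functions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Simplified stand-ins modelled on the calls used in the example above
// (startIteration, hasMore, nextEntry, docID, termCount); not the LEMUR classes.
class DocInfo {
public:
  DocInfo(int id, int count) : id_(id), count_(count) {}
  int docID() const { return id_; }
  int termCount() const { return count_; }

private:
  int id_;
  int count_;
};

class DocInfoList {
public:
  explicit DocInfoList(const std::vector<DocInfo>& entries)
      : entries_(entries), pos_(0) {}
  void startIteration() { pos_ = 0; }
  bool hasMore() const { return pos_ < entries_.size(); }
  // Returns a pointer into storage owned by the list; callers must not delete it.
  DocInfo* nextEntry() { return &entries_[pos_++]; }

private:
  std::vector<DocInfo> entries_;
  std::size_t pos_;
};

// Hypothetical factory playing the role of Index::docInfoList(termID):
// it returns a heap-allocated list that the caller owns.
DocInfoList* makeDocInfoList() {
  std::vector<DocInfo> entries;
  entries.push_back(DocInfo(1, 3));
  entries.push_back(DocInfo(4, 2));
  entries.push_back(DocInfo(9, 5));
  return new DocInfoList(entries);
}

// Sums the term counts over a list, letting unique_ptr perform the delete.
int totalTermCount(DocInfoList* rawList) {
  std::unique_ptr<DocInfoList> list(rawList);  // takes ownership
  int total = 0;
  list->startIteration();
  while (list->hasMore()) {
    DocInfo* entry = list->nextEntry();  // owned by the list, not deleted here
    total += entry->termCount();
  }
  return total;  // the list is deleted when 'list' goes out of scope
}
```

With a real index, the equivalent call would be std::unique_ptr<DocInfoList> list(myIndex.docInfoList(t1)); the entries returned by nextEntry() still must not be deleted.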


Constructor & Destructor Documentation

Index::~Index ( ) [inline, virtual]
 


Member Function Documentation

int Index::docCount ( int termID ) [pure virtual]
 

Number of documents that contain a given term.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

int Index::docCount ( ) [pure virtual]
 

Total count (i.e., number) of documents in collection.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

DocInfoList * Index::docInfoList ( int termID ) [pure virtual]
 

Returns a new instance of DocInfoList that represents the doc entries in a term index. The caller owns the instance and must delete it later.

See also:
DocInfoList.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

int Index::docLength ( int docID ) const [pure virtual]
 

Total count of terms in a document, i.e., the document length.

Reimplemented in BasicIndex and BasicIndexWithCat.

float Index::docLengthAvg ( ) [pure virtual]
 

Average document length.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

const char * Index::document ( int docID ) [pure virtual]
 

Convert a valid docID to its spelling.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

int Index::document ( const char * docIDStr ) [pure virtual]
 

Convert a document ID spelling to a docID. Returns 0 if the spelling is not known to the index; valid docIDs start at 1.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

bool Index::open ( const char * indexName ) [pure virtual]
 

Open a previously created Index; returns true if it was opened successfully. indexName should be the full name of the table-of-content file for the index, e.g., "index.bsc" for an index built with the basic indexer.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

const char * Index::term ( int termID ) [pure virtual]
 

Convert a valid termID to its spelling.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

int Index::term ( const char * word ) [pure virtual]
 

Convert a term spelling to a termID. Returns 0 if the word is out of vocabulary; valid termIDs start at 1.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

int Index::termCount ( ) const [pure virtual]
 

Total count of all term occurrences in the collection.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

int Index::termCount ( int termID ) const [pure virtual]
 

Total count of occurrences of a term in the collection.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

int Index::termCountUnique ( ) [pure virtual]
 

Total count of unique terms in collection, i.e., the term vocabulary size.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

TermInfoList * Index::termInfoList ( int docID ) [pure virtual]
 

Returns a new instance of TermInfoList that represents the word entries in a document index. The caller owns the instance and must delete it later.

See also:
TermInfoList.

Reimplemented in BasicIndex, BasicIndexWithCat, and InvFPIndex.

const char * Index::termLexiconID ( ) [inline, virtual]
 

Return a string ID for the term lexicon (usually the file name of the lexicon).

This function should be pure virtual; the default implementation is just for convenience. Appropriate implementation to be done in the future.

Reimplemented in BasicIndex.


The documentation for this class was generated from the following file: Index.hpp
Generated at Fri Jul 26 18:22:48 2002 for LEMUR by doxygen1.2.4 written by Dimitri van Heesch, © 1997-2000