#include <Index.hpp>
Public Methods | |
virtual | ~Index () |
Open index | |
virtual bool | open (const char *indexName)=0 |
Open a previously created index; returns true if it was opened successfully. indexName should be the full name of the table-of-contents file for the index, e.g., "index.bsc" for an index built with the basic indexer. | |
Spelling and index conversion | |
virtual int | term (const char *word)=0 |
Convert a term spelling to a termID; returns 0 if the term is out of vocabulary. Valid termIDs start at 1. | |
virtual const char* | term (int termID)=0 |
Convert a valid termID to its spelling. | |
virtual int | document (const char *docIDStr)=0 |
Convert a document ID string to a docID; returns 0 if the document is not in the index. Valid docIDs start at 1. | |
virtual const char* | document (int docID)=0 |
Convert a valid docID to its spelling. | |
virtual const char* | docManager (int docID) |
Return a string identifier for the document manager that provides access to the source of the document with this docID. | |
virtual const char* | termLexiconID () |
Return a string ID for the term lexicon (usually the file name of the lexicon). | |
Summary counts | |
virtual int | docCount ()=0 |
Total count (i.e., number) of documents in collection. | |
virtual int | termCountUnique ()=0 |
Total count of unique terms in collection, i.e., the term vocabulary size. | |
virtual int | termCount (int termID)const=0 |
Total number of occurrences of a term in the collection. | |
virtual int | termCount ()const=0 |
Total number of occurrences of all terms in the collection. | |
virtual float | docLengthAvg ()=0 |
Average document length. | |
virtual int | docCount (int termID)=0 |
Number of documents that contain a given term. | |
virtual int | docLength (int docID)const=0 |
Total number of term occurrences in a document, i.e., the document length. | |
Index entry access | |
virtual DocInfoList* | docInfoList (int termID)=0 |
Returns a new instance of DocInfoList that represents the document entries for a term in the index; the caller must delete the instance when done with it.
| |
virtual TermInfoList* | termInfoList (int docID)=0 |
Returns a new instance of TermInfoList that represents the term entries for a document in the index; the caller must delete the instance when done with it.
|
This is an abstract class that provides a uniform interface for access to an indexed document collection. The following is an example of using it.
    Index &myIndex = ...;  // must refer to an instance of a concrete subclass; Index itself is abstract
    myIndex.open("index-file");
    int t1;
    ...
    // now fetch the doc info list for term t1;
    // this returns a dynamic instance, so you'll need to delete it
    DocInfoList *docList = myIndex.docInfoList(t1);
    docList->startIteration();
    DocInfo *entry;
    while (docList->hasMore()) {
      // nextEntry() returns a pointer to *static* memory, so don't delete entry!
      entry = docList->nextEntry();
      cout << "entry doc id: " << entry->docID() << endl;
      cout << "entry term count: " << entry->termCount() << endl;
    }
    delete docList;
virtual int docCount (int termID)=0
Number of documents that contain a given term.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.
virtual int docCount ()=0
Total count (i.e., number) of documents in the collection.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.
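As an illustration of how the two docCount overloads can be combined, the sketch below computes a plain inverse document frequency, log(N / df). This weighting is a common convention and is not something the Index class itself provides. ToyIndex and idf are hypothetical names introduced only for this example; the helper is a template so that it would accept any object offering the same two calls.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in (not part of the toolkit): exposes only the two
// document counts used below, with made-up numbers.
struct ToyIndex {
  int docCount() { return 8; }                              // documents in the collection
  int docCount(int termID) { return termID == 1 ? 2 : 0; }  // documents containing the term
};

// Inverse document frequency, log(N / df). Returns 0 when no document
// contains the term, which also covers termID 0 (out of vocabulary).
template <class IndexT>
double idf(IndexT &index, int termID) {
  assert(termID >= 0);  // negative IDs are never valid
  int df = index.docCount(termID);
  if (df <= 0) return 0.0;
  return std::log(static_cast<double>(index.docCount()) / df);
}
```

Returning 0 for an unseen term is a design choice made for this sketch; an application might prefer to skip such terms entirely.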
virtual DocInfoList* docInfoList (int termID)=0
Returns a new instance of DocInfoList that represents the document entries for a term in the index; the caller must delete the instance when done with it.
Reimplemented in BasicIndex, BasicIndexWithCat, InvFPIndex, and InvIndex.
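Because the caller owns the returned list, one way to avoid leaking it on early returns is to hand the pointer to a std::unique_ptr. The sketch below is only an illustration under stated assumptions: DocInfo, DocInfoList, and ToyIndex are minimal stand-ins that mirror just the calls used in the class example (startIteration, hasMore, nextEntry, docID, termCount), sumTermCounts is a hypothetical helper, and with the real classes this pattern additionally assumes that DocInfoList can be safely deleted through a base-class pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins (not the toolkit's classes).
struct DocInfo {
  int id;
  int tf;
  int docID() const { return id; }
  int termCount() const { return tf; }
};

class DocInfoList {
 public:
  explicit DocInfoList(std::vector<DocInfo> entries)
      : entries_(std::move(entries)), pos_(0) {}
  void startIteration() { pos_ = 0; }
  bool hasMore() const { return pos_ < entries_.size(); }
  DocInfo *nextEntry() {
    assert(hasMore());
    return &entries_[pos_++];  // memory stays owned by the list
  }

 private:
  std::vector<DocInfo> entries_;
  std::size_t pos_;
};

struct ToyIndex {
  // Returns a heap-allocated list that the caller must release.
  DocInfoList *docInfoList(int termID) {
    std::vector<DocInfo> entries;
    if (termID == 1) {
      entries.push_back(DocInfo{1, 2});
      entries.push_back(DocInfo{3, 1});
    }
    return new DocInfoList(entries);
  }
};

// Sums the per-document counts of a term; the unique_ptr deletes the list
// when the function returns, so no explicit delete is needed.
template <class IndexT>
int sumTermCounts(IndexT &index, int termID) {
  std::unique_ptr<DocInfoList> docList(index.docInfoList(termID));
  int total = 0;
  docList->startIteration();
  while (docList->hasMore()) {
    DocInfo *entry = docList->nextEntry();  // do not delete entry
    total += entry->termCount();
  }
  return total;
}
```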
virtual int docLength (int docID)const=0
Total number of term occurrences in a document, i.e., the document length.
Reimplemented in BasicIndex and BasicIndexWithCat.
virtual float docLengthAvg ()=0
Average document length.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.
virtual const char* docManager (int docID)
Return a string identifier for the document manager that provides access to the source of the document with this docID.
Reimplemented in InvIndex.
virtual const char* document (int docID)=0
Convert a valid docID to its spelling.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.
virtual int document (const char *docIDStr)=0
Convert a document ID string to a docID; returns 0 if the document is not in the index. Valid docIDs start at 1.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.
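virtual bool open (const char *indexName)=0
Open a previously created index; returns true if it was opened successfully. indexName should be the full name of the table-of-contents file for the index, e.g., "index.bsc" for an index built with the basic indexer.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.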
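Since open reports failure through its return value, callers would normally check it before using the index. The following is a minimal sketch of that check. ToyIndex is a hypothetical stand-in that merely pretends to open a file named "index.bsc", and openOrReport is a helper name introduced for this example only.

```cpp
#include <cassert>
#include <cstring>
#include <iostream>

// Hypothetical stand-in (not part of the toolkit): "opens" successfully
// only when asked for the file name index.bsc.
struct ToyIndex {
  bool opened = false;
  bool open(const char *indexName) {
    assert(indexName != nullptr);
    opened = (std::strcmp(indexName, "index.bsc") == 0);
    return opened;
  }
};

// Opens the index and reports a failure on stderr instead of continuing
// with an unusable index.
template <class IndexT>
bool openOrReport(IndexT &index, const char *tocFile) {
  if (!index.open(tocFile)) {
    std::cerr << "could not open index: " << tocFile << std::endl;
    return false;
  }
  return true;
}
```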
virtual const char* term (int termID)=0
Convert a valid termID to its spelling.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.
virtual int term (const char *word)=0
Convert a term spelling to a termID; returns 0 if the term is out of vocabulary. Valid termIDs start at 1.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.
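The two term overloads form a round trip between spellings and termIDs, with 0 reserved for out-of-vocabulary words. The sketch below illustrates checking for that sentinel before converting back. ToyVocabulary is a hypothetical in-memory stand-in that follows the documented conventions, and describeTerm is a helper name made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in (not part of the toolkit).
struct ToyVocabulary {
  std::vector<std::string> words;  // words[0] is a placeholder; termIDs start at 1

  // Spelling to termID; 0 means out of vocabulary.
  int term(const char *word) {
    for (std::size_t i = 1; i < words.size(); ++i) {
      if (words[i] == word) return static_cast<int>(i);
    }
    return 0;
  }

  // termID to spelling; only valid IDs may be passed in.
  const char *term(int termID) {
    assert(termID >= 1 && static_cast<std::size_t>(termID) < words.size());
    return words[termID].c_str();
  }
};

// Looks a word up and converts the ID back, guarding against the 0 sentinel.
template <class IndexT>
std::string describeTerm(IndexT &index, const char *word) {
  int termID = index.term(word);
  if (termID == 0) {
    return std::string(word) + ": out of vocabulary";
  }
  return std::string(index.term(termID)) + " -> " + std::to_string(termID);
}
```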
virtual int termCount ()const=0
Total number of occurrences of all terms in the collection.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.
virtual int termCount (int termID)const=0
Total number of occurrences of a term in the collection.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.
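To make the relationship between the two termCount overloads concrete, the sketch below divides a term's collection count by the total count to obtain its relative frequency in the collection. This is one typical use and is not a method of the Index class. ToyCounts and collectionProb are hypothetical names, and the numbers are made up.

```cpp
#include <cassert>

// Hypothetical stand-in (not part of the toolkit): a collection of four
// term occurrences, three of term 1 and one of term 2.
struct ToyCounts {
  int termCount() const { return 4; }
  int termCount(int termID) const {
    if (termID == 1) return 3;
    if (termID == 2) return 1;
    return 0;
  }
};

// Relative frequency of a term in the collection:
// termCount(termID) / termCount().
template <class IndexT>
double collectionProb(const IndexT &index, int termID) {
  assert(termID >= 0);  // negative IDs are never valid
  int total = index.termCount();
  if (total <= 0) return 0.0;  // empty collection
  return static_cast<double>(index.termCount(termID)) / total;
}
```

The helper takes a const reference because both termCount overloads are declared const in the interface.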
virtual int termCountUnique ()=0
Total count of unique terms in the collection, i.e., the term vocabulary size.
Reimplemented in BasicIndex, BasicIndexWithCat, and InvIndex.
virtual TermInfoList* termInfoList (int docID)=0
Returns a new instance of TermInfoList that represents the term entries for a document in the index; the caller must delete the instance when done with it.
Reimplemented in BasicIndex, BasicIndexWithCat, InvFPIndex, and InvIndex.
virtual const char* termLexiconID ()
Return a string ID for the term lexicon (usually the file name of the lexicon). This function should be pure virtual; the default implementation is provided only for convenience, and an appropriate implementation is to be done in the future.
Reimplemented in BasicIndex.