#include <DocStream.hpp>
Inheritance diagram for DocStream:

Public Methods
| virtual | ~DocStream () |
Document Iteration
| virtual void | startDocIteration ()=0 |
| | Start document iteration. |
| virtual bool | hasMore ()=0 |
| virtual Document * | nextDoc ()=0 |
| | Return a pointer to the next document (static memory; do not delete the returned instance). Call hasMore() before calling nextDoc(). |
DocStream is an abstract interface for a collection of documents. A given realization can have its own tokenization, document header format, etc., and will return a specialized Document instance to reflect this.
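As a minimal sketch of what a realization might look like, the following implements the three iteration methods over an in-memory list of document IDs. The Document and DocStream classes here are simplified stand-ins declared locally so that the sketch compiles on its own; the getID() accessor, SimpleDocument, VectorDocStream and collectIDs() are hypothetical names introduced for illustration and are not taken from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the library's Document class (assumption: only an ID is modelled).
class Document {
public:
  virtual ~Document() {}
  virtual const char *getID() const = 0;
};

// Stand-in for the DocStream interface documented on this page.
class DocStream {
public:
  virtual ~DocStream() {}
  virtual void startDocIteration() = 0;
  virtual bool hasMore() = 0;
  virtual Document *nextDoc() = 0;
};

// Hypothetical concrete Document that just stores an ID string.
class SimpleDocument : public Document {
public:
  void setID(const std::string &id) { id_ = id; }
  const char *getID() const override { return id_.c_str(); }
private:
  std::string id_;
};

// Hypothetical realization that streams document IDs held in memory.
class VectorDocStream : public DocStream {
public:
  explicit VectorDocStream(const std::vector<std::string> &ids)
      : ids_(ids), pos_(0) {}
  void startDocIteration() override { pos_ = 0; }
  bool hasMore() override { return pos_ < ids_.size(); }
  Document *nextDoc() override {
    assert(hasMore());             // the caller is expected to check hasMore() first
    current_.setID(ids_[pos_++]);
    return &current_;              // stream-owned storage, overwritten on the next call
  }
private:
  std::vector<std::string> ids_;
  std::size_t pos_;
  SimpleDocument current_;
};

// Walks a stream and copies each ID out before advancing to the next document.
std::vector<std::string> collectIDs(DocStream &stream) {
  std::vector<std::string> out;
  stream.startDocIteration();
  while (stream.hasMore()) {
    Document *doc = stream.nextDoc();  // owned by the stream: do not delete
    out.push_back(doc->getID());
  }
  return out;
}
```

In this sketch nextDoc() hands back the address of a member object, which is one way to honor the "do not delete" contract. A consequence is that a pointer obtained from one call refers to the same storage as the next, so callers copy out whatever they need (as collectIDs() does) before advancing.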
The following is an example of supporting an index with position information:
// a DocStream that handles position
class PosDocStream : public DocStream {
  ...
  Document *nextDoc() {
    return (new PosDocument(...)); // returns a special Document
  }
  ...
};
// a Document that has position information
class PosDocument : public Document {
  ...
  TokenTerm *nextTerm() {
    return (new PosTerm(...)); // returns a special Term
  }
};
// a Term that has position
class PosTerm : public TokenTerm {
public:
  int getPosition() {
    ...
  }
};
// Indexer that records term positions
class PosIndex : public Index {
  ...
  PosDocStream *db;
  ... // when indexing
  db->startDocIteration();
  while (db->hasMore()) {
    Document *doc = db->nextDoc(); // we'll actually get a PosDocument
    doc->startTermIteration();
    PosTerm *term;
    while (doc->hasMore()) {
      // note the down-cast: nextTerm() is declared to return a TokenTerm *
      term = (PosTerm *)doc->nextTerm();
      term->getPosition();
      term->spelling();
      ...
    }
  }
  ...
};
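The cast from TokenTerm * to PosTerm * in the example is unchecked, so it is only safe when the stream is known to produce PosTerm instances. A hedged alternative, sketched below, uses dynamic_cast to verify the type at run time. TokenTerm is a simplified local stand-in here, and PlainTerm and positionOf() are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <string>

// Stand-in for the library's TokenTerm (assumption: only spelling() is modelled).
class TokenTerm {
public:
  virtual ~TokenTerm() {}
  virtual const char *spelling() const = 0;
};

// A term that also records its position in the document.
class PosTerm : public TokenTerm {
public:
  PosTerm(const std::string &s, int pos) : spelling_(s), pos_(pos) {}
  const char *spelling() const override { return spelling_.c_str(); }
  int getPosition() const { return pos_; }
private:
  std::string spelling_;
  int pos_;
};

// Hypothetical term type that carries no position information.
class PlainTerm : public TokenTerm {
public:
  explicit PlainTerm(const std::string &s) : spelling_(s) {}
  const char *spelling() const override { return spelling_.c_str(); }
private:
  std::string spelling_;
};

// Returns the term's position, or -1 when the term is not a PosTerm.
int positionOf(TokenTerm *term) {
  assert(term != nullptr);
  PosTerm *posTerm = dynamic_cast<PosTerm *>(term);  // null if the type does not match
  return posTerm ? posTerm->getPosition() : -1;
}
```

The checked cast costs a run-time type lookup per term, which an indexer that controls its own stream may reasonably choose to skip; the sketch is mainly useful when a stream of unknown origin is passed in.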

Member Function Documentation

virtual bool hasMore ()=0
Implemented in BasicDocStream.

virtual Document * nextDoc ()=0
Return a pointer to the next document (static memory; do not delete the returned instance). Call hasMore() before calling nextDoc().
Implemented in BasicDocStream.

virtual void startDocIteration ()=0
Start document iteration. Typical usage:
  ...
  myStream.startDocIteration();
  while (myStream.hasMore()) {
    Document *doc = myStream.nextDoc();
    doc->startTermIteration();
    while (doc->hasMore()) {
      TokenTerm *term = doc->nextTerm();
      ... process "term" ...
      // YOU MUST NOT DELETE term, as it is a pointer to local static memory
    }
    // YOU MUST NOT DELETE doc, as it is a pointer to local static memory
  }
Implemented in BasicDocStream.