#include <DocStream.hpp>
Public Methods
    virtual ~DocStream ()

Document Iteration
    virtual void startDocIteration ()=0
        Start document iteration.
    virtual bool hasMore ()=0
    virtual Document * nextDoc ()=0
        Return a pointer to the next document (static memory; do not delete the returned instance). hasMore() should be called before calling nextDoc().
DocStream is an abstract interface for a collection of documents. A given realization can have its own tokenization, document header format, etc., and signals this by returning a specialized Document instance.
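To make the interface concrete, here is a minimal, self-contained sketch of one possible realization. The Document and DocStream declarations below are simplified stand-ins written for this example (the real headers may declare more), and SimpleDocument, VectorDocStream, and countDocs are hypothetical names that are not part of the toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the toolkit's Document class
// (assumption: only an ID accessor is modeled here).
class Document {
public:
  virtual ~Document() {}
  virtual const char *getID() const = 0;
};

// Stand-in mirroring the three pure virtual methods documented above.
class DocStream {
public:
  virtual ~DocStream() {}
  virtual void startDocIteration() = 0;
  virtual bool hasMore() = 0;
  virtual Document *nextDoc() = 0;
};

// Hypothetical document type that only carries an ID.
class SimpleDocument : public Document {
public:
  void setID(const std::string &id) { id_ = id; }
  const char *getID() const { return id_.c_str(); }
private:
  std::string id_;
};

// Hypothetical in-memory stream over a list of document IDs.
class VectorDocStream : public DocStream {
public:
  explicit VectorDocStream(const std::vector<std::string> &ids)
      : ids_(ids), pos_(0) {}
  void startDocIteration() { pos_ = 0; }
  bool hasMore() { return pos_ < ids_.size(); }
  Document *nextDoc() {
    assert(hasMore());             // caller must check hasMore() first
    current_.setID(ids_[pos_++]);
    return &current_;              // owned by the stream; do not delete
  }
private:
  std::vector<std::string> ids_;
  std::size_t pos_;
  SimpleDocument current_;         // reused for every returned document
};

// Walks any DocStream with the documented protocol and counts documents.
std::size_t countDocs(DocStream &stream) {
  std::size_t n = 0;
  stream.startDocIteration();
  while (stream.hasMore()) {
    stream.nextDoc();
    ++n;
  }
  return n;
}
```

Because countDocs takes a DocStream reference, it works unchanged with any realization of the interface; calling startDocIteration() again restarts the traversal from the first document in this sketch.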
The following is an example of supporting an index with position information:
    // A DocStream that handles positions
    class PosDocStream : public DocStream {
      ...
      Document *nextDoc() {
        return (new PosDocument(...)); // returns a special Document
      }
      ...
    };

    // A Document that has position information
    class PosDocument : public Document {
      ...
      TokenTerm *nextTerm() {
        return (new PosTerm(...)); // returns a special Term
      }
    };

    // A Term that has a position
    class PosTerm : public TokenTerm {
    public:
      int getPosition() { ... }
    };

    // Indexer that records term positions
    class PosIndex : public Index {
      ...
      PosDocStream *db;
      ...
      // when indexing
      db->startDocIteration();
      Document *doc;
      while (db->hasMore()) {
        doc = db->nextDoc(); // we'll actually get a PosDocument
        doc->startTermIteration();
        PosTerm *term;
        while (doc->hasMore()) {
          term = (PosTerm *)doc->nextTerm(); // note the down-cast!
          term->getPosition();
          term->spelling();
          ...
        }
      }
      ...
    };
virtual bool DocStream::hasMore () [pure virtual]
Implemented in BasicDocStream.
virtual Document* DocStream::nextDoc () [pure virtual]
Return a pointer to the next document (static memory; do not delete the returned instance). hasMore() should be called before calling nextDoc().
Implemented in BasicDocStream.
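The "static memory" note has a practical consequence that a short sketch can illustrate: if a stream hands out a pointer to storage it owns, a later nextDoc() call may overwrite what an earlier pointer refers to. The source states only that the memory is static; the reuse behavior shown here is an assumption about a typical implementation. Document, ReusingStream, and collectIds below are hypothetical stand-ins, not toolkit classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in, reduced to what the example needs.
struct Document {
  std::string id;
};

// Hypothetical stream that reuses one Document object for every call.
class ReusingStream {
public:
  explicit ReusingStream(const std::vector<std::string> &ids)
      : ids_(ids), pos_(0) {}
  void startDocIteration() { pos_ = 0; }
  bool hasMore() { return pos_ < ids_.size(); }
  Document *nextDoc() {
    assert(hasMore());
    current_.id = ids_[pos_++];  // overwrites the previously returned document
    return &current_;            // stream-owned storage; caller must not delete
  }
private:
  std::vector<std::string> ids_;
  std::size_t pos_;
  Document current_;
};

// Copies the data needed out of each document instead of keeping the pointer.
std::vector<std::string> collectIds(ReusingStream &stream) {
  std::vector<std::string> out;
  stream.startDocIteration();
  while (stream.hasMore()) {
    Document *doc = stream.nextDoc();
    out.push_back(doc->id);      // copy by value; never store or delete doc
  }
  return out;
}
```

Under this assumption, two pointers obtained from consecutive nextDoc() calls compare equal and both see the most recent document, so anything that must outlive the next call should be copied first.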
virtual void DocStream::startDocIteration () [pure virtual]
Start document iteration. Typical usage:

    ...
    myStream.startDocIteration();
    while (myStream.hasMore()) {
      Document *doc = myStream.nextDoc();
      doc->startTermIteration();
      while (doc->hasMore()) {
        TokenTerm *term = doc->nextTerm();
        ... process "term" ...
        // YOU MUST NOT DELETE term, as it is a pointer to local static memory
      }
      // YOU MUST NOT DELETE doc, as it is a pointer to local static memory
    }

Implemented in BasicDocStream.