
ReutersParser Class Reference

#include <ReutersParser.hpp>

Inheritance diagram for ReutersParser: Parser, TextHandler.

Public Methods

 ReutersParser ()
void parseFile (char *filename)
 Parse a file.

void parseBuffer (char *buf, int len)
 Parse a buffer.

long fileTell ()
 Return the current byte position of the file being parsed.


Private Methods

void doParse ()
 Actual parsing action flow.


Private Attributes

int state
 The state of the parser.


Detailed Description

Parses documents in TREC format. Words that are not in the acronym list are case-folded. Contraction suffixes and possessive suffixes are stripped.

U.S.A., USA's, and USAs are converted to USA. Acronyms that contain digits are not recognized.
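
The normalization rules above can be illustrated with a minimal sketch. This is not the LEMUR implementation: `normalizeToken` is a hypothetical helper, the acronym list is passed in by the caller, and a suffix is assumed to be everything from the first apostrophe onward.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper, not part of LEMUR: normalizes a single token
// according to the rules described in the class documentation.
std::string normalizeToken(const std::string& token,
                           const std::set<std::string>& acronyms) {
    std::string word = token;

    // Assumption: a contraction or possessive suffix is everything from
    // the first apostrophe onward ("USA's" -> "USA", "bank's" -> "bank").
    const std::string::size_type apos = word.find('\'');
    if (apos != std::string::npos) {
        word.erase(apos);
    }

    // Build an acronym candidate by dropping periods ("U.S.A." -> "USA").
    std::string candidate;
    for (char c : word) {
        if (c != '.') {
            candidate += c;
        }
    }

    // Known acronyms keep their case.
    if (acronyms.count(candidate) > 0) {
        return candidate;
    }

    // Plural form of a known acronym ("USAs" -> "USA").
    if (candidate.size() > 1 && candidate.back() == 's') {
        const std::string singular = candidate.substr(0, candidate.size() - 1);
        if (acronyms.count(singular) > 0) {
            return singular;
        }
    }

    // Everything else is case-folded to lower case.
    std::string folded;
    for (char c : word) {
        folded += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return folded;
}
```

With an acronym list containing only `USA`, the tokens `U.S.A.`, `USA's`, and `USAs` all map to `USA`, while an ordinary word such as `Bank's` becomes `bank`.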

The following fields are parsed: text, headline, title.
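
As a rough illustration of field selection, the sketch below pulls the content of the three parsed fields out of a buffer. It assumes the fields appear as upper-case SGML-style elements (`<TEXT>`, `<HEADLINE>`, `<TITLE>`), which is an assumption about the input rather than something this page states. `extractFields` is a hypothetical helper and does not reflect how ReutersParser is implemented.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of LEMUR: returns the content of every
// TEXT, HEADLINE and TITLE element found in the buffer. Results are grouped
// by field in that order, not by position in the document.
std::vector<std::string> extractFields(const std::string& buf) {
    // Assumed tag spellings for the three parsed fields.
    static const char* const kFields[] = {"TEXT", "HEADLINE", "TITLE"};

    std::vector<std::string> contents;
    for (const char* field : kFields) {
        const std::string open = std::string("<") + field + ">";
        const std::string close = std::string("</") + field + ">";

        std::string::size_type pos = 0;
        while ((pos = buf.find(open, pos)) != std::string::npos) {
            const std::string::size_type start = pos + open.size();
            const std::string::size_type end = buf.find(close, start);
            if (end == std::string::npos) {
                break;  // unterminated element: ignore the rest
            }
            contents.push_back(buf.substr(start, end - start));
            pos = end + close.size();
        }
    }
    return contents;
}
```

Content inside any other element, such as a document identifier, is skipped. In a full parser each extracted string would then be tokenized and normalized.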


Constructor & Destructor Documentation

ReutersParser::ReutersParser ()
 


Member Function Documentation

void ReutersParser::doParse () [private]
 

Actual parsing action flow.

long ReutersParser::fileTell () [virtual]
 

Return the current byte position of the file being parsed.

Implements Parser.

void ReutersParser::parseBuffer (char * buf, int len) [virtual]
 

Parse a buffer.

Implements Parser.

void ReutersParser::parseFile (char * filename) [virtual]
 

Parse a file.

Implements Parser.


Member Data Documentation

int ReutersParser::state [private]
 

The state of the parser.


The documentation for this class was generated from the following file:
Generated on Fri Feb 6 07:12:06 2004 for LEMUR by doxygen 1.2.16