
TrecParser Class Reference

Parses documents in NIST's TREC format. Words that are not in the acronym list are case-folded. Contraction and possessive suffixes are stripped. U.S.A., USA's, and USAs are all converted to USA. Acronyms containing digits are not recognized. Only the following fields are parsed: TEXT, HL, HEAD, HEADLINE, LP, TTL.
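The token normalization rules above can be illustrated with a small standalone sketch. It is not Lemur's implementation: the function `normalizeToken`, the `AcronymList` type, and the exact suffix rule (drop everything from the first apostrophe on) are hypothetical, and "case folding" is assumed to mean lowercasing.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical acronym list. Lemur maintains its own list; its format and
// contents are not described on this page.
using AcronymList = std::set<std::string>;

// True if s is non-empty and contains only uppercase ASCII letters.
// Any digit disqualifies the token, so "F16" is never treated as an acronym.
static bool allUpper(const std::string& s) {
  if (s.empty()) return false;
  for (unsigned char c : s)
    if (!std::isupper(c)) return false;
  return true;
}

// True for dotted acronyms such as "U.S." or "U.S.A.":
// uppercase letter, '.', uppercase letter, '.', ...
static bool isDottedAcronym(const std::string& s) {
  if (s.size() < 4 || s.size() % 2 != 0) return false;
  for (std::string::size_type i = 0; i < s.size(); i += 2)
    if (!std::isupper(static_cast<unsigned char>(s[i])) || s[i + 1] != '.')
      return false;
  return true;
}

// Apply the rules described for TrecParser to a single token (sketch only).
std::string normalizeToken(std::string tok, const AcronymList& acronyms) {
  // 1. Strip contraction and possessive suffixes (assumed rule: drop
  //    everything from the first apostrophe on), e.g. "USA's" -> "USA".
  const std::string::size_type apos = tok.find('\'');
  if (apos != std::string::npos && apos > 0) tok.erase(apos);

  // 2. Collapse dotted acronyms: "U.S.A." -> "USA".
  if (isDottedAcronym(tok)) {
    std::string collapsed;
    for (char c : tok)
      if (c != '.') collapsed += c;
    tok = collapsed;
  }

  // 3. Plural acronyms: "USAs" -> "USA" when the stem is a known acronym.
  if (tok.size() > 1 && tok.back() == 's') {
    const std::string stem = tok.substr(0, tok.size() - 1);
    if (allUpper(stem) && acronyms.count(stem) > 0) tok = stem;
  }

  // 4. Case folding: known acronyms keep their case; everything else is lowercased.
  if (allUpper(tok) && acronyms.count(tok) > 0) return tok;
  for (char& c : tok)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return tok;
}
```

With the acronym list `{"USA"}`, the inputs "U.S.A.", "USA's", and "USAs" all yield "USA", while "Hello" yields "hello" and "F16" yields "f16", because a token containing a digit never qualifies as an acronym in this sketch.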

#include <TrecParser.hpp>

Inheritance diagram for TrecParser (image not reproduced; classes shown: Parser, TextHandler).

Public Methods

 TrecParser ()
void parseFile (char *filename)
 Parse a file.

void parseBuffer (char *buf,int len)
 Parse a buffer of length len.

long fileTell ()
 Return the current byte offset into the file being parsed. Do not use with parseBuffer.




Constructor & Destructor Documentation

TrecParser::TrecParser ( )
 


Member Function Documentation

long TrecParser::fileTell ( ) [virtual]
 

Return the current byte offset into the file being parsed. Do not use with parseBuffer.

Reimplemented from Parser.

void TrecParser::parseBuffer ( char * buf, int len ) [virtual]
 

Parse a buffer of length len.

Reimplemented from Parser.
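Because parseBuffer takes an explicit length, the sketch here assumes the buffer need not be NUL-terminated. It combines that calling convention with the field list from the class description: the hypothetical function `extractParsedFields` (not Lemur code) collects the text inside TEXT, HL, HEAD, HEADLINE, LP, and TTL tags from the first `len` bytes of `buf`, assuming well-formed, non-nested, uppercase tags. It takes `const char*` rather than the `char*` used by TrecParser.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch (not Lemur code): return the contents of the fields
// listed for TrecParser, in document order, from the first len bytes of buf.
// Assumes well-formed, non-nested, uppercase SGML tags.
std::vector<std::string> extractParsedFields(const char* buf, int len) {
  static const char* const kFields[] = {"TEXT", "HL", "HEAD", "HEADLINE", "LP", "TTL"};
  std::vector<std::string> out;
  if (buf == nullptr || len <= 0) return out;

  // Copy exactly len bytes; no NUL terminator is required.
  const std::string doc(buf, static_cast<std::string::size_type>(len));
  std::string::size_type pos = 0;
  while ((pos = doc.find('<', pos)) != std::string::npos) {
    const std::string::size_type close = doc.find('>', pos);
    if (close == std::string::npos) break;  // truncated tag
    const std::string tag = doc.substr(pos + 1, close - pos - 1);

    bool wanted = false;
    for (const char* f : kFields)
      if (tag == f) { wanted = true; break; }
    if (!wanted) { pos = close + 1; continue; }  // skip DOC, DOCNO, closing tags, ...

    const std::string endTag = "</" + tag + ">";
    const std::string::size_type end = doc.find(endTag, close + 1);
    if (end == std::string::npos) break;  // field cut off by len
    out.push_back(doc.substr(close + 1, end - close - 1));
    pos = end + endTag.size();
  }
  return out;
}
```

For the buffer `<DOC><DOCNO>X1</DOCNO><HEADLINE>Big news</HEADLINE><TEXT>Body here</TEXT></DOC>` the function returns {"Big news", "Body here"}; DOCNO is skipped because it is not in the field list. Passing a shorter `len` that cuts off `</HEADLINE>` yields no fields, since only the first `len` bytes are examined.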

void TrecParser::parseFile ( char * filename ) [virtual]
 

Parse a file.

Reimplemented from Parser.


The documentation for this class was generated from the following file: TrecParser.hpp
Generated at Fri Jul 26 18:27:25 2002 for LEMUR by doxygen 1.2.4 written by Dimitri van Heesch, © 1997-2000