
TrecParser Class Reference

#include <TrecParser.hpp>

Inheritance diagram for TrecParser: Parser, TextHandler

Public Methods

 TrecParser ()
void parseFile (char *filename)
 Parse a file.

void parseBuffer (char *buf, int len)
 Parse a buffer of length len.

long fileTell ()

Private Methods

void doParse ()
 Actual parsing action flow.


Private Attributes

int state
 The state of the parser.

Property begelem
 Properties marking the beginning and end of elements.

Property endelem
LinkedPropertyList proplist
 The property list.


Detailed Description

Parses documents in NIST's TREC format. Words that are not in the acronym list are case-folded. Contraction and possessive suffixes are stripped.

U.S.A., USA's, and USAs are converted to USA. Acronyms that contain numbers are not recognized.
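The normalization rules above can be illustrated with a minimal sketch. This is not the TrecParser implementation: the helper name normalizeToken, the suffix list, and the way the acronym list is passed in are all assumptions made for illustration.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper (not part of LEMUR): normalizes a single token
// following the rules described above.
std::string normalizeToken(const std::string& tok,
                           const std::set<std::string>& acronyms) {
    // Drop periods so that "U.S.A." becomes "USA".
    std::string s;
    for (char c : tok) {
        if (c != '.') s += c;
    }

    // Strip one contraction or possessive suffix.
    // The suffix list is an assumption; the real list may differ.
    static const char* const suffixes[] = {"n't", "'ll", "'re", "'ve",
                                           "'s",  "'d",  "'m"};
    for (const char* suf : suffixes) {
        const std::string sfx(suf);
        if (s.size() > sfx.size() &&
            s.compare(s.size() - sfx.size(), sfx.size(), sfx) == 0) {
            s.erase(s.size() - sfx.size());
            break;
        }
    }

    // Known acronyms keep their case; a trailing plural "s" is removed
    // so that "USAs" maps to "USA".
    if (acronyms.count(s)) return s;
    if (s.size() > 1 && s.back() == 's' &&
        acronyms.count(s.substr(0, s.size() - 1))) {
        return s.substr(0, s.size() - 1);
    }

    // Everything else is case-folded to lower case.
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}
```

With an acronym list containing only "USA", this sketch maps U.S.A., USA's, and USAs to USA, and lower-cases ordinary words.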

The following fields are parsed: TEXT, HL, HEAD, HEADLINE, LP, TTL
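As a rough illustration of what restricting parsing to these fields means, the sketch below pulls the contents of the listed fields out of one TREC-style document. It uses plain string search rather than the parser's internal state tracking, and the helper name extractFields is hypothetical; only the field names come from the list above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Field names taken from the list above.
static const char* const kFields[] = {"TEXT", "HL",  "HEAD",
                                      "HEADLINE", "LP", "TTL"};

// Hypothetical helper (not part of LEMUR): returns the contents of every
// listed field found in doc, grouped in the order of kFields.
std::vector<std::string> extractFields(const std::string& doc) {
    std::vector<std::string> out;
    for (const char* name : kFields) {
        const std::string open = std::string("<") + name + ">";
        const std::string close = std::string("</") + name + ">";
        std::size_t pos = 0;
        while ((pos = doc.find(open, pos)) != std::string::npos) {
            const std::size_t start = pos + open.size();
            const std::size_t end = doc.find(close, start);
            if (end == std::string::npos) break;  // unterminated field
            out.push_back(doc.substr(start, end - start));
            pos = end + close.size();
        }
    }
    return out;
}
```

Content inside other tags, such as DOCNO, is skipped by this sketch because those tags are not in the field list.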


Constructor & Destructor Documentation

TrecParser::TrecParser ()
 


Member Function Documentation

void TrecParser::doParse () [private]
 

Actual parsing action flow.

long TrecParser::fileTell () [virtual]
 

Returns the current byte offset into the file being parsed. Do not use with parseBuffer.

Implements Parser.

void TrecParser::parseBuffer (char * buf, int len) [virtual]
 

Parse a buffer of length len.

Implements Parser.

void TrecParser::parseFile (char * filename) [virtual]
 

Parse a file.

Implements Parser.


Member Data Documentation

Property TrecParser::begelem [private]
 

Properties marking the beginning and end of elements.

Property TrecParser::endelem [private]
 

LinkedPropertyList TrecParser::proplist [private]
 

The property list.

int TrecParser::state [private]
 

The state of the parser.


The documentation for this class was generated from the following file:
Generated on Fri Feb 6 07:12:08 2004 for LEMUR by doxygen1.2.16