
BrillPOSParser Class Reference

#include <BrillPOSParser.hpp>

Inheritance diagram for BrillPOSParser: Parser, TextHandler

Public Methods

 BrillPOSParser ()
void parseFile (char *filename)
 Parse a file.

void parseBuffer (char *buf, int len)
 Parse a buffer.

long fileTell ()
 Return the current byte position of the file being parsed.


Private Methods

void doParse ()
 Actual parsing action flow.


Private Attributes

int state
 The state of the parser.

int poscount
 Counts the position of each word in the document.

Property wordpos
 Keeps one property object and changes its values.

Property tag
LinkedPropertyList proplist
 The list of properties.


Detailed Description

Parses documents that use document separation tags similar to NIST's Web format: <DOC></DOC> around documents and <DOCNO></DOCNO> around docids. Recognizes tokens with "/" slashes in them; "/" is the default separator for Brill's part-of-speech tagger. Use with BrillPOSTokenizer. This parser also recognizes ./. ?/. and !/. as end-of-sentence markers and sends along an [eos] token to be indexed. Case folding is applied to words that are not in the acronym list. Contraction suffixes and possessive suffixes are stripped.
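The token handling described above can be sketched roughly as follows. This is a minimal illustration, not the parser's actual implementation: the TaggedToken struct and the splitTagged function are hypothetical names, and splitting at the last "/" is an assumption made so that words containing a slash keep it.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for illustration; not part of the LEMUR API.
struct TaggedToken {
    std::string word;    // text before the separator
    std::string tag;     // part-of-speech tag after the separator
    bool endOfSentence;  // true for the markers ./. ?/. and !/.
};

// Split a "word/TAG" token at the last '/' (assumed behaviour).
// A token without any '/' is returned unchanged with an empty tag.
TaggedToken splitTagged(const std::string& token) {
    TaggedToken out{token, "", false};
    const std::string::size_type slash = token.rfind('/');
    if (slash == std::string::npos) {
        return out;
    }
    out.word = token.substr(0, slash);
    out.tag = token.substr(slash + 1);
    // The three end-of-sentence markers named in the description.
    out.endOfSentence =
        (token == "./." || token == "?/." || token == "!/.");
    return out;
}
```

For a token such as dog/NN this yields the word "dog" and the tag "NN"; for ./. it sets endOfSentence, which is the point where the parser would send along an [eos] token.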

U.S.A., USA's, and USAs are converted to USA. Does not recognize acronyms with numbers.
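The acronym and case-folding rules might look like the sketch below. It is written under stated assumptions and is not taken from the LEMUR source: normalizeWord and allUpper are hypothetical names, the acronym list is modelled as a std::set, and the order in which suffixes and periods are removed is a guess that happens to reproduce the examples given here.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// True when s is non-empty and consists only of uppercase letters.
// Digits fail this check, so acronyms with numbers are not recognized.
bool allUpper(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isupper(c)) return false;
    }
    return true;
}

// Hypothetical helper: map U.S.A., USA's and USAs to USA when USA is in
// the acronym list; otherwise strip the possessive and fold to lowercase.
std::string normalizeWord(std::string word,
                          const std::set<std::string>& acronyms) {
    // Strip a possessive "'s" suffix.
    if (word.size() > 2 && word.compare(word.size() - 2, 2, "'s") == 0) {
        word.erase(word.size() - 2);
    }
    // Build an acronym candidate by dropping periods.
    std::string candidate;
    for (char c : word) {
        if (c != '.') candidate += c;
    }
    // Drop a plural 's' that follows an all-uppercase stem (USAs -> USA).
    if (candidate.size() > 1 && candidate.back() == 's' &&
        allUpper(candidate.substr(0, candidate.size() - 1))) {
        candidate.pop_back();
    }
    if (allUpper(candidate) && acronyms.count(candidate) > 0) {
        return candidate;
    }
    // Not an acronym: apply case folding.
    for (char& c : word) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return word;
}
```

With an acronym list containing only USA, all three spellings above come back as USA, while an ordinary word such as John's comes back as john.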


Constructor & Destructor Documentation

BrillPOSParser::BrillPOSParser ()
 


Member Function Documentation

void BrillPOSParser::doParse () [private]
 

Actual parsing action flow.

long BrillPOSParser::fileTell () [virtual]
 

Return the current byte position of the file being parsed.

Implements Parser.

void BrillPOSParser::parseBuffer (char *buf, int len) [virtual]
 

Parse a buffer.

Implements Parser.

void BrillPOSParser::parseFile (char *filename) [virtual]
 

Parse a file.

Implements Parser.


Member Data Documentation

int BrillPOSParser::poscount [private]
 

Counts the position of each word in the document.

LinkedPropertyList BrillPOSParser::proplist [private]
 

The list of properties.

int BrillPOSParser::state [private]
 

The state of the parser.

Property BrillPOSParser::tag [private]
 

Property BrillPOSParser::wordpos [private]
 

Keeps one property object and changes its values.


The documentation for this class was generated from the following file:
Generated on Fri Feb 6 07:11:58 2004 for LEMUR by doxygen 1.2.16