
BrillPOSTokenizer Class Reference

#include <BrillPOSTokenizer.hpp>

Inherits TextHandler.

Public Methods

 BrillPOSTokenizer ()
 make a new POSTokenizer with default split character "/"

 BrillPOSTokenizer (char s)
 make a new POSTokenizer with a different splitting character

void setDelimiter (char s)
 set a new delimiter character to split into tokens

char * handleWord (char *word, const char *original, PropertyList *list)

Protected Attributes

char splitter
Property pos

Detailed Description

This TextHandler parses tokens that have been run through Brill's part-of-speech (POS) tagger, which are usually of the form "word/POS". It splits each token at the delimiter and sends the word unchanged along the pipeline, with the POS tag attached as a Property. TextHandlers further down the chain can access the tag by getting the Property named "POS" from the PropertyList. Generally, this handler should be chained after a tokenizing TextHandler, such as the WebParser, and before a Stopper or Stemmer.
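The splitting step described above can be sketched in isolation. This is a minimal illustration, not the Lemur implementation: `splitTagged` is a hypothetical helper, and it assumes the split happens at the last delimiter in the token (so that "1/2/CD" yields the word "1/2"), which the documentation does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical helper, not part of the Lemur API: splits a tagged token
// such as "word/POS" into {word, tag}.
// Assumption: the split is made at the LAST occurrence of the delimiter.
// The real BrillPOSTokenizer may choose a different position.
std::pair<std::string, std::string> splitTagged(const char *token,
                                                char delimiter = '/') {
    assert(token != nullptr);
    const char *sep = std::strrchr(token, delimiter);
    if (sep == nullptr) {
        // No delimiter found: the whole token is the word, the tag is empty.
        return {std::string(token), std::string()};
    }
    // Everything before the delimiter is the word, everything after is the tag.
    std::string word(token, static_cast<std::size_t>(sep - token));
    std::string tag(sep + 1);
    return {word, tag};
}
```

Under these assumptions, `splitTagged("dog/NN")` gives the word "dog" and the tag "NN", and passing `'_'` as the second argument handles tokens written as "dog_NN".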


Constructor & Destructor Documentation

BrillPOSTokenizer::BrillPOSTokenizer ()

make a new POSTokenizer with default split character "/"

BrillPOSTokenizer::BrillPOSTokenizer (char s)

make a new POSTokenizer with a different splitting character


Member Function Documentation

char * BrillPOSTokenizer::handleWord (char *word, const char *original, PropertyList *list) [virtual]

split the token, send the word as is along the pipeline with the POS added as a Property

Reimplemented from TextHandler.
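How a downstream handler might read the tag can be illustrated with a small self-contained mock. Everything here is a stand-in for illustration: `Handler`, `Props`, `PosSplitter`, `Recorder`, and `example` are hypothetical names, not the Lemur TextHandler or PropertyList classes, and splitting at the last delimiter is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a property list: name -> value.
using Props = std::map<std::string, std::string>;

// Hypothetical stand-in for a chainable handler.
struct Handler {
    Handler *next = nullptr;
    virtual ~Handler() = default;
    // Default behaviour: pass the word on unchanged, if there is a successor.
    virtual void handleWord(const std::string &word, Props &props) {
        if (next) next->handleWord(word, props);
    }
};

// Splits "word/POS", stores the tag under "POS", and forwards the bare word.
struct PosSplitter : Handler {
    char splitter;
    explicit PosSplitter(char s = '/') : splitter(s) {}
    void handleWord(const std::string &word, Props &props) override {
        // Assumption: split at the last delimiter in the token.
        std::string::size_type at = word.rfind(splitter);
        if (at == std::string::npos) {
            Handler::handleWord(word, props);  // untagged token: forward as is
            return;
        }
        props["POS"] = word.substr(at + 1);
        Handler::handleWord(word.substr(0, at), props);
    }
};

// Downstream handler that records the word and the "POS" property it receives.
struct Recorder : Handler {
    std::string lastWord, lastPos;
    void handleWord(const std::string &word, Props &props) override {
        lastWord = word;
        auto it = props.find("POS");
        lastPos = (it == props.end()) ? std::string() : it->second;
    }
};

// Wires splitter -> recorder and checks the result for one tagged token.
inline void example() {
    Recorder rec;
    PosSplitter split;
    split.next = &rec;
    Props props;
    split.handleWord("running/VBG", props);
    assert(rec.lastWord == "running" && rec.lastPos == "VBG");
}
```

The mock keeps the tag out of the word itself, so a later stage that only cares about the word (a stopper or stemmer, say) sees "running" while a stage that needs the tag looks it up by name.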

void BrillPOSTokenizer::setDelimiter (char s) [inline]

set a new delimiter character to split into tokens


Member Data Documentation

Property BrillPOSTokenizer::pos [protected]
 

char BrillPOSTokenizer::splitter [protected]
 


The documentation for this class was generated from the following files:
Generated on Wed Nov 3 12:59:25 2004 for Lemur Toolkit by doxygen 1.2.18