
WebParser Class Reference

#include <WebParser.hpp>

Base classes shown in the inheritance diagram for WebParser: Parser, TextHandler.

Public Methods

 WebParser ()
void parseFile (char *filename)
 Parse a file.

void parseBuffer (char *buf, int len)
 Parse a buffer.

long fileTell ()
 Return the current byte position of the file being parsed.


Detailed Description

Parses documents in NIST's Web TREC format. Words that are not in the acronym list are case-folded. Contraction suffixes and possessive suffixes are stripped.

U.S.A., USA's, and USAs are converted to USA. Acronyms that contain numbers are not recognized.
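The normalization rules above can be illustrated with a small sketch. It is not the Lemur implementation: `normalizeToken` and `isAllUpper` are hypothetical helpers, the acronym list is assumed to be a plain `std::set`, and the sketch drops every period in a token, which the real parser may not do.

```cpp
#include <cctype>
#include <set>
#include <string>

// True when `s` is non-empty and consists only of uppercase letters.
// Digits fail this test, so tokens such as "B2B" are never treated as acronyms.
static bool isAllUpper(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isupper(c)) return false;
    }
    return true;
}

// Hypothetical helper, not a Lemur function: applies the documented rules to one token.
std::string normalizeToken(const std::string& token,
                           const std::set<std::string>& acronyms) {
    // Strip a contraction or possessive suffix: drop the apostrophe and what follows.
    std::string word = token.substr(0, token.find('\''));

    // Assumption of this sketch: drop every period, so "U.S.A." collapses to "USA".
    std::string compact;
    for (char c : word) {
        if (c != '.') compact += c;
    }

    // Listed acronyms keep their case.
    if (isAllUpper(compact) && acronyms.count(compact)) return compact;

    // A trailing lowercase 's' on a listed acronym ("USAs") is removed.
    if (compact.size() > 1 && compact.back() == 's') {
        std::string stem = compact.substr(0, compact.size() - 1);
        if (isAllUpper(stem) && acronyms.count(stem)) return stem;
    }

    // Everything else is case-folded.
    for (char& c : compact) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return compact;
}
```

With an acronym list containing only USA, this sketch maps U.S.A., USA's, and USAs to USA, maps Hello to hello, and maps B2B to b2b because a token with digits is never matched as an acronym.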

The DOCHDR is ignored.

Text in <script> tags is ignored. Text in HTML comments is ignored.
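As a rough illustration of what ignoring script text and HTML comments means, the sketch below removes both kinds of region from a string. `dropRegions` and `stripIgnored` are hypothetical names, the tag match is assumed to be lowercase and case-sensitive, and the actual parser may well handle these regions differently (for example inside its tokenizer).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: remove every region that begins with `open` and ends with `close`.
// An unterminated region is removed through the end of the text.
std::string dropRegions(std::string text,
                        const std::string& open,
                        const std::string& close) {
    std::size_t start = 0;
    while ((start = text.find(open, start)) != std::string::npos) {
        std::size_t end = text.find(close, start + open.size());
        if (end == std::string::npos) {
            text.erase(start);  // no closing delimiter: drop the rest
            break;
        }
        text.erase(start, end + close.size() - start);
    }
    return text;
}

// Hypothetical helper: drop HTML comments first, then <script> elements.
std::string stripIgnored(const std::string& html) {
    std::string out = dropRegions(html, "<!--", "-->");
    return dropRegions(out, "<script", "</script>");
}
```

For the input `a<!-- x -->b<script>var y;</script>c` this sketch returns `abc`.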


Constructor & Destructor Documentation

WebParser::WebParser ()
 


Member Function Documentation

long WebParser::fileTell () [virtual]

Return the current byte position of the file being parsed.

Implements Parser.

void WebParser::parseBuffer (char *buf, int len) [virtual]
 

Parse a buffer.

Implements Parser.

void WebParser::parseFile (char *filename) [virtual]
 

Parse a file.

Implements Parser.
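The calling pattern for the three member functions documented above can be sketched with a stand-in class. Everything here is an assumption made for illustration: `SketchParser` copies only the three documented signatures, `TokenCounter` counts whitespace-separated tokens instead of parsing Web TREC markup, and having `parseFile` feed the file to `parseBuffer` in chunks is a choice of this sketch, not a statement about how Lemur works.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>

// Hypothetical interface that copies only the three documented signatures.
class SketchParser {
public:
    virtual ~SketchParser() {}
    virtual void parseFile(char* filename) = 0;
    virtual void parseBuffer(char* buf, int len) = 0;
    virtual long fileTell() = 0;
};

// Toy implementation: counts whitespace-separated tokens.
class TokenCounter : public SketchParser {
public:
    TokenCounter() : tokens_(0), position_(0), inToken_(false) {}

    // Read the file in chunks and hand each chunk to parseBuffer.
    void parseFile(char* filename) override {
        std::FILE* in = std::fopen(filename, "rb");
        if (!in) return;  // sketch: unreadable files are silently skipped
        position_ = 0;
        inToken_ = false;
        char chunk[4096];
        std::size_t n;
        while ((n = std::fread(chunk, 1, sizeof chunk, in)) > 0) {
            parseBuffer(chunk, static_cast<int>(n));
            position_ = std::ftell(in);  // byte offset consumed so far
        }
        std::fclose(in);
    }

    // Count token starts; inToken_ carries over so a token split across
    // two buffers is counted once.
    void parseBuffer(char* buf, int len) override {
        for (int i = 0; i < len; ++i) {
            bool space = std::isspace(static_cast<unsigned char>(buf[i])) != 0;
            if (!space && !inToken_) ++tokens_;
            inToken_ = !space;
        }
    }

    long fileTell() override { return position_; }
    long tokens() const { return tokens_; }

private:
    long tokens_;
    long position_;
    bool inToken_;
};
```

Because the documented signatures take `char *` rather than `const char *`, callers need a mutable buffer, for example `char buf[] = "one two  three"; counter.parseBuffer(buf, sizeof buf - 1);`.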


The documentation for this class was generated from the following file:
Generated on Mon Sep 30 14:14:28 2002 for LEMUR by doxygen 1.2.18