
FreqCounter Class Reference

#include <FreqCounter.hpp>

FreqCounter inherits from TextHandler.

Public Methods

 FreqCounter (Stopper *stopWords=NULL)
 Create a frequency counter with the specified stopword list. The stopWords parameter is optional.

 FreqCounter (char *filename,Stopper *stopWords=NULL)
 Create a frequency counter (loading it from a file) with the specified stopword list. The stopWords parameter is optional.

 ~FreqCounter ()
 Delete the frequency counter.

void clear ()
 Clear the frequency counter (set all counts to 0).

void output (char *filename)
 Output the frequency information to a file.

char* randomWord ()
void setRandomMode (int mode)
int getRandomMode ()
char* randomCtf ()
char* randomDf ()
char* randomAveTf ()
char* randomUniform ()
int numWords ()
int totWords ()
freqmap* getFreqInfo ()
int getCtf (char *word)
int getDf (char *word)
double getAveTf (char *word)
double ctfRatio (FreqCounter &lm1)
char* handleDoc (char *docno)
 Overridden from TextHandler.

char* handleWord (char *word)
 Overridden from TextHandler - increments collection term frequencies.

void endDoc ()
 Specifies end of a document - updates document frequencies.

void setName (char *freqCounterName)
 Set the name of the language model described by the frequency counter.

char* getName ()
 Get the counter's name.

void pruneBottomWords (int topWords)
 Prune the least frequent words, keeping only the topWords most frequent words.


Protected Methods

void input (char *filename)

Protected Attributes

freqmap freqInfo
stringset doc
stringset randdone
char* name
Stopper* stopper
long ctfTot
int dfTot
long double avetfTot
bool atfValid
int randomMode
int nWords

Detailed Description

Counts collection term frequencies (ctf) and document frequencies (df), and provides methods for selecting random words from the counted vocabulary. A FreqCounter can optionally use a stopword list.
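The counting flow can be pictured with a small self-contained sketch. It is not Lemur's implementation: MiniFreqCounter and WordStats are hypothetical stand-ins for FreqCounter and its freqmap entries, a std::set replaces Stopper, and skipping stopwords entirely (so they do not count toward totWords()) is an assumption. Each occurrence increments ctf in handleWord(), while endDoc() increments df once per distinct word in the finished document.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical per-word statistics; the real freqmap entry layout may differ.
struct WordStats {
    long ctf = 0;  // collection term frequency: total occurrences of the word
    int df = 0;    // document frequency: number of documents containing the word
};

// Sketch of the handleDoc / handleWord / endDoc protocol (not Lemur's code).
class MiniFreqCounter {
public:
    MiniFreqCounter() = default;
    explicit MiniFreqCounter(std::set<std::string> stopWords)
        : stop_(std::move(stopWords)) {}

    void handleDoc(const std::string& /*docno*/) { docWords_.clear(); }

    void handleWord(const std::string& word) {
        if (stop_.count(word)) return;  // assumption: stopwords are not counted at all
        ++freq_[word].ctf;              // every occurrence counts toward ctf
        ++totWords_;
        docWords_.insert(word);         // remember the word once for this document
    }

    void endDoc() {
        for (const auto& w : docWords_) ++freq_[w].df;  // one df increment per document
        docWords_.clear();
    }

    int numWords() const { return static_cast<int>(freq_.size()); }  // unique words
    long totWords() const { return totWords_; }                       // counted occurrences

    long getCtf(const std::string& w) const {
        auto it = freq_.find(w);
        return it == freq_.end() ? 0 : it->second.ctf;
    }
    int getDf(const std::string& w) const {
        auto it = freq_.find(w);
        return it == freq_.end() ? 0 : it->second.df;
    }

private:
    std::set<std::string> stop_;
    std::set<std::string> docWords_;
    std::map<std::string, WordStats> freq_;
    long totWords_ = 0;
};
```

Collecting each document's words in a set before updating df is what prevents a word that repeats inside one document from being counted twice toward its document frequency.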


Constructor & Destructor Documentation

FreqCounter::FreqCounter ( Stopper * stopWords = NULL )
 

Create a frequency counter with the specified stopword list. The stopWords parameter is optional.

FreqCounter::FreqCounter ( char * filename,
Stopper * stopWords = NULL )
 

Create a frequency counter (loading it from a file) with the specified stopword list. The stopWords parameter is optional.

FreqCounter::~FreqCounter ( )
 

Delete the frequency counter.


Member Function Documentation

void FreqCounter::clear ( )
 

Clear the frequency counter (set all counts to 0).

double FreqCounter::ctfRatio ( FreqCounter & lm )
 

Compare lm to this language model, returning the ctf ratio.

void FreqCounter::endDoc ( )
 

Specifies end of a document - updates document frequencies.

double FreqCounter::getAveTf ( char * word )
 

Get the average term frequency for a word.

int FreqCounter::getCtf ( char * word )
 

Get the collection term frequency for a word.

int FreqCounter::getDf ( char * word )
 

Get the document frequency for a word.

freqmap * FreqCounter::getFreqInfo ( )
 

Get a pointer to the internal frequency count map.

char * FreqCounter::getName ( )
 

Get the counter's name.

int FreqCounter::getRandomMode ( )
 

Get the current random word mode. See setRandomMode().

char * FreqCounter::handleDoc ( char * docno ) [virtual]
 

Overridden from TextHandler.

Reimplemented from TextHandler.

char * FreqCounter::handleWord ( char * word ) [virtual]
 

Overridden from TextHandler - increments collection term frequencies.

Reimplemented from TextHandler.

void FreqCounter::input ( char * filename ) [protected]
 

int FreqCounter::numWords ( )
 

Return the number of unique words seen across all documents processed.

void FreqCounter::output ( char * filename )
 

Output the frequency information to a file.

void FreqCounter::pruneBottomWords ( int numTopWords )
 

Prune the least frequent words, keeping only the numTopWords most frequent words.
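A possible reading of pruneBottomWords() is sketched below. It ranks words by ctf and breaks ties alphabetically; both choices are assumptions, since this page does not say which frequency is used or how ties are resolved. The plain std::map<std::string, long> is a hypothetical stand-in for freqmap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sketch: keep the numTopWords words with the highest ctf and drop the rest.
// Ranking by ctf and alphabetical tie-breaking are assumptions.
void pruneBottomWords(std::map<std::string, long>& ctf, int numTopWords) {
    assert(numTopWords >= 0);
    if (ctf.size() <= static_cast<std::size_t>(numTopWords)) return;  // nothing to prune

    std::vector<std::pair<std::string, long>> ranked(ctf.begin(), ctf.end());
    // Only the first numTopWords positions need to end up in sorted order.
    std::partial_sort(ranked.begin(), ranked.begin() + numTopWords, ranked.end(),
                      [](const std::pair<std::string, long>& a,
                         const std::pair<std::string, long>& b) {
                          if (a.second != b.second) return a.second > b.second;
                          return a.first < b.first;  // deterministic tie-break
                      });
    std::map<std::string, long> kept(ranked.begin(), ranked.begin() + numTopWords);
    ctf.swap(kept);
}
```

std::partial_sort is used because only the top numTopWords entries need ordering, which avoids fully sorting a large vocabulary.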

char * FreqCounter::randomAveTf ( )
 

Select a word at random using average term frequency. This word is not guaranteed to be unique from other calls to this function.

char * FreqCounter::randomCtf ( )
 

Select a word at random using collection term frequency. This word is not guaranteed to be unique from other calls to this function.
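To illustrate what selection "using collection term frequency" means, the sketch below draws a word with probability proportional to its ctf using std::discrete_distribution. The function randomByCtf and its map argument are hypothetical; the sampling method inside FreqCounter is not documented here. Because each draw is independent, the same word can come back on consecutive calls, which is the non-uniqueness noted above.

```cpp
#include <cassert>
#include <map>
#include <random>
#include <string>
#include <vector>

// Sketch: pick one word with probability proportional to its ctf (sampling with replacement).
std::string randomByCtf(const std::map<std::string, long>& ctf, std::mt19937& rng) {
    std::vector<std::string> words;
    std::vector<double> weights;
    for (const auto& entry : ctf) {
        if (entry.second <= 0) continue;  // zero-count words can never be drawn
        words.push_back(entry.first);
        weights.push_back(static_cast<double>(entry.second));
    }
    assert(!words.empty());  // need at least one word with a positive count
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    return words[pick(rng)];
}
```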

char * FreqCounter::randomDf ( )
 

Select a word at random using document frequency. This word is not guaranteed to be unique from other calls to this function.

char * FreqCounter::randomUniform ( )
 

Select a word at random with equal probability for each word. This word is not guaranteed to be unique from other calls to this function.

char * FreqCounter::randomWord ( )
 

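Get a random word from the distribution specified by setRandomMode(). Unlike randomCtf(), randomDf(), randomAveTf(), and randomUniform(), the returned word is guaranteed not to repeat any word returned since the last clear operation.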
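One way to provide that guarantee is to remember every word already returned and exclude those words from later draws until clear() is called; the protected randdone set suggests something similar, but that is an inference. NoRepeatSampler is a hypothetical class, and returning an empty string once every word has been used is an assumption of this sketch, since this page does not say what randomWord() does when the vocabulary is exhausted.

```cpp
#include <cassert>
#include <map>
#include <random>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sampler: weighted draws that never repeat a word until clear() is called.
class NoRepeatSampler {
public:
    NoRepeatSampler(std::map<std::string, long> weights, unsigned seed)
        : weights_(std::move(weights)), rng_(seed) {}

    // Forget which words have been returned, so every word becomes eligible again.
    void clear() { done_.clear(); }

    // Returns a not-yet-returned word, or "" once all positive-weight words are used
    // (the empty-string convention is an assumption of this sketch).
    std::string randomWord() {
        std::vector<std::string> words;
        std::vector<double> w;
        for (const auto& entry : weights_) {
            if (entry.second <= 0 || done_.count(entry.first)) continue;
            words.push_back(entry.first);
            w.push_back(static_cast<double>(entry.second));
        }
        if (words.empty()) return "";
        std::discrete_distribution<int> pick(w.begin(), w.end());
        std::string chosen = words[pick(rng_)];
        done_.insert(chosen);  // exclude it from later draws
        return chosen;
    }

private:
    std::map<std::string, long> weights_;
    std::set<std::string> done_;
    std::mt19937 rng_;
};
```

Rebuilding the candidate list on every call keeps the sketch short but costs time linear in the vocabulary size per draw.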

void FreqCounter::setName ( char * freqCounterName )
 

Set the name of the language model described by the frequency counter.

void FreqCounter::setRandomMode ( int mode )
 

Set the random word selection mode:
  R_CTF - select using collection term frequency
  R_DF - select using document frequency
  R_AVE_TF - select using average term frequency
  R_UNIFORM - select each word with equal probability
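The four modes can be read as differing only in the weight each word receives during sampling, which the sketch below makes explicit. The enum is a hypothetical stand-in for the real R_* constants, whose numeric values are not given on this page, and the R_AVE_TF weight assumes that average term frequency means ctf / df, which this page does not state.

```cpp
#include <cassert>

// Hypothetical stand-ins for the R_* constants; the real values are not documented here.
enum RandomMode { R_CTF, R_DF, R_AVE_TF, R_UNIFORM };

// Hypothetical per-word counts.
struct Stats {
    long ctf;  // collection term frequency
    int df;    // document frequency
};

// Sampling weight a word would receive under each selection mode (sketch).
double selectionWeight(RandomMode mode, const Stats& s) {
    switch (mode) {
        case R_CTF:
            return static_cast<double>(s.ctf);
        case R_DF:
            return static_cast<double>(s.df);
        case R_AVE_TF:  // assumption: average tf = ctf / df
            return s.df > 0 ? static_cast<double>(s.ctf) / s.df : 0.0;
        case R_UNIFORM:
            return 1.0;  // every word equally likely
    }
    assert(false && "unknown mode");
    return 0.0;
}
```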

int FreqCounter::totWords ( )
 

Return the total number of words seen across all documents processed.


Member Data Documentation

bool FreqCounter::atfValid [protected]
 

long double FreqCounter::avetfTot [protected]
 

long FreqCounter::ctfTot [protected]
 

int FreqCounter::dfTot [protected]
 

stringset FreqCounter::doc [protected]
 

freqmap FreqCounter::freqInfo [protected]
 

int FreqCounter::nWords [protected]
 

char * FreqCounter::name [protected]
 

stringset FreqCounter::randdone [protected]
 

int FreqCounter::randomMode [protected]
 

Stopper * FreqCounter::stopper [protected]
 


Generated at Fri Jul 26 18:26:59 2002 for LEMUR by doxygen 1.2.4, written by Dimitri van Heesch, © 1997-2000