
FreqCounter Class Reference

#include <FreqCounter.hpp>

Inherits TextHandler.

Public Methods

 FreqCounter (Stopper *stopWords=NULL)
 FreqCounter (char *filename, Stopper *stopWords=NULL)
 ~FreqCounter ()
 Delete the frequency counter.

void clear ()
 Clear the frequency counter (set all counts to 0).

void output (char *filename)
 Output the frequency information to a file.

char * randomWord ()
void setRandomMode (int mode)
int getRandomMode ()
char * randomCtf ()
char * randomDf ()
char * randomAveTf ()
char * randomUniform ()
int numWords ()
int totWords ()
freqmap * getFreqInfo ()
int getCtf (char *word)
int getDf (char *word)
double getAveTf (char *word)
double ctfRatio (FreqCounter &lm1)
char * handleDoc (char *docno)
 Overridden from TextHandler.

char * handleWord (char *word)
 Overridden from TextHandler - increments collection term frequencies.

void endDoc ()
 Specifies end of a document - updates document frequencies.

void setName (char *freqCounterName)
 Set the name of language model described by the frequency counter.

char * getName ()
 Get the counter's name.

void pruneBottomWords (int topWords)
 Prune least frequent words, keeping only topWords most frequent words.


Protected Methods

void input (char *filename)

Protected Attributes

freqmap freqInfo
stringset doc
stringset randdone
char * name
Stopper * stopper
long ctfTot
int dfTot
long double avetfTot
bool atfValid
int randomMode
int nWords

Detailed Description

Counts collection term frequencies and document frequencies. Also provides a means for selecting random words. The FreqCounter can use a stopword list.
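
The counting flow described above can be illustrated with a small standalone sketch. It is not the Lemur implementation: the class name SketchFreqCounter, the WordCounts record, the addStopWord helper, and the use of std::string and std::map are all assumptions made for illustration. It also assumes that stopwords are skipped entirely and that document frequency is updated only when endDoc is called.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical per-word record; the real freqmap entry type is not shown here.
struct WordCounts {
    long ctf = 0;  // collection term frequency: occurrences across all documents
    int df = 0;    // document frequency: number of documents containing the word
};

// Standalone sketch, not the Lemur class: similar method names, simplified types.
class SketchFreqCounter {
public:
    void addStopWord(const std::string& w) { stopWords_.insert(w); }

    // Called once per token: bump ctf and remember the word for this document.
    void handleWord(const std::string& word) {
        if (stopWords_.count(word) > 0) return;  // assumed: stopwords are ignored
        ++counts_[word].ctf;
        ++totalWords_;
        currentDoc_.insert(word);
    }

    // Called at a document boundary: each distinct word seen gains one df.
    void endDoc() {
        for (const std::string& w : currentDoc_) {
            auto it = counts_.find(w);
            assert(it != counts_.end());  // handleWord always inserts first
            ++it->second.df;
        }
        currentDoc_.clear();
    }

    long getCtf(const std::string& w) const {
        auto it = counts_.find(w);
        return it == counts_.end() ? 0 : it->second.ctf;
    }
    int getDf(const std::string& w) const {
        auto it = counts_.find(w);
        return it == counts_.end() ? 0 : it->second.df;
    }
    int numWords() const { return static_cast<int>(counts_.size()); }
    long totWords() const { return totalWords_; }

private:
    std::map<std::string, WordCounts> counts_;
    std::set<std::string> currentDoc_;  // distinct words in the open document
    std::set<std::string> stopWords_;
    long totalWords_ = 0;
};
```

In this sketch a word that appears three times in one document adds 3 to its ctf but only 1 to its df, which is the distinction the two counts are meant to capture.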


Constructor & Destructor Documentation

FreqCounter::FreqCounter Stopper *  stopWords = NULL
 

Create a frequency counter with the specified stopword list. The stopWords parameter is optional.

FreqCounter::FreqCounter char *    filename,
Stopper *  stopWords = NULL
 

Create a frequency counter (loading it from file) with the specified stopword list. The stopWords parameter is optional.

FreqCounter::~FreqCounter  
 

Delete the frequency counter.


Member Function Documentation

void FreqCounter::clear  
 

Clear the frequency counter (set all counts to 0).

double FreqCounter::ctfRatio FreqCounter &    lm1
 

Compare lm1 to this language model, returning the ctf ratio.

void FreqCounter::endDoc  
 

Specifies end of a document - updates document frequencies.

double FreqCounter::getAveTf char *    word
 

Get the average term frequency for a word.
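
The page does not define average term frequency. One plausible reading, shown here as an assumption rather than as the toolkit's definition, is the collection term frequency divided by the document frequency, that is, the mean number of occurrences per document that contains the word. The helper name averageTf and the zero result for unseen words are choices of this sketch.

```cpp
#include <cassert>

// Assumed definition: average occurrences per document containing the word.
// Hypothetical helper, not part of the FreqCounter interface.
double averageTf(long ctf, int df) {
    assert(ctf >= 0 && df >= 0);
    if (df == 0) return 0.0;  // unseen word: avoid dividing by zero
    return static_cast<double>(ctf) / df;
}
```

Under this reading, a word with a ctf of 6 spread over 4 documents has an average term frequency of 1.5.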

int FreqCounter::getCtf char *    word
 

Get the collection term frequency for a word.

int FreqCounter::getDf char *    word
 

Get the document frequency for a word.

freqmap * FreqCounter::getFreqInfo  
 

Get a pointer to the internal frequency count map.

char * FreqCounter::getName  
 

Get the counter's name.

int FreqCounter::getRandomMode  
 

Gets the current random word mode. See setRandomMode(...)

char * FreqCounter::handleDoc char *    docno [virtual]
 

Overridden from TextHandler.

Reimplemented from TextHandler.

char * FreqCounter::handleWord char *    word [virtual]
 

Overridden from TextHandler - increments collection term frequencies.

Reimplemented from TextHandler.

void FreqCounter::input char *    filename [protected]
 

int FreqCounter::numWords  
 

Return the number of unique words seen across all documents processed.

void FreqCounter::output char *    filename
 

Output the frequency information to a file.

void FreqCounter::pruneBottomWords int    topWords
 

Prune least frequent words, keeping only topWords most frequent words.
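
A minimal sketch of this kind of pruning, assuming that "frequency" means a single count per word (for example ctf) and that ties are broken alphabetically. Neither assumption is confirmed by this page, and the free function below works on a plain std::map rather than on the toolkit's freqmap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, long> WordCount;

// Order by descending count; alphabetical order breaks ties (sketch assumption).
static bool moreFrequent(const WordCount& a, const WordCount& b) {
    if (a.second != b.second) return a.second > b.second;
    return a.first < b.first;
}

// Keep only the topWords highest-count entries of counts.
void pruneBottomWords(std::map<std::string, long>& counts, int topWords) {
    assert(topWords >= 0);
    const std::size_t keep = static_cast<std::size_t>(topWords);
    if (counts.size() <= keep) return;  // nothing to prune

    std::vector<WordCount> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(), moreFrequent);
    ranked.erase(ranked.begin() + keep, ranked.end());  // drop the tail

    counts.clear();
    counts.insert(ranked.begin(), ranked.end());
}
```

Sorting the whole vocabulary is the simplest approach; for a large vocabulary and a small topWords, std::partial_sort or std::nth_element would avoid ordering entries that are about to be discarded.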

char * FreqCounter::randomAveTf  
 

Select a word at random using average term frequency. This word is not guaranteed to be unique from other calls to this function.

char * FreqCounter::randomCtf  
 

Select a word at random using collection term frequency. This word is not guaranteed to be unique from other calls to this function.
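
Selecting "using collection term frequency" is read here as drawing each word with probability proportional to its ctf. The sketch below shows one standard way to do that, by walking the cumulative counts. It is an illustration under that assumption, not the toolkit's code. The names pickByWeight and randomByCtf are hypothetical, and the C++11 <random> generator is a choice of this sketch.

```cpp
#include <cassert>
#include <map>
#include <random>
#include <string>

// Map a value r in [0, total ctf) to a word by walking cumulative counts.
// A word with count c owns c consecutive values of r, so it is chosen
// with probability c / total when r is uniform.
std::string pickByWeight(const std::map<std::string, long>& ctf, long r) {
    long total = 0;
    for (const auto& entry : ctf) total += entry.second;
    assert(total > 0 && r >= 0 && r < total);

    for (const auto& entry : ctf) {
        if (r < entry.second) return entry.first;
        r -= entry.second;
    }
    return std::string();  // not reached while the assertion above holds
}

// Draw r uniformly and delegate; repeated calls may return the same word.
std::string randomByCtf(const std::map<std::string, long>& ctf, std::mt19937& rng) {
    long total = 0;
    for (const auto& entry : ctf) total += entry.second;
    assert(total > 0);
    std::uniform_int_distribution<long> dist(0, total - 1);
    return pickByWeight(ctf, dist(rng));
}
```

The same walk would serve for document frequency or average term frequency by swapping the weights; uniform selection is the special case where every weight is 1.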

char * FreqCounter::randomDf  
 

Select a word at random using document frequency. This word is not guaranteed to be unique from other calls to this function.

char * FreqCounter::randomUniform  
 

Select a word at random with equal probability for each word. This word is not guaranteed to be unique from other calls to this function.

char * FreqCounter::randomWord  
 

Get a random word from the distribution specified by setRandomMode. Each word returned is distinct from every other word returned since the last clear operation.
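
One way to obtain non-repeating draws is to remember the words already returned and redraw on a repeat, which may be what the protected randdone set is for, though this page does not say. The sketch below assumes uniform selection for brevity and returns an empty string once every word has been used. The class name UniqueSampler and that exhaustion behaviour are assumptions of the sketch, not documented behaviour of randomWord.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: uniform draws that never repeat until clear() is called.
class UniqueSampler {
public:
    explicit UniqueSampler(const std::set<std::string>& vocab)
        : words_(vocab.begin(), vocab.end()) {}

    // Returns a word not yet returned since the last clear(),
    // or an empty string when the vocabulary is exhausted.
    std::string next(std::mt19937& rng) {
        assert(done_.size() <= words_.size());
        if (done_.size() == words_.size()) return std::string();

        std::uniform_int_distribution<std::size_t> dist(0, words_.size() - 1);
        for (;;) {
            const std::string& w = words_[dist(rng)];
            if (done_.insert(w).second) return w;  // first time seen: accept
        }
    }

    // Forget which words were returned so they can be drawn again.
    void clear() { done_.clear(); }

private:
    std::vector<std::string> words_;  // distinct, because built from a set
    std::set<std::string> done_;      // words already returned
};
```

Redrawing is cheap while few words have been used but slows down as the done set approaches the vocabulary size; shuffling the vocabulary once and handing words out in order would avoid that cost for uniform selection.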

void FreqCounter::setName char *    freqCounterName
 

Set the name of language model described by the frequency counter.

void FreqCounter::setRandomMode int    mode
 

Set the random word selection mode:
 R_CTF - select using collection term frequency
 R_DF - select using document frequency
 R_AVE_TF - select using average term frequency
 R_UNIFORM - select each word with equal probability

int FreqCounter::totWords  
 

Return the total number of words seen across all documents processed.


Member Data Documentation

bool FreqCounter::atfValid [protected]
 

long double FreqCounter::avetfTot [protected]
 

long FreqCounter::ctfTot [protected]
 

int FreqCounter::dfTot [protected]
 

stringset FreqCounter::doc [protected]
 

freqmap FreqCounter::freqInfo [protected]
 

char* FreqCounter::name [protected]
 

int FreqCounter::nWords [protected]
 

stringset FreqCounter::randdone [protected]
 

int FreqCounter::randomMode [protected]
 

Stopper* FreqCounter::stopper [protected]
 


The documentation for this class was generated from the following files:
Generated on Tue Nov 25 11:27:08 2003 for Lemur Toolkit by doxygen 1.2.18