#include <FloatFreqVector.hpp>
Inheritance diagram for FreqCounter (diagram not reproduced; FreqCounter derives from TextHandler).

Public Methods

| Return type | Member | Description |
|---|---|---|
| unsigned int | Hash () const | |
| unsigned int | hash () const | |
| bool | operator== (const FreqCounter count) | |
| | FreqCounter (const Stopper *stopWords=NULL) | |
| | FreqCounter (const string &filename, const Stopper *stopWords=NULL) | |
| | ~FreqCounter () | Delete the frequency counter. |
| void | clear () | Clear the frequency counter (set all counts to 0). |
| void | output (const string &filename) const | Output the frequency information to a file. |
| char * | randomWord () | |
| void | setRandomMode (int mode) | |
| int | getRandomMode () const | |
| char * | randomCtf () const | |
| char * | randomDf () const | |
| char * | randomAveTf () const | |
| char * | randomUniform () const | |
| int | numWords () const | |
| int | totWords () const | |
| const freqmap * | getFreqInfo () const | |
| int | getCtf (const char *word) const | |
| int | getDf (const char *word) const | |
| double | getAveTf (const char *word) const | |
| double | ctfRatio (FreqCounter &lm1) const | |
| char * | handleDoc (char *docno) | Overridden from TextHandler. |
| char * | handleWord (char *word) | Overridden from TextHandler; increments collection term frequencies. |
| void | endDoc () | Specifies the end of a document; updates document frequencies. |
| void | setName (const string &freqCounterName) | Set the name of the language model described by the frequency counter. |
| const string & | getName () const | Get the counter's name. |
| void | pruneBottomWords (int topWords) | Prune the least frequent words, keeping only the topWords most frequent words. |
Public Attributes

| Type | Name |
|---|---|
| int | key |

Protected Methods

| Return type | Member |
|---|---|
| void | input (const string &filename) |

Protected Attributes

| Type | Name |
|---|---|
| freqmap | freqInfo |
| stringset | doc |
| stringset | randdone |
| string | name |
| const Stopper * | stopper |
| long | ctfTot |
| int | dfTot |
| long double | avetfTot |
| bool | atfValid |
| int | randomMode |
| int | nWords |
Counts collection term frequencies (ctf) and document frequencies (df), and provides methods for selecting random words. The FreqCounter can optionally use a stopword list.
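The counting behaviour described here (handleWord() increments collection term frequencies, endDoc() updates document frequencies) can be illustrated with a small self-contained sketch. MiniFreqCounter is a hypothetical class, not the Lemur implementation: it uses std::string and std::set in place of freqmap, stringset and Stopper, and it assumes that stopwords are excluded both from the counts and from totWords().

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for FreqCounter's counting logic (not the Lemur code).
// Assumption: stopwords are skipped entirely, including in the total word count.
class MiniFreqCounter {
public:
  explicit MiniFreqCounter(const std::set<std::string>* stopWords = nullptr)
      : stop_(stopWords) {}

  // Analogous to handleWord(): bump the collection term frequency (ctf)
  // and remember that the word occurred in the current document.
  void handleWord(const std::string& word) {
    if (stop_ && stop_->count(word)) return;  // skip stopwords
    ++ctf_[word];
    ++totWords_;
    docWords_.insert(word);
  }

  // Analogous to endDoc(): each distinct word seen in the document
  // gets its document frequency (df) incremented exactly once.
  void endDoc() {
    for (const std::string& w : docWords_) ++df_[w];
    docWords_.clear();
  }

  int getCtf(const std::string& w) const { return lookup(ctf_, w); }
  int getDf(const std::string& w) const { return lookup(df_, w); }
  int numWords() const { return static_cast<int>(ctf_.size()); }  // unique words
  long totWords() const { return totWords_; }                     // all tokens

private:
  static int lookup(const std::map<std::string, int>& m, const std::string& w) {
    auto it = m.find(w);
    return it == m.end() ? 0 : it->second;
  }

  const std::set<std::string>* stop_;
  std::map<std::string, int> ctf_, df_;
  std::set<std::string> docWords_;  // distinct words in the current document
  long totWords_ = 0;
};
```

In this sketch, collecting the current document's distinct words in a set is what guarantees that endDoc() adds at most 1 to a word's df per document, however often the word occurs in it.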

FreqCounter (const Stopper *stopWords=NULL)
Create a frequency counter with the specified stopword list. The stopWords parameter is optional.

FreqCounter (const string &filename, const Stopper *stopWords=NULL)
Create a frequency counter (loading it from a file) with the specified stopword list. The stopWords parameter is optional.

~FreqCounter ()
Delete the frequency counter.

void clear ()
Clear the frequency counter (set all counts to 0).

double ctfRatio (FreqCounter &lm1) const
Compare lm1 with this language model and return the collection term frequency (ctf) ratio.

void endDoc ()
Specifies the end of a document; updates document frequencies.

double getAveTf (const char *word) const
Get the average term frequency for a word.

int getCtf (const char *word) const
Get the collection term frequency for a word.

int getDf (const char *word) const
Get the document frequency for a word.

const freqmap * getFreqInfo () const
Get a const pointer to the internal frequency count map.

const string & getName () const
Get the counter's name.

int getRandomMode () const
Get the current random word selection mode. See setRandomMode().

char * handleDoc (char *docno)
Overridden from TextHandler.
Reimplemented from TextHandler.

char * handleWord (char *word)
Overridden from TextHandler; increments collection term frequencies.
Reimplemented from TextHandler.

int numWords () const
Return the number of unique words seen across all documents processed.

void output (const string &filename) const
Output the frequency information to a file.

void pruneBottomWords (int topWords)
Prune the least frequent words, keeping only the topWords most frequent words.

char * randomAveTf () const
Select a word at random using average term frequency. The word is not guaranteed to differ from words returned by earlier calls to this function.

char * randomCtf () const
Select a word at random using collection term frequency. The word is not guaranteed to differ from words returned by earlier calls to this function.

char * randomDf () const
Select a word at random using document frequency. The word is not guaranteed to differ from words returned by earlier calls to this function.

char * randomUniform () const
Select a word at random, with equal probability for each word. The word is not guaranteed to differ from words returned by earlier calls to this function.

char * randomWord ()
Get a random word from the distribution specified by setRandomMode(). The word is unique since the last clear operation: a word already returned is not returned again until the counter is cleared.

void setName (const string &freqCounterName)
Set the name of the language model described by the frequency counter.

void setRandomMode (int mode)
Set the random word selection mode:
- R_CTF: select using collection term frequency
- R_DF: select using document frequency
- R_AVE_TF: select using average term frequency
- R_UNIFORM: select each word with equal probability

int totWords () const
Return the total number of words seen across all documents processed.
1.2.18