java.lang.Object
  info.ephyra.nlp.indices.WordFrequencies
public class WordFrequencies
Counts the frequencies of words in an arbitrary text corpus and represents them in a dictionary.
Internally, a hash table stores the index, so the frequency of a word can be looked up in expected constant time.
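The counting scheme described above can be illustrated with a minimal sketch. It is not the Ephyra implementation: the class name `WordCountSketch`, the `count` method, and the whitespace tokenization are assumptions made for illustration, and a `HashMap` stands in for the `Hashtable` that the class documents.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of a (word, frequency) index; not the Ephyra source.
class WordCountSketch {

    /** Counts how often each whitespace-separated token occurs in the text. */
    static Map<String, Integer> count(String text, boolean lowerCase) {
        Map<String, Integer> index = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input splits into a single empty token
            }
            String word = lowerCase ? token.toLowerCase(Locale.ROOT) : token;
            index.merge(word, 1, Integer::sum); // store 1, or add 1 to the existing count
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = count("the cat saw the other cat", true);
        System.out.println(index.get("the")); // prints 2
        System.out.println(index.size());     // prints 4 (distinct words)
    }
}
```

Each update and each lookup touches a single hash bucket on average, which is where the constant-time access comes from.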
| Field Summary | | |
|---|---|---|
| private static int | distinct | Number of distinct words in the index. |
| private static java.util.Hashtable<java.lang.String,java.lang.Integer> | index | Hashtable used to store (word, frequency) pairs. |
| private static boolean | LOWER_CASE | Whether words are converted to lower case. |
| private static int | MAX_WORDS | Maximum number of words to be parsed (0 = no limit). |
| private static int | MIN_FREQUENCY | Minimum frequency of a word to remain in the index. |
| private static boolean | SORT_BY_FREQUENCY | Whether words are saved in the order of their frequencies. |
| private static int | total | Total number of words that have been parsed. |
| Constructor Summary |
|---|
| WordFrequencies() |
| Method Summary | | |
|---|---|---|
| static boolean | createIndexFromDir(java.lang.String dirname) | Creates an index of word frequencies from a folder containing text files. |
| static boolean | createIndexFromFile(java.lang.String filename) | Creates an index of word frequencies from an arbitrary text file. |
| static void | dropRareWords() | Drops rare words from the index. |
| static int | getDistinct() | Returns the number of distinct words in the index. |
| static java.lang.String[] | getSortedWords() | Sorts the words in the index by their frequencies in descending order. |
| static int | getTotal() | Returns the total number of words that have been parsed. |
| static boolean | loadIndex(java.lang.String filename) | Loads an index of word frequencies from an input file. |
| static int | lookup(java.lang.String word) | Looks up a word in the index and returns its frequency. |
| static double | lookupRel(java.lang.String word) | Looks up a word in the index and returns its relative frequency. |
| static void | main(java.lang.String[] args) | Entry point. |
| static boolean | saveIndex(java.lang.String filename) | Saves the index of word frequencies to an output file. |
| static boolean | updateIndexFromDir(java.lang.String dir) | Updates the index by adding the words contained in the files in the given folder. |
| static boolean | updateIndexFromFile(java.lang.String filename) | Updates the index with the words in an arbitrary text file. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private static final int MAX_WORDS
private static final boolean LOWER_CASE
private static final int MIN_FREQUENCY
private static final boolean SORT_BY_FREQUENCY
private static int total
private static int distinct
private static java.util.Hashtable<java.lang.String,java.lang.Integer> index
Hashtable used to store (word, frequency) pairs.
| Constructor Detail |
|---|
public WordFrequencies()
| Method Detail |
|---|
public static boolean createIndexFromFile(java.lang.String filename)
filename - name of the text file to parse
public static boolean updateIndexFromFile(java.lang.String filename)
filename - name of the text file to parse
public static boolean createIndexFromDir(java.lang.String dirname)
dirname - name of the folder to parse
public static boolean updateIndexFromDir(java.lang.String dir)
dir - name of the folder to parse
public static void dropRareWords()
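As an illustration of what dropping rare words might look like, the sketch below removes every entry whose count is below a threshold. It assumes that "rare" means a frequency below `MIN_FREQUENCY`; the class `RareWordFilterSketch`, its parameters, and its return value are hypothetical and not part of the documented API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the documented method takes no arguments and returns void.
class RareWordFilterSketch {

    /** Removes all words seen fewer than minFrequency times and returns how many were dropped. */
    static int dropRareWords(Map<String, Integer> index, int minFrequency) {
        int before = index.size();
        // Removing from the values view also removes the matching keys from the map.
        index.values().removeIf(freq -> freq < minFrequency);
        return before - index.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> index = new HashMap<>();
        index.put("the", 5);
        index.put("cat", 2);
        index.put("zebra", 1);
        System.out.println(dropRareWords(index, 2)); // prints 1
        System.out.println(index.containsKey("zebra")); // prints false
    }
}
```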
public static java.lang.String[] getSortedWords()
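One way to obtain such an ordering is sketched below. The class `FrequencySortSketch` and its `sortedWords` method are hypothetical, and the alphabetical tie-break for words with equal frequency is an assumption; the source does not say how ties are ordered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of sorting index keys by descending frequency.
class FrequencySortSketch {

    /** Returns the words of the index, most frequent first; ties are ordered alphabetically. */
    static String[] sortedWords(Map<String, Integer> index) {
        List<String> words = new ArrayList<>(index.keySet());
        words.sort((a, b) -> {
            int byFrequency = Integer.compare(index.get(b), index.get(a)); // descending
            return byFrequency != 0 ? byFrequency : a.compareTo(b);        // assumed tie-break
        });
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Map<String, Integer> index = new HashMap<>();
        index.put("a", 1);
        index.put("b", 3);
        index.put("c", 3);
        index.put("d", 2);
        System.out.println(String.join(" ", sortedWords(index))); // prints b c d a
    }
}
```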
public static boolean saveIndex(java.lang.String filename)
filename - name of the output file to write to
public static boolean loadIndex(java.lang.String filename)
filename - name of the input file containing the index
public static int getTotal()
public static int getDistinct()
public static int lookup(java.lang.String word)
word - word to look up
public static double lookupRel(java.lang.String word)
word - word to look up
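The relationship between the two lookups can be sketched as follows, assuming that the relative frequency is the absolute count divided by the total number of parsed words and that an unknown word has frequency 0. Neither assumption is confirmed by this page, and the class `RelativeFrequencySketch` with its explicit `index` and `total` parameters is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the documented methods read static fields instead of parameters.
class RelativeFrequencySketch {

    /** Returns the absolute frequency of the word, or 0 if it is not in the index (assumed). */
    static int lookup(Map<String, Integer> index, String word) {
        return index.getOrDefault(word, 0);
    }

    /** Returns count / total as a double, or 0.0 for an empty corpus (assumed). */
    static double lookupRel(Map<String, Integer> index, int total, String word) {
        if (total <= 0) {
            return 0.0; // avoid dividing by zero before any word has been parsed
        }
        return (double) lookup(index, word) / total;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = new HashMap<>();
        index.put("the", 2);
        index.put("cat", 1);
        index.put("sat", 1);
        System.out.println(lookupRel(index, 4, "the")); // prints 0.5
        System.out.println(lookupRel(index, 4, "dog")); // prints 0.0
    }
}
```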
public static void main(java.lang.String[] args)
args - argument 1: folder containing text files; argument 2: output file