java.lang.Object
  info.ephyra.nlp.NETagger
public class NETagger
This class combines model-based, pattern-based and list-based named entity taggers.
The pattern-based taggers are optimized for the tokenizer provided in this class. Do not use other tokenizers.
| Field Summary | |
|---|---|
private static java.lang.String[] |
allPatternNames
Collection of all NE types extracted with regular expressions. |
private static java.lang.String[] |
finderNames
NE types that are recognized by the OpenNLP name finders. |
private static opennlp.tools.lang.english.NameFinder[] |
finders
Name finders from the OpenNLP project, created from different models. |
private static int |
fuzzyListLookupThreshold
Edit distance threshold for fuzzy-lookups in dictionaries. |
private static java.lang.String[] |
listNames
NE types of the entries in the lists. |
private static java.lang.String[] |
lists
File names of lists that match different types of NEs. |
private static java.lang.String[] |
MODEL_TYPES
NE types with model-based taggers. |
private static int[] |
patternMaxTokens
Maximum number of tokens per instance for the different types of NEs. |
private static java.lang.String[] |
patternNames
NE types that are matched by the regular expressions. |
private static java.util.regex.Pattern[] |
patterns
Regular expression patterns that match different types of NEs. |
private static java.lang.String[] |
quantityPatternNames
NE types that are matched by the regular expressions. |
private static java.util.regex.Pattern[] |
quantityPatterns
Regular expression patterns that match different types of quantity NEs (number + unit). |
private static int[] |
quantityUnitPatternMaxTokens
Maximum number of tokens per instance for the different types of quantity units. |
private static java.util.regex.Pattern[] |
quantityUnitPatterns
Regular expression patterns that match different measurement units. |
private static java.lang.String[] |
stanfordNames
NE types that are recognized by the Stanford NE tagger. |
| Constructor Summary | |
|---|---|
NETagger() |
| Method Summary | |
|---|---|
private static void |
addNames(java.lang.String tag,
java.util.List names,
opennlp.tools.parser.Parse[] tokens)
Adds named entity information to parses. |
static boolean |
allModelType(java.lang.String[] neTypes)
Checks if there is a model-based tagger for each of the given NE types. |
static java.lang.String[][] |
extractNes(opennlp.tools.parser.Parse parse)
THIS METHOD IS NOT USED. Extracts NEs from a parse tree that has been augmented with NE tags. |
static java.lang.String[][][] |
extractNes(java.lang.String[][] sentences)
Extracts NEs from an array of tokenized sentences. |
static java.lang.String[][] |
extractNes(java.lang.String[][] sentences,
int neId)
Extracts NEs of a particular type from an array of tokenized sentences. |
private static void |
extractNesRec(opennlp.tools.parser.Parse parse,
java.util.ArrayList<java.lang.String>[] nes)
Recursive method called by extractNes(Parse) to extract NEs
from a parse tree augmented with NE tags. |
static int |
getFuzzyMatchingThreshold()
Gets the current value of the edit distance threshold for fuzzy-lookups in dictionaries. |
static int[] |
getNeIds(java.lang.String neType)
Returns the IDs of the taggers for the given NE type (there may be more than one). |
static java.lang.String |
getNeType(int neId)
Returns the NE type that is recognized by the tagger with the given ID. |
static int |
getNumberOfTaggers()
Returns the number of NE taggers. |
static boolean |
hasModelType(java.lang.String[] neTypes)
Checks if there is a model-based tagger for one of the given NE types. |
static boolean |
isModelType(java.lang.String neType)
Checks if there is a model-based tagger for the given NE type. |
static void |
loadListTaggers(java.lang.String listDirectory)
Initializes the list-based NE taggers. |
static boolean |
loadNameFinders(java.lang.String dir)
Creates the OpenNLP name finders and sets the named entity types that are recognized by the finders. |
static void |
loadRegExTaggers(java.lang.String regExListFileName)
Initializes the regular expression based NE taggers. |
static void |
setFuzzyMatchingThreshold(int threshold)
Sets the threshold for fuzzy-lookups in gazetteer lists (aka dictionaries). |
static void |
tagNes(opennlp.tools.parser.Parse[] parses)
Performs named entity tagging on an array of full parses of sentences. |
static java.lang.String[] |
tagNes(java.lang.String[] sentences)
THIS METHOD IS NOT USED. Performs named entity tagging on an array of (not tokenized) sentences. |
static java.lang.String[] |
tokenize(java.lang.String text)
A rule-based tokenizer used to prepare a sentence for NE extraction. |
static java.lang.String |
tokenizeWithSpaces(java.lang.String text)
Applies the rule-based tokenizer and concatenates the tokens with spaces. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private static java.lang.String[] MODEL_TYPES
private static opennlp.tools.lang.english.NameFinder[] finders
private static java.lang.String[] finderNames
private static java.lang.String[] stanfordNames
private static java.lang.String[] lists
private static java.lang.String[] listNames
private static int fuzzyListLookupThreshold
private static java.util.regex.Pattern[] patterns
private static int[] patternMaxTokens
private static java.lang.String[] patternNames
private static java.util.regex.Pattern[] quantityPatterns
private static java.util.regex.Pattern[] quantityUnitPatterns
private static int[] quantityUnitPatternMaxTokens
private static java.lang.String[] quantityPatternNames
private static java.lang.String[] allPatternNames
| Constructor Detail |
|---|
public NETagger()
| Method Detail |
|---|
public static boolean loadNameFinders(java.lang.String dir)
dir - directory containing the models for the name finders
public static void loadListTaggers(java.lang.String listDirectory)
listDirectory - path of the directory containing the list files
public static void loadRegExTaggers(java.lang.String regExListFileName)
regExListFileName - path and name of the file that lists the names of the patterns in use
public static int getNumberOfTaggers()
public static java.lang.String getNeType(int neId)
neId - ID of a NE tagger
null, if the ID is invalid
public static int[] getNeIds(java.lang.String neType)
neType - NE type
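The relationship between getNeType and getNeIds can be pictured as a lookup over one flat array of tagger names in which a tagger's ID is its index. The sketch below is a hypothetical illustration, not the NETagger implementation: the class `TaggerIdSketch`, the sample type names, and the choice to return an empty array for an unknown type are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: tagger IDs as indices into one flat array of NE type names.
final class TaggerIdSketch {
    // Assumed sample data; the same NE type may be served by several taggers.
    static final String[] TAGGER_TYPES = {"NEperson", "NElocation", "NEdate", "NEperson"};

    // Returns the NE type for an ID, or null if the ID is out of range.
    static String getNeType(int neId) {
        if (neId < 0 || neId >= TAGGER_TYPES.length) return null;
        return TAGGER_TYPES[neId];
    }

    // Returns every ID whose tagger recognizes the given type (possibly none).
    static int[] getNeIds(String neType) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < TAGGER_TYPES.length; i++) {
            if (TAGGER_TYPES[i].equals(neType)) ids.add(i);
        }
        int[] result = new int[ids.size()];
        for (int i = 0; i < result.length; i++) result[i] = ids.get(i);
        return result;
    }
}
```

In this sketch a type with two taggers yields two IDs, which mirrors the "there may be more than one" note in the method summary.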
public static boolean isModelType(java.lang.String neType)
neType - NE type
true iff there is a model-based tagger for this type
public static boolean hasModelType(java.lang.String[] neTypes)
neTypes - NE types
true iff there is a model-based tagger for one of these types
public static boolean allModelType(java.lang.String[] neTypes)
neTypes - NE types
true iff there is a model-based tagger for each of these types
public static int getFuzzyMatchingThreshold()
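The difference between isModelType, hasModelType and allModelType is the difference between a membership test, an "any" test and an "all" test. The following is a minimal sketch under stated assumptions: the class `ModelTypeSketch` and the sample contents of `MODEL_TYPES` are hypothetical, since this page does not show the real values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the three model-type checks.
final class ModelTypeSketch {
    // Assumed sample values; the real MODEL_TYPES contents are not shown on this page.
    static final Set<String> MODEL_TYPES =
            new HashSet<>(Arrays.asList("NEperson", "NElocation", "NEorganization"));

    // True iff the given type has a model-based tagger.
    static boolean isModelType(String neType) {
        return MODEL_TYPES.contains(neType);
    }

    // True iff at least one of the given types has a model-based tagger.
    static boolean hasModelType(String[] neTypes) {
        return Arrays.stream(neTypes).anyMatch(ModelTypeSketch::isModelType);
    }

    // True iff every one of the given types has a model-based tagger.
    static boolean allModelType(String[] neTypes) {
        return Arrays.stream(neTypes).allMatch(ModelTypeSketch::isModelType);
    }
}
```

Note that in this sketch an empty array makes the "all" check true and the "any" check false; whether NETagger behaves the same way for empty input is not stated here.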
public static void setFuzzyMatchingThreshold(int threshold)
threshold - the new value for the edit distance threshold for
fuzzy-lookups in dictionaries
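To illustrate what an edit distance threshold means for a dictionary lookup, the sketch below accepts a candidate if some list entry is within the threshold number of single-character edits. It is an assumption-laden illustration rather than the NETagger code: the class `FuzzyLookupSketch`, the use of plain Levenshtein distance, the case-sensitive comparison, and the inclusive threshold are all choices made for the example.

```java
import java.util.List;

// Hypothetical sketch of a fuzzy dictionary lookup bounded by an edit distance threshold.
final class FuzzyLookupSketch {
    // Levenshtein distance computed with two rolling rows of the DP table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if some entry is within `threshold` edits of the candidate (inclusive).
    static boolean fuzzyContains(List<String> entries, String candidate, int threshold) {
        for (String entry : entries) {
            if (editDistance(entry, candidate) <= threshold) return true;
        }
        return false;
    }
}
```

With a threshold of 0 the lookup in this sketch degenerates to exact matching; raising it trades precision for recall on misspelled or variant names.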
private static void addNames(java.lang.String tag,
java.util.List names,
opennlp.tools.parser.Parse[] tokens)
tag - named entity type
names - spans of tokens that are named entities
tokens - parses for the tokens
private static void extractNesRec(opennlp.tools.parser.Parse parse,
java.util.ArrayList<java.lang.String>[] nes)
Recursive method called by extractNes(Parse) to extract NEs
from a parse tree augmented with NE tags.
parse - a node of a parse tree
nes - NEs found so far
public static java.lang.String[] tokenize(java.lang.String text)
text - text to tokenize
public static java.lang.String tokenizeWithSpaces(java.lang.String text)
text - text to tokenize
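The pairing of tokenize and tokenizeWithSpaces can be sketched as a token-array function plus a thin wrapper that joins the tokens with single spaces. The rules of the actual NETagger tokenizer are not documented on this page, so the rule used here (runs of letters or digits form one token, every other non-whitespace character is its own token) and the class `TokenizerSketch` are purely illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a rule-based tokenizer and its space-joined variant.
final class TokenizerSketch {
    // Assumed rule: a run of letters/digits, or a single other non-whitespace character.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    // Splits the text into tokens according to the assumed rule.
    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens.toArray(new String[0]);
    }

    // Tokenizes the text and concatenates the tokens with single spaces.
    static String tokenizeWithSpaces(String text) {
        return String.join(" ", tokenize(text));
    }
}
```

Because the pattern-based taggers are tuned to the tokenizer in this class, a stand-in like the one above is only useful for understanding the data flow, not as a replacement.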
public static java.lang.String[] tagNes(java.lang.String[] sentences)
sentences - array of sentences
public static void tagNes(opennlp.tools.parser.Parse[] parses)
parses - array of full parses of sentences
public static java.lang.String[][][] extractNes(java.lang.String[][] sentences)
sentences - array of tokenized sentences
public static java.lang.String[][] extractNes(java.lang.String[][] sentences,
int neId)
sentences - array of tokenized sentences
neId - ID of a name finder or regular expression
null, if the ID is invalid
public static java.lang.String[][] extractNes(opennlp.tools.parser.Parse parse)
parse - a parse tree augmented with NE tags
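As a rough picture of how a regular expression and a per-type token limit (compare patterns and patternMaxTokens in the field summary) could combine to extract NEs from tokenized sentences, the sketch below slides a window of at most `maxTokens` tokens over each sentence and keeps the windows the pattern matches in full. This is not the NETagger algorithm: the class `PatternExtractionSketch`, the longest-window-first strategy, and the non-overlapping matches are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: pattern-based NE extraction bounded by a maximum token count.
final class PatternExtractionSketch {
    // Returns, for each sentence, the token windows that the pattern matches as a whole.
    static String[][] extract(String[][] sentences, Pattern pattern, int maxTokens) {
        String[][] result = new String[sentences.length][];
        for (int s = 0; s < sentences.length; s++) {
            String[] tokens = sentences[s];
            List<String> found = new ArrayList<>();
            int start = 0;
            while (start < tokens.length) {
                int matchedLen = 0;
                // Try the longest admissible window first, then shrink it.
                int longest = Math.min(maxTokens, tokens.length - start);
                for (int len = longest; len >= 1; len--) {
                    String window = String.join(" ",
                            Arrays.copyOfRange(tokens, start, start + len));
                    if (pattern.matcher(window).matches()) {
                        found.add(window);
                        matchedLen = len;
                        break;
                    }
                }
                // Skip past a match so that extracted NEs do not overlap.
                start += Math.max(matchedLen, 1);
            }
            result[s] = found.toArray(new String[0]);
        }
        return result;
    }
}
```

The result has one array per input sentence, which matches the shape of the String[][] returned by extractNes(sentences, neId); a quantity-style pattern such as a number followed by a unit fits this scheme with a limit of two tokens.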