java.lang.Object
  info.ephyra.nlp.RegExMatcher
public class RegExMatcher
Applies regular expressions for named entity extraction.
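As a rough illustration of how a capturing-group constant and its compiled `Pattern` counterpart might be used for extraction, here is a minimal sketch. The regular expression in it is a hypothetical stand-in modeled on the description of `ANGLE` (a number followed by 'degree' or °). The actual expression strings of this class are not shown on this page, so `ANGLE_SKETCH`, `ANGLE_SKETCH_PATTERN`, and `extract` are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AngleExtractionSketch {

    // Hypothetical stand-in, NOT the actual value of RegExMatcher.ANGLE:
    // one capturing group around a number followed by 'degree(s)' or the degree sign.
    static final String ANGLE_SKETCH = "(\\d+(?:\\.\\d+)?\\s*(?:degrees?|\u00B0))";

    // Compiled counterpart, analogous in spirit to a *_PATTERN field.
    static final Pattern ANGLE_SKETCH_PATTERN =
            Pattern.compile(ANGLE_SKETCH, Pattern.CASE_INSENSITIVE);

    // Collects the text captured by group 1 for every match in the input.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ANGLE_SKETCH_PATTERN.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("Turn 45 degrees, then another 90\u00B0."));
    }
}
```

With the real class, the same `matcher(...)`/`find()` loop would presumably be applied to one of the public `*_PATTERN` fields listed below, since those are declared as `java.util.regex.Pattern`.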
| Field Summary | |
|---|---|
static java.lang.String |
ACRONYM
a regular expression capturing group matching any probable acronym (disjunctive combination of MIXED_CASE_ACRONYM and PUNCTUATED_ALL_UPPER_CASE_ACRONYM) |
static int |
ACRONYM_MAX_TOKENS
|
static java.util.regex.Pattern |
ACRONYM_PATTERN
a regular expression capturing group matching any probable acronym (disjunctive combination of MIXED_CASE_ACRONYM and PUNCTUATED_ALL_UPPER_CASE_ACRONYM) |
static java.lang.String |
ALL_UPPER_CASE_ACRONYM
a regular expression capturing group matching any sequence of two or more upper case letters (probably acronyms) |
static int |
ALL_UPPER_CASE_ACRONYM_MAX_TOKENS
|
static java.util.regex.Pattern |
ALL_UPPER_CASE_ACRONYM_PATTERN
a regular expression capturing group matching any sequence of two or more upper case letters (probably acronyms) |
static java.lang.String |
ANGLE
a regular expression capturing group matching any one angle (in particular, a number followed by 'degree' or °) |
static int |
ANGLE_MAX_TOKENS
|
static java.util.regex.Pattern |
ANGLE_PATTERN
a regular expression capturing group matching any one angle (in particular, a number followed by 'degree' or °) |
static java.lang.String |
ANGLE_UNIT
a regular expression capturing group matching any one angle unit (in particular, 'degree' or °) |
static int |
ANGLE_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
ANGLE_UNIT_PATTERN
a regular expression capturing group matching any one angle unit (in particular, 'degree' or °) |
static java.lang.String |
AREA
a regular expression capturing group matching any one area measure (in particular, a number followed by an area unit) |
static java.lang.String |
AREA_LARGE
a regular expression capturing group matching any one larger scale area measure (in particular, a number followed by a larger scale area unit) |
static int |
AREA_MAX_TOKENS
|
static java.util.regex.Pattern |
AREA_PATTERN
a regular expression capturing group matching any one area measure (in particular, a number followed by an area unit) |
static java.lang.String |
AREA_SMALL
a regular expression capturing group matching any one smaller scale area measure (in particular, a number followed by a smaller scale area unit) |
static java.lang.String |
AREA_UNIT
a regular expression capturing group matching any one area unit (in particular, 'square (kilo|deci|cent|milli|micro|nano)meter', 'square yard', 'square mile', and 'acre') |
static int |
AREA_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
AREA_UNIT_PATTERN
a regular expression capturing group matching any one area unit (in particular, 'square (kilo|deci|cent|milli|micro|nano)meter', 'square yard', 'square mile', and 'acre') |
static java.lang.String |
CENTURY
a regular expression capturing group matching any one century (like 'eighteenth century', or '21st century') |
static int |
CENTURY_MAX_TOKENS
|
static java.util.regex.Pattern |
CENTURY_PATTERN
a regular expression capturing group matching any one century (like 'eighteenth century', or '21st century') |
static java.lang.String |
CONTINUE
|
static java.lang.String |
COUNTY
a regular expression capturing group matching any county name (in particular, a proper name followed by 'County') |
static int |
COUNTY_MAX_TOKENS
|
static java.util.regex.Pattern |
COUNTY_PATTERN
a regular expression capturing group matching any county name (in particular, a proper name followed by 'County') |
static java.lang.String |
DATE
a regular expression capturing group matching any one date |
static java.lang.String |
DATE_DIGITAL
a regular expression capturing group matching any one date given in digits (for instance, '05-09-06') |
static int |
DATE_DIGITAL_MAX_TOKENS
|
static java.util.regex.Pattern |
DATE_DIGITAL_PATTERN
a regular expression capturing group matching any one date given in digits (for instance, '05-09-06') |
static java.lang.String |
DATE_FULL
a regular expression capturing group matching any one full date, optionally including the weekday (for instance, 'Tuesday, May 9th, 2006') |
static int |
DATE_FULL_MAX_TOKENS
|
static java.util.regex.Pattern |
DATE_FULL_PATTERN
a regular expression capturing group matching any one full date, optionally including the weekday (for instance, 'Tuesday, May 9th, 2006') |
static int |
DATE_MAX_TOKENS
|
static java.util.regex.Pattern |
DATE_PATTERN
a regular expression capturing group matching any one date |
static java.lang.String |
DATE_SEPARATOR
a regular expression capturing group matching any one character usually used to separate the digit groups in a date (in particular, the characters ',', ' |
static int |
DATE_SEPARATOR_MAX_TOKENS
|
static java.util.regex.Pattern |
DATE_SEPARATOR_PATTERN
a regular expression capturing group matching any one character usually used to separate the digit groups in a date (in particular, the characters ',', ' |
static java.lang.String |
DAY
a regular expression capturing group matching any one day date (in particular, numbers 1 through 31) |
static int |
DAY_MAX_TOKENS
|
static java.lang.String |
DAY_MONTH
a regular expression capturing group matching any one day in a month (e.g. |
static int |
DAY_MONTH_MAX_TOKENS
|
static java.util.regex.Pattern |
DAY_MONTH_PATTERN
a regular expression capturing group matching any one day in a month |
static java.util.regex.Pattern |
DAY_PATTERN
a regular expression capturing group matching any one day date (in particular, numbers 1 through 31) |
static java.lang.String |
DAYS
a regular expression capturing group matching any one duration given in days (in particular, a number followed by 'day' or 'days') |
static int |
DAYS_MAX_TOKENS
|
static java.util.regex.Pattern |
DAYS_PATTERN
a regular expression capturing group matching any one duration given in days (in particular, a number followed by 'day' or 'days') |
static java.lang.String |
DAYS_UNIT
a regular expression capturing group matching the unit of a duration given in days (in particular, 'day' or 'days') |
static int |
DAYS_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
DAYS_UNIT_PATTERN
a regular expression capturing group matching the unit of a duration given in days (in particular, 'day' or 'days') |
static java.lang.String |
DECADE
a regular expression capturing group matching any one decade (like '80s', '1920s' or 'sixties') |
static int |
DECADE_MAX_TOKENS
|
static java.util.regex.Pattern |
DECADE_PATTERN
a regular expression capturing group matching any one decade (like '80s', '1920s' or 'sixties') |
private static java.util.HashMap<java.lang.String,HashDictionary> |
dictionariesByName
|
static java.lang.String |
DOLLAR_UNIT
a regular expression capturing group matching any one US Dollar unit (in particular, '$', 'USD', and 'Dollar') |
static int |
DOLLAR_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
DOLLAR_UNIT_PATTERN
a regular expression capturing group matching any one US Dollar unit (in particular, '$', 'USD', and 'Dollar') |
static java.lang.String |
DURATION
a regular expression capturing group matching any one duration (in particular, a number followed by a time unit) |
static int |
DURATION_MAX_TOKENS
|
static java.util.regex.Pattern |
DURATION_PATTERN
a regular expression capturing group matching any one duration (in particular, a number followed by a time unit) |
static java.lang.String |
DURATION_UNIT
a regular expression capturing group matching any one duration unit (in particular, a time unit) |
static int |
DURATION_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
DURATION_UNIT_PATTERN
a regular expression capturing group matching any one duration unit (in particular, a time unit) |
static java.lang.String |
EDUCATIONAL_INSTITUTION
a regular expression capturing group matching any educational institution (in particular, a proper name followed by one of 'university', 'college', 'high school', or 'elementary', or 'university of' followed by a proper name) |
static int |
EDUCATIONAL_INSTITUTION_MAX_TOKENS
|
static java.util.regex.Pattern |
EDUCATIONAL_INSTITUTION_PATTERN
a regular expression capturing group matching any educational institution (in particular, a proper name followed by one of 'University', 'College', 'High (School)', or 'Elementary (School)', or 'University of' followed by a proper name) |
static java.lang.String |
EURO_UNIT
a regular expression capturing group matching any one Euro unit (in particular, '€', 'EURO', and 'Euro') |
static int |
EURO_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
EURO_UNIT_PATTERN
a regular expression capturing group matching any one Euro unit (in particular, '€', 'EURO', and 'Euro') |
static java.lang.String |
FEET
|
static int |
FEET_MAX_TOKENS
|
static java.util.regex.Pattern |
FEET_PATTERN
|
static java.lang.String |
FEET_UNIT
|
static int |
FEET_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
FEET_UNIT_PATTERN
|
static java.lang.String |
FREQUENCY
a regular expression capturing group matching any one frequency (like 'once', '88 times', 'twice an hour', '25 times per day', or 'every 37 minutes') |
static int |
FREQUENCY_MAX_TOKENS
|
static java.util.regex.Pattern |
FREQUENCY_PATTERN
a regular expression capturing group matching any one frequency (like 'once', '88 times', 'twice an hour', '25 times per day', or 'every 37 minutes') |
static java.lang.String |
GALLONS
|
static int |
GALLONS_MAX_TOKENS
|
static java.util.regex.Pattern |
GALLONS_PATTERN
|
static java.lang.String |
GALLONS_UNIT
|
static int |
GALLONS_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
GALLONS_UNIT_PATTERN
|
static java.lang.String |
GRAMS
|
static int |
GRAMS_MAX_TOKENS
|
static java.util.regex.Pattern |
GRAMS_PATTERN
|
static java.lang.String |
GRAMS_UNIT
|
static int |
GRAMS_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
GRAMS_UNIT_PATTERN
|
static java.lang.String |
HEIGHT
a regular expression capturing group matching any one height (this constant is equal to LENGTH; it is provided only for clarity) |
static int |
HEIGHT_MAX_TOKENS
|
static java.util.regex.Pattern |
HEIGHT_PATTERN
a regular expression capturing group matching any one height (this constant is equal to LENGTH; it is provided only for clarity) |
static java.lang.String |
HEIGHT_UNIT
a regular expression capturing group matching any one smaller scale length unit (this constant is equal to LENGTH_UNIT; it is provided only for clarity) |
static int |
HEIGHT_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
HEIGHT_UNIT_PATTERN
a regular expression capturing group matching any one smaller scale length unit (this constant is equal to LENGTH_UNIT; it is provided only for clarity) |
static java.lang.String |
LARGE_AREA_UNIT
a regular expression capturing group matching any one larger scale area unit (in particular, 'square (kilo)meter', 'square mile', and 'acre') |
static int |
LARGE_AREA_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
LARGE_AREA_UNIT_PATTERN
a regular expression capturing group matching any one larger scale area unit (in particular, 'square (kilo)meter', 'square mile', and 'acre') |
static java.lang.String |
LEGAL_SENTENCE
a regular expression capturing group matching any legal sentence ('not guilty' and 'guilty') |
static int |
LEGAL_SENTENCE_MAX_TOKENS
|
static java.util.regex.Pattern |
LEGAL_SENTENCE_PATTERN
a regular expression capturing group matching any legal sentence ('not guilty' and 'guilty') |
static java.lang.String |
LENGTH
a regular expression capturing group matching any one length (in particular, a number followed by a length unit) |
static int |
LENGTH_MAX_TOKENS
|
static java.util.regex.Pattern |
LENGTH_PATTERN
a regular expression capturing group matching any one length (in particular, a number followed by a length unit) |
static java.lang.String |
LENGTH_UNIT
a regular expression capturing group matching any one length unit (in particular, '(kilo|deci|cent|milli|micro|nano)meter', 'foot', 'inch', and 'yard') |
static int |
LENGTH_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
LENGTH_UNIT_PATTERN
a regular expression capturing group matching any one length unit (in particular, '(kilo|deci|cent|milli|micro|nano)meter', 'foot', 'inch', and 'yard') |
static java.lang.String |
LITERS
|
static int |
LITERS_MAX_TOKENS
|
static java.util.regex.Pattern |
LITERS_PATTERN
|
static java.lang.String |
LITERS_UNIT
|
static int |
LITERS_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
LITERS_UNIT_PATTERN
|
private static int |
MAX_TOKENS
|
static java.lang.String |
MILES
|
static int |
MILES_MAX_TOKENS
|
static java.util.regex.Pattern |
MILES_PATTERN
|
static java.lang.String |
MILES_UNIT
|
static int |
MILES_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
MILES_UNIT_PATTERN
|
static java.lang.String |
MIXED_CASE_ACRONYM
a regular expression capturing group matching any sequence of two or more upper case letters with intermediate or trailing lower case ones (probably acronyms) |
static int |
MIXED_CASE_ACRONYM_MAX_TOKENS
|
static java.util.regex.Pattern |
MIXED_CASE_ACRONYM_PATTERN
a regular expression capturing group matching any sequence of two or more upper case letters with intermediate or trailing lower case ones (probably acronyms) |
static java.lang.String |
MONEY
a regular expression capturing group matching any one monetary amount (in particular, a number followed by a monetary unit) |
static int |
MONEY_MAX_TOKENS
|
static java.util.regex.Pattern |
MONEY_PATTERN
a regular expression capturing group matching any one monetary amount (in particular, a number followed by a monetary unit) |
static java.lang.String |
MONEY_UNIT
a regular expression capturing group matching any one monetary unit (in particular, DOLLAR_UNIT, EURO_UNIT, 'YEN', and 'Yen') |
static int |
MONEY_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
MONEY_UNIT_PATTERN
a regular expression capturing group matching any one monetary unit (in particular, DOLLAR_UNIT, EURO_UNIT, 'YEN', and 'Yen') |
static java.lang.String |
MONTH
a regular expression capturing group matching any one month date (in particular, numbers 1 through 12) |
static int |
MONTH_MAX_TOKENS
|
static java.lang.String |
MONTH_NAME
a regular expression capturing group matching any one month name |
static int |
MONTH_NAME_MAX_TOKENS
|
static java.util.regex.Pattern |
MONTH_NAME_PATTERN
a regular expression capturing group matching any one month name |
static java.util.regex.Pattern |
MONTH_PATTERN
a regular expression capturing group matching any one month date (in particular, numbers 1 through 12) |
static java.lang.String |
MPH
|
static int |
MPH_MAX_TOKENS
|
static java.util.regex.Pattern |
MPH_PATTERN
|
static java.lang.String |
MPH_UNIT
|
static int |
MPH_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
MPH_UNIT_PATTERN
|
static java.lang.String |
MULTI_SCORE
a regular expression capturing group matching any probable multi score from sports events (like tennis, '6:3, 6:1, 6:7, 7:5') |
static int |
MULTI_SCORE_MAX_TOKENS
|
static java.util.regex.Pattern |
MULTI_SCORE_PATTERN
a regular expression capturing group matching any probable multi score from sports events (like tennis, '6:3, 6:1, 6:7, 7:5') |
static java.lang.String |
NUMBER
a regular expression capturing group matching all cardinal numbers given in form of digits |
static java.lang.String |
NUMBER_HUNDRED
a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero |
static int |
NUMBER_HUNDRED_MAX_TOKENS
|
static java.util.regex.Pattern |
NUMBER_HUNDRED_PATTERN
a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero |
static java.lang.String |
NUMBER_HUNDRED_WITH_Xteen
a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero, and all four digit cardinal numbers given in form of words whose first digit is 1 and whose last two digits are zero |
static int |
NUMBER_HUNDRED_WITH_Xteen_MAX_TOKENS
|
static java.util.regex.Pattern |
NUMBER_HUNDRED_WITH_Xteen_PATTERN
a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero, and all four digit cardinal numbers given in form of words whose first digit is 1 and whose last two digits are zero |
static int |
NUMBER_MAX_TOKENS
|
static java.lang.String |
NUMBER_ONE
a regular expression capturing group matching all one digit cardinal numbers given in form of words |
static int |
NUMBER_ONE_MAX_TOKENS
|
static java.util.regex.Pattern |
NUMBER_ONE_PATTERN
a regular expression capturing group matching all one digit cardinal numbers given in form of words |
static java.util.regex.Pattern |
NUMBER_PATTERN
a regular expression capturing group matching all cardinal numbers given in form of digits |
static java.lang.String |
NUMBER_TEN
a regular expression capturing group matching all two digit cardinal numbers given in form of words |
static int |
NUMBER_TEN_MAX_TOKENS
|
static java.util.regex.Pattern |
NUMBER_TEN_PATTERN
a regular expression capturing group matching all two digit cardinal numbers given in form of words |
static java.lang.String |
NUMBER_THOUSAND
a regular expression capturing group matching all four digit cardinal numbers given in form of words whose last three digits are zero |
static int |
NUMBER_THOUSAND_MAX_TOKENS
|
static java.util.regex.Pattern |
NUMBER_THOUSAND_PATTERN
a regular expression capturing group matching all four digit cardinal numbers given in form of words whose last three digits are zero |
static java.lang.String |
NUMBER_TO_HUNDRED
a regular expression capturing group matching all three digit cardinal numbers given in form of words |
static int |
NUMBER_TO_HUNDRED_MAX_TOKENS
|
static java.util.regex.Pattern |
NUMBER_TO_HUNDRED_PATTERN
a regular expression capturing group matching all three digit cardinal numbers given in form of words |
static java.lang.String |
NUMBER_TO_THOUSAND
a regular expression capturing group matching all up to six digit cardinal numbers given in form of words |
static int |
NUMBER_TO_THOUSAND_MAX_TOKENS
|
static java.util.regex.Pattern |
NUMBER_TO_THOUSAND_PATTERN
a regular expression capturing group matching all up to six digit cardinal numbers given in form of words |
static java.lang.String |
NUMBER_Xillion
a regular expression capturing group matching all cardinal numbers involving 'million', 'billion' or 'trillion' |
static int |
NUMBER_Xillion_MAX_TOKENS
|
static java.util.regex.Pattern |
NUMBER_Xillion_PATTERN
a regular expression capturing group matching all cardinal numbers involving 'million', 'billion' or 'trillion' |
static java.lang.String |
NUMBER_Xteen
a regular expression capturing group matching all two digit cardinal numbers given in form of words whose first digit is one |
static int |
NUMBER_Xteen_MAX_TOKENS
|
static java.util.regex.Pattern |
NUMBER_Xteen_PATTERN
a regular expression capturing group matching all two digit cardinal numbers given in form of words whose first digit is one |
static java.lang.String |
ORDINAL
a regular expression capturing group matching all ordinal numbers given in form of digits |
static java.lang.String |
ORDINAL_DAY
a regular expression capturing group matching any one ordinal number representing a day date (in particular, numbers 1st through 31st) |
static java.lang.String |
ORDINAL_HUNDRED
a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero |
static int |
ORDINAL_HUNDRED_MAX_TOKENS
|
static java.util.regex.Pattern |
ORDINAL_HUNDRED_PATTERN
a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero |
static java.lang.String |
ORDINAL_HUNDRED_WITH_Xteen
a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero, and all four digit ordinal numbers given in form of words whose first digit is 1 and whose last two digits are zero |
static int |
ORDINAL_HUNDRED_WITH_Xteen_MAX_TOKENS
|
static java.util.regex.Pattern |
ORDINAL_HUNDRED_WITH_Xteen_PATTERN
a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero, and all four digit ordinal numbers given in form of words whose first digit is 1 and whose last two digits are zero |
static int |
ORDINAL_MAX_TOKENS
|
static java.lang.String |
ORDINAL_ONE
a regular expression capturing group matching all one digit ordinal numbers given in form of words |
static int |
ORDINAL_ONE_MAX_TOKENS
|
static java.util.regex.Pattern |
ORDINAL_ONE_PATTERN
a regular expression capturing group matching all one digit ordinal numbers given in form of words |
static java.util.regex.Pattern |
ORDINAL_PATTERN
a regular expression capturing group matching all ordinal numbers given in form of digits |
static java.lang.String |
ORDINAL_TEN
a regular expression capturing group matching all two digit ordinal numbers given in form of words |
static int |
ORDINAL_TEN_MAX_TOKENS
|
static java.util.regex.Pattern |
ORDINAL_TEN_PATTERN
a regular expression capturing group matching all two digit ordinal numbers given in form of words |
static java.lang.String |
ORDINAL_THOUSAND
a regular expression capturing group matching all four digit ordinal numbers given in form of words whose last three digits are zero |
static int |
ORDINAL_THOUSAND_MAX_TOKENS
|
static java.util.regex.Pattern |
ORDINAL_THOUSAND_PATTERN
a regular expression capturing group matching all four digit ordinal numbers given in form of words whose last three digits are zero |
static java.lang.String |
ORDINAL_TO_HUNDRED
a regular expression capturing group matching all three digit ordinal numbers given in form of words |
static int |
ORDINAL_TO_HUNDRED_MAX_TOKENS
|
static java.util.regex.Pattern |
ORDINAL_TO_HUNDRED_PATTERN
a regular expression capturing group matching all three digit ordinal numbers given in form of words |
static java.lang.String |
ORDINAL_TO_THOUSAND
a regular expression capturing group matching all up to six digit ordinal numbers given in form of words |
static int |
ORDINAL_TO_THOUSAND_MAX_TOKENS
|
static java.util.regex.Pattern |
ORDINAL_TO_THOUSAND_PATTERN
a regular expression capturing group matching all up to six digit ordinal numbers given in form of words |
static java.lang.String |
ORDINAL_Xteen
a regular expression capturing group matching all two digit ordinal numbers given in form of words whose first digit is one |
static int |
ORDINAL_Xteen_MAX_TOKENS
|
static java.util.regex.Pattern |
ORDINAL_Xteen_PATTERN
a regular expression capturing group matching all two digit ordinal numbers given in form of words whose first digit is one |
static java.lang.String |
OTHER
|
static java.lang.String |
OUNCES
|
static int |
OUNCES_MAX_TOKENS
|
static java.util.regex.Pattern |
OUNCES_PATTERN
|
static java.lang.String |
OUNCES_UNIT
|
static int |
OUNCES_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
OUNCES_UNIT_PATTERN
|
static boolean |
PATTERNS_CASE_INSENSITIVE
case sensitivity default for regular expression patterns |
static java.lang.String |
PERCENTAGE
a regular expression capturing group matching any one percentage (like '20 out of hundred', '34 percent', or '100 %') |
static int |
PERCENTAGE_MAX_TOKENS
|
static java.util.regex.Pattern |
PERCENTAGE_PATTERN
a regular expression capturing group matching any one percentage (like '20 out of hundred', '34 percent', or '100 %') |
static java.lang.String |
PHONE_NUMBER
|
static int |
PHONE_NUMBER_MAX_TOKENS
|
static java.util.regex.Pattern |
PHONE_NUMBER_PATTERN
|
static java.lang.String |
POUNDS
|
static int |
POUNDS_MAX_TOKENS
|
static java.util.regex.Pattern |
POUNDS_PATTERN
|
static java.lang.String |
POUNDS_UNIT
|
static int |
POUNDS_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
POUNDS_UNIT_PATTERN
|
static java.lang.String |
PROPER_NAME
a regular expression capturing group matching any proper name (sequence of capitalized words, with only 'of the', 'of', 'the', 'and' allowed in lower case) |
static int |
PROPER_NAME_MAX_TOKENS
|
static java.util.regex.Pattern |
PROPER_NAME_PATTERN
a regular expression capturing group matching any proper name (sequence of capitalized words, with only 'of the', 'of', 'the', 'and' allowed in lower case) |
static java.lang.String |
PUNCTUATED_ALL_UPPER_CASE_ACRONYM
a regular expression capturing group matching any sequence of two or more upper case letters, intermixed with punctuation marks like dots and ampersands (probably acronyms) |
static int |
PUNCTUATED_ALL_UPPER_CASE_ACRONYM_MAX_TOKENS
|
static java.util.regex.Pattern |
PUNCTUATED_ALL_UPPER_CASE_ACRONYM_PATTERN
a regular expression capturing group matching any sequence of two or more upper case letters, intermixed with punctuation marks like dots and ampersands (probably acronyms) |
static java.lang.String |
RANGE
|
static int |
RANGE_MAX_TOKENS
|
static java.util.regex.Pattern |
RANGE_PATTERN
|
static java.lang.String |
RANGE_UNIT
|
static int |
RANGE_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
RANGE_UNIT_PATTERN
|
static java.lang.String |
RATE
|
static int |
RATE_MAX_TOKENS
|
static java.util.regex.Pattern |
RATE_PATTERN
|
static java.lang.String |
REEF
a regular expression capturing group matching any reef name (in particular, a proper name followed by 'Reef') |
static int |
REEF_MAX_TOKENS
|
static java.util.regex.Pattern |
REEF_PATTERN
a regular expression capturing group matching any reef name (in particular, a proper name followed by 'Reef') |
static java.lang.String |
SCORE
a regular expression capturing group matching any probable single or multi score from sports events (like soccer or tennis, '5:3' or '6:3, 6:1, 6:7, 7:5') |
static int |
SCORE_MAX_TOKENS
|
static java.util.regex.Pattern |
SCORE_PATTERN
a regular expression capturing group matching any probable single or multi score from sports events (like soccer or tennis, '5:3' or '6:3, 6:1, 6:7, 7:5') |
static java.lang.String |
SINGLE_SCORE
a regular expression capturing group matching any probable single score from sports events (like soccer or basketball scores, '5:3' or '89:109') |
static int |
SINGLE_SCORE_MAX_TOKENS
|
static java.util.regex.Pattern |
SINGLE_SCORE_PATTERN
a regular expression capturing group matching any probable single score from sports events (like soccer or basketball scores, '5:3' or '89:109') |
static java.lang.String |
SIZE
a regular expression capturing group matching any one size measure (in particular, a number followed by a length, area or volume unit) |
static int |
SIZE_MAX_TOKENS
|
static java.util.regex.Pattern |
SIZE_PATTERN
a regular expression capturing group matching any one size measure (in particular, a number followed by a length, area or volume unit) |
static java.lang.String |
SIZE_UNIT
a regular expression capturing group matching any one size unit (in particular, a length, area or volume unit) |
static int |
SIZE_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
SIZE_UNIT_PATTERN
a regular expression capturing group matching any one size unit (in particular, a length, area or volume unit) |
static java.lang.String |
SMALL_AREA_UNIT
a regular expression capturing group matching any one smaller scale area unit (in particular, 'square (deci|cent|milli|micro|nano)meter', and 'square yard') |
static int |
SMALL_AREA_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
SMALL_AREA_UNIT_PATTERN
a regular expression capturing group matching any one smaller scale area unit (in particular, 'square (deci|cent|milli|micro|nano)meter', and 'square yard') |
static java.lang.String |
SPEED
a regular expression capturing group matching any one speed (in particular, a number followed by a speed unit) |
static int |
SPEED_MAX_TOKENS
|
static java.util.regex.Pattern |
SPEED_PATTERN
a regular expression capturing group matching any one speed (in particular, a number followed by a speed unit) |
static java.lang.String |
SPEED_UNIT
a regular expression capturing group matching any one speed unit (in particular, a length or distance unit divided by a time unit) |
static int |
SPEED_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
SPEED_UNIT_PATTERN
a regular expression capturing group matching any one speed unit (in particular, a length or distance unit divided by a time unit) |
static java.lang.String |
SQUARE_MILES
|
static int |
SQUARE_MILES_MAX_TOKENS
|
static java.util.regex.Pattern |
SQUARE_MILES_PATTERN
|
static java.lang.String |
SQUARE_MILES_UNIT
|
static int |
SQUARE_MILES_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
SQUARE_MILES_UNIT_PATTERN
|
static java.lang.String |
START
|
static java.lang.String |
STREET
a regular expression capturing group matching any street (in particular, a proper name followed by a synonym of 'street', like 'Madison Avenue', 'Capital Beltway', 'Bourbon Street', etc.) |
static int |
STREET_MAX_TOKENS
|
static java.util.regex.Pattern |
STREET_PATTERN
a regular expression capturing group matching any street (in particular, a proper name followed by a synonym of 'street', like 'Madison Avenue', 'Capital Beltway', 'Bourbon Street', etc.) |
static java.lang.String |
TEMPERATURE
a regular expression capturing group matching any one temperature (in particular, a number followed by a temperature unit) |
static int |
TEMPERATURE_MAX_TOKENS
|
static java.util.regex.Pattern |
TEMPERATURE_PATTERN
a regular expression capturing group matching any one temperature (in particular, a number followed by a temperature unit) |
static java.lang.String |
TEMPERATURE_UNIT
a regular expression capturing group matching any one temperature unit |
static int |
TEMPERATURE_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
TEMPERATURE_UNIT_PATTERN
a regular expression capturing group matching any one temperature unit |
static java.lang.String |
TIME
a regular expression capturing group matching any one time (like '5:30 pm' or 'dawn') |
static int |
TIME_MAX_TOKENS
|
static java.util.regex.Pattern |
TIME_PATTERN
a regular expression capturing group matching any one time (like '5:30 pm' or 'dawn') |
static java.lang.String |
TIME_UNIT
a regular expression capturing group matching any one time unit (in particular, 'hour', 'minute', '(milli|micro|nano)second', 'day', 'week', 'month', 'year', 'decade', and 'century') |
static int |
TIME_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
TIME_UNIT_PATTERN
a regular expression capturing group matching any one time unit (in particular, 'hour', 'minute', '(milli|micro|nano)second', 'day', 'week', 'month', 'year', 'decade', and 'century') |
static java.lang.String |
TONS
|
static int |
TONS_MAX_TOKENS
|
static java.util.regex.Pattern |
TONS_PATTERN
|
static java.lang.String |
TONS_UNIT
|
static int |
TONS_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
TONS_UNIT_PATTERN
|
static java.lang.String |
URL
a regular expression capturing group matching any URL (like 'http://www.uni-karlsruhe.de:8080/res/index.html') |
static java.lang.String |
URL_AUTHORITY
a regular expression capturing group matching the authority part of any URL (like 'www.uni-karlsruhe.de') |
static java.lang.String |
URL_FILE
a regular expression capturing group matching the path part of any URL (like '/res/index.html') |
static int |
URL_MAX_TOKENS
|
static java.util.regex.Pattern |
URL_PATTERN
a regular expression capturing group matching any URL (like 'http://www.uni-karlsruhe.de:8080/res/index.html') |
static java.lang.String |
URL_PORT
a regular expression capturing group matching the port part of any URL (like '8080') |
static java.lang.String |
URL_PROTOCOL
a regular expression capturing group matching the protocol part of any URL (like 'http://') |
static java.lang.String |
VOLUME
a regular expression capturing group matching any one volume measure (in particular, a number followed by a volume unit) |
static int |
VOLUME_MAX_TOKENS
|
static java.util.regex.Pattern |
VOLUME_PATTERN
a regular expression capturing group matching any one volume measure (in particular, a number followed by a volume unit) |
static java.lang.String |
VOLUME_UNIT
a regular expression capturing group matching any one volume unit |
static int |
VOLUME_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
VOLUME_UNIT_PATTERN
a regular expression capturing group matching any one volume unit |
static java.lang.String |
WEEKDAY
a regular expression capturing group matching any one weekday |
static int |
WEEKDAY_MAX_TOKENS
|
static java.util.regex.Pattern |
WEEKDAY_PATTERN
a regular expression capturing group matching any one weekday |
static java.lang.String |
WEIGHT
a regular expression capturing group matching any one weight measure (in particular, a number followed by a weight unit) |
static int |
WEIGHT_MAX_TOKENS
|
static java.util.regex.Pattern |
WEIGHT_PATTERN
a regular expression capturing group matching any one weight measure (in particular, a number followed by a weight unit) |
static java.lang.String |
WEIGHT_UNIT
a regular expression capturing group matching any one weight unit |
static int |
WEIGHT_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
WEIGHT_UNIT_PATTERN
a regular expression capturing group matching any one weight unit |
static java.lang.String |
YEAR
a regular expression capturing group matching any one year date (in particular, numbers 1 through 2999 and their two digit counterparts, the latter optionally preceded by a single quote) |
static int |
YEAR_MAX_TOKENS
|
static java.util.regex.Pattern |
YEAR_PATTERN
a regular expression capturing group matching any one year date (in particular, numbers 1 through 2999 and their two digit counterparts, the latter optionally preceded by a single quote) |
static java.lang.String |
YEARS
a regular expression capturing group matching any one duration given in years (in particular, a number followed by 'year' or 'years') |
static int |
YEARS_MAX_TOKENS
|
static java.util.regex.Pattern |
YEARS_PATTERN
a regular expression capturing group matching any one duration given in years (in particular, a number followed by 'year' or 'years') |
static java.lang.String |
YEARS_UNIT
a regular expression capturing group matching the unit of a duration given in years (in particular, 'year' or 'years') |
static int |
YEARS_UNIT_MAX_TOKENS
|
static java.util.regex.Pattern |
YEARS_UNIT_PATTERN
a regular expression capturing group matching the unit of a duration given in years (in particular, 'year' or 'years') |
static java.lang.String |
ZIPCODE
|
static int |
ZIPCODE_MAX_TOKENS
|
static java.util.regex.Pattern |
ZIPCODE_PATTERN
|
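The URL fields above describe a pattern assembled from smaller capturing groups (protocol, authority, port, file). The actual regular expressions are not shown on this page, so the following is only a minimal sketch of that composition style. Every expression in it, the use of named groups, and the class name `UrlPatternSketch` are assumptions made for illustration, not the values used by RegExMatcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the regular expressions below are assumptions,
// not the values defined in info.ephyra.nlp.RegExMatcher.
final class UrlPatternSketch {

    // protocol part, like 'http://'
    static final String URL_PROTOCOL = "(?<protocol>[a-z][a-z0-9+.-]*://)";
    // authority part, like 'www.uni-karlsruhe.de'
    static final String URL_AUTHORITY = "(?<authority>[a-z0-9-]+(?:\\.[a-z0-9-]+)+)";
    // port part, like '8080'
    static final String URL_PORT = "(?<port>\\d{1,5})";
    // path part, like '/res/index.html'
    static final String URL_FILE = "(?<file>/\\S*)";

    // whole URL: protocol and authority are required, port and file are optional
    static final String URL =
            "(" + URL_PROTOCOL + URL_AUTHORITY + "(?::" + URL_PORT + ")?" + URL_FILE + "?)";

    static final Pattern URL_PATTERN = Pattern.compile(URL, Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        Matcher m = URL_PATTERN.matcher("http://www.uni-karlsruhe.de:8080/res/index.html");
        if (m.matches()) {
            System.out.println(m.group("protocol"));
            System.out.println(m.group("authority"));
            System.out.println(m.group("port"));
            System.out.println(m.group("file"));
        }
    }
}
```

Building the larger expression by string concatenation keeps each part reusable on its own, which is consistent with the way the class exposes both the String constants and the compiled `_PATTERN` fields.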
| Constructor Summary | |
|---|---|
RegExMatcher()
|
|
| Method Summary | |
|---|---|
static java.util.regex.Pattern |
compile(java.lang.String regEx)
create a Pattern from a regular expression String, using default case sensitivity |
static java.util.regex.Pattern |
compile(java.lang.String regEx,
boolean caseSensitive)
create a Pattern from a regular expression String |
static java.lang.String[] |
extractAllContained(java.lang.String[] tokens,
HashDictionary dictionary)
mark all parts of a String that are contained in a list of Strings |
static java.lang.String[] |
extractAllContained(java.lang.String[] tokens,
HashDictionary dictionary,
int threshold)
mark all parts of a String that are fuzzy-contained in a list of Strings |
static java.lang.String[] |
extractAllMatches(java.lang.String text,
java.util.regex.Pattern pattern)
extract all parts from a token sequence that match a regular expression |
static java.lang.String[] |
extractAllMatches(java.lang.String text,
java.lang.String regEx)
extract all parts from a token sequence that match a regular expression |
static java.lang.String[] |
extractNumbers(java.lang.String[] tokens)
mark all numbers in a token sequence |
static java.lang.String[] |
extractOrdinalNumbers(java.lang.String[] tokens)
mark all ordinal numbers in a token sequence |
static java.lang.String[] |
extractQuantities(java.lang.String[] tokens,
java.lang.String[] numberMarkers,
java.util.regex.Pattern dimensionPattern,
int maxTokens)
mark all parts from a token sequence that match a regular expression |
static HashDictionary |
getDictionary(java.lang.String name)
load a gazetteer |
static java.lang.String[] |
markAllContained(java.lang.String[] tokens,
HashDictionary dictionary)
mark all parts of a String that are contained in a list of Strings |
static java.lang.String[] |
markAllContained(java.lang.String[] tokens,
HashDictionary dictionary,
int threshold)
mark all parts of a String that are fuzzy-contained in a list of Strings |
static java.lang.String[] |
markAllMatches(java.lang.String[] tokens,
java.util.regex.Pattern pattern)
mark all parts of a token sequence that match a regular expression |
static java.lang.String[] |
markAllMatches(java.lang.String[] tokens,
java.util.regex.Pattern pattern,
int maxTokens)
mark all parts of a token sequence that match a regular expression |
static java.lang.String[] |
markAllMatches(java.lang.String[] tokens,
java.lang.String regEx)
mark all parts of a token sequence that match a regular expression |
static java.lang.String[] |
markAllMatches(java.lang.String[] tokens,
java.lang.String regEx,
int maxTokens)
mark all parts of a token sequence that match a regular expression |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String OTHER
public static final java.lang.String START
public static final java.lang.String CONTINUE
private static final int MAX_TOKENS
private static java.util.HashMap<java.lang.String,HashDictionary> dictionariesByName
public static final java.lang.String NUMBER
public static final java.lang.String ORDINAL
public static final java.lang.String NUMBER_ONE
public static final java.lang.String NUMBER_Xteen
public static final java.lang.String NUMBER_TEN
public static final java.lang.String NUMBER_HUNDRED
public static final java.lang.String NUMBER_HUNDRED_WITH_Xteen
public static final java.lang.String NUMBER_TO_HUNDRED
public static final java.lang.String NUMBER_THOUSAND
public static final java.lang.String NUMBER_TO_THOUSAND
public static final java.lang.String NUMBER_Xillion
public static final java.lang.String ORDINAL_ONE
public static final java.lang.String ORDINAL_Xteen
public static final java.lang.String ORDINAL_TEN
public static final java.lang.String ORDINAL_HUNDRED
public static final java.lang.String ORDINAL_HUNDRED_WITH_Xteen
public static final java.lang.String ORDINAL_TO_HUNDRED
public static final java.lang.String ORDINAL_THOUSAND
public static final java.lang.String ORDINAL_TO_THOUSAND
public static final java.lang.String LENGTH_UNIT
public static final java.lang.String HEIGHT_UNIT
public static final java.lang.String LENGTH
public static final java.lang.String HEIGHT
public static final java.lang.String DOLLAR_UNIT
public static final java.lang.String EURO_UNIT
public static final java.lang.String MONEY_UNIT
public static final java.lang.String MONEY
public static final java.lang.String TIME_UNIT
public static final java.lang.String TIME
public static final java.lang.String DURATION_UNIT
public static final java.lang.String DURATION
public static final java.lang.String DAYS_UNIT
public static final java.lang.String DAYS
public static final java.lang.String YEARS_UNIT
public static final java.lang.String YEARS
public static final java.lang.String FREQUENCY
public static final java.lang.String PERCENTAGE
public static final java.lang.String LARGE_AREA_UNIT
public static final java.lang.String SMALL_AREA_UNIT
public static final java.lang.String AREA_UNIT
public static final java.lang.String AREA_LARGE
public static final java.lang.String AREA_SMALL
public static final java.lang.String AREA
public static final java.lang.String VOLUME_UNIT
public static final java.lang.String VOLUME
public static final java.lang.String SIZE_UNIT
public static final java.lang.String SIZE
public static final java.lang.String WEIGHT_UNIT
public static final java.lang.String WEIGHT
public static final java.lang.String SPEED_UNIT
public static final java.lang.String SPEED
public static final java.lang.String TEMPERATURE_UNIT
public static final java.lang.String TEMPERATURE
public static final java.lang.String ANGLE_UNIT
public static final java.lang.String ANGLE
public static final java.lang.String MONTH_NAME
public static final java.lang.String WEEKDAY
public static final java.lang.String DAY
public static final java.lang.String ORDINAL_DAY
public static final java.lang.String MONTH
public static final java.lang.String YEAR
public static final java.lang.String DATE_SEPARATOR
public static final java.lang.String DATE_FULL
public static final java.lang.String DATE_DIGITAL
public static final java.lang.String DATE
public static final java.lang.String DAY_MONTH
public static final java.lang.String DECADE
public static final java.lang.String CENTURY
public static final java.lang.String ALL_UPPER_CASE_ACRONYM
public static final java.lang.String PUNCTUATED_ALL_UPPER_CASE_ACRONYM
public static final java.lang.String MIXED_CASE_ACRONYM
public static final java.lang.String ACRONYM
public static final java.lang.String SINGLE_SCORE
public static final java.lang.String MULTI_SCORE
public static final java.lang.String SCORE
public static final java.lang.String URL_PROTOCOL
public static final java.lang.String URL_AUTHORITY
public static final java.lang.String URL_PORT
public static final java.lang.String URL_FILE
public static final java.lang.String URL
public static final java.lang.String LEGAL_SENTENCE
public static final java.lang.String PROPER_NAME
public static final java.lang.String STREET
public static final java.lang.String COUNTY
public static final java.lang.String REEF
public static final java.lang.String EDUCATIONAL_INSTITUTION
public static final boolean PATTERNS_CASE_INSENSITIVE
public static final java.util.regex.Pattern NUMBER_PATTERN
public static final int NUMBER_MAX_TOKENS
public static final java.util.regex.Pattern ORDINAL_PATTERN
public static final int ORDINAL_MAX_TOKENS
public static final java.util.regex.Pattern NUMBER_ONE_PATTERN
public static final int NUMBER_ONE_MAX_TOKENS
public static final java.util.regex.Pattern NUMBER_Xteen_PATTERN
public static final int NUMBER_Xteen_MAX_TOKENS
public static final java.util.regex.Pattern NUMBER_TEN_PATTERN
public static final int NUMBER_TEN_MAX_TOKENS
public static final java.util.regex.Pattern NUMBER_HUNDRED_PATTERN
public static final int NUMBER_HUNDRED_MAX_TOKENS
public static final java.util.regex.Pattern NUMBER_HUNDRED_WITH_Xteen_PATTERN
public static final int NUMBER_HUNDRED_WITH_Xteen_MAX_TOKENS
public static final java.util.regex.Pattern NUMBER_TO_HUNDRED_PATTERN
public static final int NUMBER_TO_HUNDRED_MAX_TOKENS
public static final java.util.regex.Pattern NUMBER_THOUSAND_PATTERN
public static final int NUMBER_THOUSAND_MAX_TOKENS
public static final java.util.regex.Pattern NUMBER_TO_THOUSAND_PATTERN
public static final int NUMBER_TO_THOUSAND_MAX_TOKENS
public static final java.util.regex.Pattern NUMBER_Xillion_PATTERN
public static final int NUMBER_Xillion_MAX_TOKENS
public static final java.util.regex.Pattern ORDINAL_ONE_PATTERN
public static final int ORDINAL_ONE_MAX_TOKENS
public static final java.util.regex.Pattern ORDINAL_Xteen_PATTERN
public static final int ORDINAL_Xteen_MAX_TOKENS
public static final java.util.regex.Pattern ORDINAL_TEN_PATTERN
public static final int ORDINAL_TEN_MAX_TOKENS
public static final java.util.regex.Pattern ORDINAL_HUNDRED_PATTERN
public static final int ORDINAL_HUNDRED_MAX_TOKENS
public static final java.util.regex.Pattern ORDINAL_HUNDRED_WITH_Xteen_PATTERN
public static final int ORDINAL_HUNDRED_WITH_Xteen_MAX_TOKENS
public static final java.util.regex.Pattern ORDINAL_TO_HUNDRED_PATTERN
public static final int ORDINAL_TO_HUNDRED_MAX_TOKENS
public static final java.util.regex.Pattern ORDINAL_THOUSAND_PATTERN
public static final int ORDINAL_THOUSAND_MAX_TOKENS
public static final java.util.regex.Pattern ORDINAL_TO_THOUSAND_PATTERN
public static final int ORDINAL_TO_THOUSAND_MAX_TOKENS
public static final java.util.regex.Pattern LENGTH_UNIT_PATTERN
public static final int LENGTH_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern HEIGHT_UNIT_PATTERN
public static final int HEIGHT_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern LENGTH_PATTERN
public static final int LENGTH_MAX_TOKENS
public static final java.util.regex.Pattern HEIGHT_PATTERN
public static final int HEIGHT_MAX_TOKENS
public static final java.util.regex.Pattern DOLLAR_UNIT_PATTERN
public static final int DOLLAR_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern EURO_UNIT_PATTERN
public static final int EURO_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern MONEY_UNIT_PATTERN
public static final int MONEY_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern MONEY_PATTERN
public static final int MONEY_MAX_TOKENS
public static final java.util.regex.Pattern TIME_UNIT_PATTERN
public static final int TIME_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern TIME_PATTERN
public static final int TIME_MAX_TOKENS
public static final java.util.regex.Pattern DURATION_UNIT_PATTERN
public static final int DURATION_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern DURATION_PATTERN
public static final int DURATION_MAX_TOKENS
public static final java.util.regex.Pattern DAYS_UNIT_PATTERN
public static final int DAYS_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern DAYS_PATTERN
public static final int DAYS_MAX_TOKENS
public static final java.util.regex.Pattern YEARS_UNIT_PATTERN
public static final int YEARS_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern YEARS_PATTERN
public static final int YEARS_MAX_TOKENS
public static final java.util.regex.Pattern FREQUENCY_PATTERN
public static final int FREQUENCY_MAX_TOKENS
public static final java.util.regex.Pattern PERCENTAGE_PATTERN
public static final int PERCENTAGE_MAX_TOKENS
public static final java.util.regex.Pattern LARGE_AREA_UNIT_PATTERN
public static final int LARGE_AREA_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern SMALL_AREA_UNIT_PATTERN
public static final int SMALL_AREA_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern AREA_UNIT_PATTERN
public static final int AREA_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern AREA_PATTERN
public static final int AREA_MAX_TOKENS
public static final java.util.regex.Pattern VOLUME_UNIT_PATTERN
public static final int VOLUME_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern VOLUME_PATTERN
public static final int VOLUME_MAX_TOKENS
public static final java.util.regex.Pattern SIZE_UNIT_PATTERN
public static final int SIZE_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern SIZE_PATTERN
public static final int SIZE_MAX_TOKENS
public static final java.util.regex.Pattern WEIGHT_UNIT_PATTERN
public static final int WEIGHT_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern WEIGHT_PATTERN
public static final int WEIGHT_MAX_TOKENS
public static final java.util.regex.Pattern SPEED_UNIT_PATTERN
public static final int SPEED_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern SPEED_PATTERN
public static final int SPEED_MAX_TOKENS
public static final java.util.regex.Pattern TEMPERATURE_UNIT_PATTERN
public static final int TEMPERATURE_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern TEMPERATURE_PATTERN
public static final int TEMPERATURE_MAX_TOKENS
public static final java.util.regex.Pattern ANGLE_UNIT_PATTERN
public static final int ANGLE_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern ANGLE_PATTERN
public static final int ANGLE_MAX_TOKENS
public static final java.util.regex.Pattern MONTH_NAME_PATTERN
public static final int MONTH_NAME_MAX_TOKENS
public static final java.util.regex.Pattern WEEKDAY_PATTERN
public static final int WEEKDAY_MAX_TOKENS
public static final java.util.regex.Pattern DAY_PATTERN
public static final int DAY_MAX_TOKENS
public static final java.util.regex.Pattern MONTH_PATTERN
public static final int MONTH_MAX_TOKENS
public static final java.util.regex.Pattern YEAR_PATTERN
public static final int YEAR_MAX_TOKENS
public static final java.util.regex.Pattern DATE_SEPARATOR_PATTERN
public static final int DATE_SEPARATOR_MAX_TOKENS
public static final java.util.regex.Pattern DATE_FULL_PATTERN
public static final int DATE_FULL_MAX_TOKENS
public static final java.util.regex.Pattern DATE_DIGITAL_PATTERN
public static final int DATE_DIGITAL_MAX_TOKENS
public static final java.util.regex.Pattern DATE_PATTERN
public static final int DATE_MAX_TOKENS
public static final java.util.regex.Pattern DAY_MONTH_PATTERN
public static final int DAY_MONTH_MAX_TOKENS
public static final java.util.regex.Pattern DECADE_PATTERN
public static final int DECADE_MAX_TOKENS
public static final java.util.regex.Pattern CENTURY_PATTERN
public static final int CENTURY_MAX_TOKENS
public static final java.util.regex.Pattern ALL_UPPER_CASE_ACRONYM_PATTERN
public static final int ALL_UPPER_CASE_ACRONYM_MAX_TOKENS
public static final java.util.regex.Pattern PUNCTUATED_ALL_UPPER_CASE_ACRONYM_PATTERN
public static final int PUNCTUATED_ALL_UPPER_CASE_ACRONYM_MAX_TOKENS
public static final java.util.regex.Pattern MIXED_CASE_ACRONYM_PATTERN
public static final int MIXED_CASE_ACRONYM_MAX_TOKENS
public static final java.util.regex.Pattern ACRONYM_PATTERN
public static final int ACRONYM_MAX_TOKENS
public static final java.util.regex.Pattern SINGLE_SCORE_PATTERN
public static final int SINGLE_SCORE_MAX_TOKENS
public static final java.util.regex.Pattern MULTI_SCORE_PATTERN
public static final int MULTI_SCORE_MAX_TOKENS
public static final java.util.regex.Pattern SCORE_PATTERN
public static final int SCORE_MAX_TOKENS
public static final java.util.regex.Pattern URL_PATTERN
public static final int URL_MAX_TOKENS
public static final java.util.regex.Pattern LEGAL_SENTENCE_PATTERN
public static final int LEGAL_SENTENCE_MAX_TOKENS
public static final java.util.regex.Pattern PROPER_NAME_PATTERN
public static final int PROPER_NAME_MAX_TOKENS
public static final java.util.regex.Pattern STREET_PATTERN
public static final int STREET_MAX_TOKENS
public static final java.util.regex.Pattern COUNTY_PATTERN
public static final int COUNTY_MAX_TOKENS
public static final java.util.regex.Pattern REEF_PATTERN
public static final int REEF_MAX_TOKENS
public static final java.util.regex.Pattern EDUCATIONAL_INSTITUTION_PATTERN
public static final int EDUCATIONAL_INSTITUTION_MAX_TOKENS
public static final java.lang.String FEET_UNIT
public static final int FEET_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern FEET_UNIT_PATTERN
public static final java.lang.String FEET
public static final int FEET_MAX_TOKENS
public static final java.util.regex.Pattern FEET_PATTERN
public static final java.lang.String GALLONS_UNIT
public static final int GALLONS_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern GALLONS_UNIT_PATTERN
public static final java.lang.String GALLONS
public static final int GALLONS_MAX_TOKENS
public static final java.util.regex.Pattern GALLONS_PATTERN
public static final java.lang.String GRAMS_UNIT
public static final int GRAMS_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern GRAMS_UNIT_PATTERN
public static final java.lang.String GRAMS
public static final int GRAMS_MAX_TOKENS
public static final java.util.regex.Pattern GRAMS_PATTERN
public static final java.lang.String LITERS_UNIT
public static final int LITERS_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern LITERS_UNIT_PATTERN
public static final java.lang.String LITERS
public static final int LITERS_MAX_TOKENS
public static final java.util.regex.Pattern LITERS_PATTERN
public static final java.lang.String MILES_UNIT
public static final int MILES_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern MILES_UNIT_PATTERN
public static final java.lang.String MILES
public static final int MILES_MAX_TOKENS
public static final java.util.regex.Pattern MILES_PATTERN
public static final java.lang.String MPH_UNIT
public static final int MPH_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern MPH_UNIT_PATTERN
public static final java.lang.String MPH
public static final int MPH_MAX_TOKENS
public static final java.util.regex.Pattern MPH_PATTERN
public static final java.lang.String OUNCES_UNIT
public static final int OUNCES_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern OUNCES_UNIT_PATTERN
public static final java.lang.String OUNCES
public static final int OUNCES_MAX_TOKENS
public static final java.util.regex.Pattern OUNCES_PATTERN
public static final java.lang.String POUNDS_UNIT
public static final int POUNDS_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern POUNDS_UNIT_PATTERN
public static final java.lang.String POUNDS
public static final int POUNDS_MAX_TOKENS
public static final java.util.regex.Pattern POUNDS_PATTERN
public static final java.lang.String RANGE_UNIT
public static final int RANGE_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern RANGE_UNIT_PATTERN
public static final java.lang.String RANGE
public static final int RANGE_MAX_TOKENS
public static final java.util.regex.Pattern RANGE_PATTERN
public static final java.lang.String RATE
public static final int RATE_MAX_TOKENS
public static final java.util.regex.Pattern RATE_PATTERN
public static final java.lang.String SQUARE_MILES_UNIT
public static final int SQUARE_MILES_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern SQUARE_MILES_UNIT_PATTERN
public static final java.lang.String SQUARE_MILES
public static final int SQUARE_MILES_MAX_TOKENS
public static final java.util.regex.Pattern SQUARE_MILES_PATTERN
public static final java.lang.String TONS_UNIT
public static final int TONS_UNIT_MAX_TOKENS
public static final java.util.regex.Pattern TONS_UNIT_PATTERN
public static final java.lang.String TONS
public static final int TONS_MAX_TOKENS
public static final java.util.regex.Pattern TONS_PATTERN
public static final java.lang.String ZIPCODE
public static final int ZIPCODE_MAX_TOKENS
public static final java.util.regex.Pattern ZIPCODE_PATTERN
public static final java.lang.String PHONE_NUMBER
public static final int PHONE_NUMBER_MAX_TOKENS
public static final java.util.regex.Pattern PHONE_NUMBER_PATTERN
| Constructor Detail |
|---|
public RegExMatcher()
| Method Detail |
|---|
public static java.lang.String[] markAllMatches(java.lang.String[] tokens,
java.lang.String regEx)
tokens - the token sequence to be searched
regEx - the regular expression whose matches are to be extracted
public static java.lang.String[] markAllMatches(java.lang.String[] tokens,
java.util.regex.Pattern pattern)
tokens - the token sequence to be searched
pattern - the regular expression whose matches are to be extracted
public static java.lang.String[] markAllMatches(java.lang.String[] tokens,
java.lang.String regEx,
int maxTokens)
tokens - the token sequence to be searched
regEx - the regular expression whose matches are to be extracted
maxTokens - the maximum number of tokens a matching part may contain (0 means no limit; note that this is computationally expensive)
public static java.lang.String[] markAllMatches(java.lang.String[] tokens,
java.util.regex.Pattern pattern,
int maxTokens)
tokens - the token sequence to be searched
pattern - the pattern whose matches are to be extracted
maxTokens - the maximum number of tokens a matching part may contain (0 means no limit; note that this is computationally expensive)
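The marking behaviour described for markAllMatches can be sketched as follows. This is not the library's implementation: the marker values "B", "I" and "O" (standing in for START, CONTINUE and OTHER), the longest-window-first search, and the class name `MarkSketch` are all assumptions. The sketch also shows why a maxTokens of 0 is costly, since every start position then tries windows up to the end of the sequence.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch of token marking; marker values and search order are assumptions.
final class MarkSketch {

    static final String START = "B";     // assumed value for the first token of a match
    static final String CONTINUE = "I";  // assumed value for further tokens of a match
    static final String OTHER = "O";     // assumed value for tokens outside any match

    static String[] markAllMatches(String[] tokens, Pattern pattern, int maxTokens) {
        String[] markers = new String[tokens.length];
        Arrays.fill(markers, OTHER);
        int i = 0;
        while (i < tokens.length) {
            int remaining = tokens.length - i;
            // 0 (or less) means no limit: windows may extend to the end of the sequence
            int longest = (maxTokens <= 0) ? remaining : Math.min(maxTokens, remaining);
            int matched = 0;
            // try the longest window first, then shrink it token by token
            for (int len = longest; len >= 1 && matched == 0; len--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                if (pattern.matcher(candidate).matches()) {
                    matched = len;
                }
            }
            if (matched == 0) {
                i++;
                continue;
            }
            markers[i] = START;
            for (int j = i + 1; j < i + matched; j++) {
                markers[j] = CONTINUE;
            }
            i += matched;
        }
        return markers;
    }

    public static void main(String[] args) {
        String[] tokens = {"at", "5:30", "pm", "on", "Friday"};
        Pattern time = Pattern.compile("\\d{1,2}:\\d{2}( [ap]m)?");
        System.out.println(Arrays.toString(markAllMatches(tokens, time, 2)));
    }
}
```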
public static java.lang.String[] extractAllMatches(java.lang.String text,
java.lang.String regEx)
text - the text to be searched
regEx - the regular expression whose matches are to be extracted
public static java.lang.String[] extractAllMatches(java.lang.String text,
java.util.regex.Pattern pattern)
text - the text to be searched
pattern - the regular expression Pattern whose matches are to be extracted
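Extracting all matches from a plain String is commonly done with a `Matcher.find()` loop. The sketch below assumes that approach. It is not taken from the RegExMatcher source, and the class name `ExtractSketch` and the sample year expression are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of extractAllMatches using only java.util.regex.
final class ExtractSketch {

    static String[] extractAllMatches(String text, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        // find() advances to the next non-overlapping match on every call
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches.toArray(new String[0]);
    }

    static String[] extractAllMatches(String text, String regEx) {
        return extractAllMatches(text, Pattern.compile(regEx));
    }

    public static void main(String[] args) {
        // sample expression for four-digit years, chosen for this example only
        String[] years = extractAllMatches("Founded in 1825, renamed in 2009.", "\\b[12]\\d{3}\\b");
        System.out.println(String.join(", ", years));
    }
}
```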
public static HashDictionary getDictionary(java.lang.String name)
name - the name of the list to be loaded
public static java.lang.String[] markAllContained(java.lang.String[] tokens,
HashDictionary dictionary)
tokens - the token sequence to be searched
dictionary - the gazetteer containing the Strings to be found
public static java.lang.String[] markAllContained(java.lang.String[] tokens,
HashDictionary dictionary,
int threshold)
tokens - the token sequence to be searched
dictionary - the gazetteer containing the Strings to be found
threshold - the maximum edit distance for which a fuzzy lookup shall return true
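The threshold parameter bounds the edit distance allowed in a fuzzy lookup. The sketch below illustrates that idea with a plain Levenshtein distance over single tokens. It makes several assumptions: a `Set<String>` stands in for HashDictionary (whose API is not shown on this page), only single-token entries are considered, tokens are lower-cased before lookup, and "B" and "O" stand in for the START and OTHER markers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of fuzzy gazetteer lookup; Set<String> replaces HashDictionary.
final class FuzzyLookupSketch {

    // classic Levenshtein distance, computed with two rolling rows
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // true if some dictionary entry is within the given edit distance of the token
    static boolean fuzzyContains(Set<String> dictionary, String token, int threshold) {
        for (String entry : dictionary) {
            if (editDistance(entry, token) <= threshold) {
                return true;
            }
        }
        return false;
    }

    // marks each token "B" if it is fuzzy-contained in the dictionary, else "O"
    static String[] markAllContained(String[] tokens, Set<String> dictionary, int threshold) {
        String[] markers = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            boolean hit = fuzzyContains(dictionary, tokens[i].toLowerCase(), threshold);
            markers[i] = hit ? "B" : "O";
        }
        return markers;
    }

    public static void main(String[] args) {
        Set<String> cities = new HashSet<>(Arrays.asList("karlsruhe", "berlin"));
        String[] tokens = {"in", "Karlsruhr", "today"};
        System.out.println(Arrays.toString(markAllContained(tokens, cities, 1)));
    }
}
```

A linear scan over all entries is only practical for small dictionaries. A real gazetteer lookup would likely need an index or a cheaper pre-filter, which is outside the scope of this sketch.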
public static java.lang.String[] extractAllContained(java.lang.String[] tokens,
HashDictionary dictionary)
tokens - the token sequence to be searched
dictionary - the gazetteer containing the Strings to be found
public static java.lang.String[] extractAllContained(java.lang.String[] tokens,
HashDictionary dictionary,
int threshold)
tokens - the token sequence to be searched
dictionary - the gazetteer containing the Strings to be found
threshold - the maximum edit distance for which a fuzzy lookup shall return true
public static java.lang.String[] extractNumbers(java.lang.String[] tokens)
tokens - the token sequence
public static java.lang.String[] extractQuantities(java.lang.String[] tokens,
java.lang.String[] numberMarkers,
java.util.regex.Pattern dimensionPattern,
int maxTokens)
tokens - the token sequence to be searched
dimensionPattern - the pattern whose matches are to be extracted
maxTokens - the maximum number of tokens a matching part may contain (0 means no limit; note that this is computationally expensive)
public static java.lang.String[] extractOrdinalNumbers(java.lang.String[] tokens)
tokens - the token sequence
public static java.util.regex.Pattern compile(java.lang.String regEx)
regEx - the regular expression String
public static java.util.regex.Pattern compile(java.lang.String regEx,
boolean caseSensitive)
regEx - the regular expression String
caseSensitive - whether to create a case-sensitive Pattern
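A two-argument compile of this kind can be expressed with the standard `Pattern.CASE_INSENSITIVE` flag. The sketch below assumes that mapping and assumes a case-insensitive default for the one-argument overload. The real default is governed by PATTERNS_CASE_INSENSITIVE, whose value is not stated on this page, and the class name `CompileSketch` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for RegExMatcher.compile; not the library's actual code.
final class CompileSketch {

    // Assumed default; the documented class only names PATTERNS_CASE_INSENSITIVE without a value.
    static final boolean CASE_SENSITIVE_BY_DEFAULT = false;

    static Pattern compile(String regEx) {
        return compile(regEx, CASE_SENSITIVE_BY_DEFAULT);
    }

    static Pattern compile(String regEx, boolean caseSensitive) {
        return caseSensitive
                ? Pattern.compile(regEx)
                : Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        System.out.println(compile("monday|tuesday").matcher("Monday").matches());
        System.out.println(compile("monday|tuesday", true).matcher("Monday").matches());
    }
}
```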