info.ephyra.nlp
Class RegExMatcher

java.lang.Object
  extended by info.ephyra.nlp.RegExMatcher

public class RegExMatcher
extends java.lang.Object

Applies regular expressions for named entity extraction.

Version:
2008-02-10
Author:
Guido Sautter, Nico Schlaefer

Field Summary
static java.lang.String ACRONYM
          a regular expression capturing group matching any probable acronym (disjunctive combination of MIXED_CASE_ACRONYM and PUNCTUATED_ALL_UPPER_CASE_ACRONYM)
static int ACRONYM_MAX_TOKENS
           
static java.util.regex.Pattern ACRONYM_PATTERN
          a regular expression capturing group matching any probable acronym (disjunctive combination of MIXED_CASE_ACRONYM and PUNCTUATED_ALL_UPPER_CASE_ACRONYM)
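The "disjunctive combination" described above can be illustrated with plain java.util.regex. The following is a minimal sketch, not the expressions RegExMatcher actually uses: the class name AcronymSketch, the two sub-expressions, and the order of the alternatives are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AcronymSketch {
    // Hypothetical sub-expressions; the real constants in RegExMatcher may differ.
    // Two or more upper case letters, each optionally followed by lower case ones (e.g. "DoD").
    static final String MIXED_CASE = "\\b(?:[A-Z][a-z]*){2,}\\b";
    // Upper case letters optionally separated by '.' or '&' (e.g. "U.S.", "AT&T").
    static final String PUNCTUATED = "\\b[A-Z](?:[.&]?[A-Z])+\\.?(?![A-Za-z])";
    // Disjunctive combination of both, wrapped in a single capturing group.
    static final String ACRONYM = "(" + PUNCTUATED + "|" + MIXED_CASE + ")";
    static final Pattern ACRONYM_PATTERN = Pattern.compile(ACRONYM);

    // Collects every substring of the text that the combined pattern matches.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = ACRONYM_PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("The U.S. DoD and AT&T signed."));
    }
}
```

In this sketch the punctuated alternative is tried first so that "AT&T" is not cut short to "AT". The mixed case expression also accepts camel case words such as "HelloWorld", which is why the descriptions speak of probable acronyms.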
static java.lang.String ALL_UPPER_CASE_ACRONYM
          a regular expression capturing group matching any sequence of two or more upper case letters (probably acronyms)
static int ALL_UPPER_CASE_ACRONYM_MAX_TOKENS
           
static java.util.regex.Pattern ALL_UPPER_CASE_ACRONYM_PATTERN
          a regular expression capturing group matching any sequence of two or more upper case letters (probably acronyms)
static java.lang.String ANGLE
          a regular expression capturing group matching any one angle (in particular, a number followed by 'degree' or °)
static int ANGLE_MAX_TOKENS
           
static java.util.regex.Pattern ANGLE_PATTERN
          a regular expression capturing group matching any one angle (in particular, a number followed by 'degree' or °)
static java.lang.String ANGLE_UNIT
          a regular expression capturing group matching any one angle unit (in particular, 'degree' or °)
static int ANGLE_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern ANGLE_UNIT_PATTERN
          a regular expression capturing group matching any one angle unit (in particular, 'degree' or °)
static java.lang.String AREA
          a regular expression capturing group matching any one area measure (in particular, a number followed by an area unit)
static java.lang.String AREA_LARGE
          a regular expression capturing group matching any one larger scale area measure (in particular, a number followed by a larger scale area unit)
static int AREA_MAX_TOKENS
           
static java.util.regex.Pattern AREA_PATTERN
          a regular expression capturing group matching any one area measure (in particular, a number followed by an area unit)
static java.lang.String AREA_SMALL
          a regular expression capturing group matching any one smaller scale area measure (in particular, a number followed by a smaller scale area unit)
static java.lang.String AREA_UNIT
          a regular expression capturing group matching any one area unit (in particular, 'square (kilo|deci|cent|milli|micro|nano)meter', 'square yard', 'square mile', and 'acre')
static int AREA_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern AREA_UNIT_PATTERN
          a regular expression capturing group matching any one area unit (in particular, 'square (kilo|deci|cent|milli|micro|nano)meter', 'square yard', 'square mile', and 'acre')
static java.lang.String CENTURY
          a regular expression capturing group matching any one century (like 'eighteenth century', or '21st century')
static int CENTURY_MAX_TOKENS
           
static java.util.regex.Pattern CENTURY_PATTERN
          a regular expression capturing group matching any one century (like 'eighteenth century', or '21st century')
static java.lang.String CONTINUE
           
static java.lang.String COUNTY
          a regular expression capturing group matching any county name (in particular, a proper name followed by 'County')
static int COUNTY_MAX_TOKENS
           
static java.util.regex.Pattern COUNTY_PATTERN
          a regular expression capturing group matching any county name (in particular, a proper name followed by 'County')
static java.lang.String DATE
          a regular expression capturing group matching any one date
static java.lang.String DATE_DIGITAL
          a regular expression capturing group matching any one date given in digits (for instance, '05-09-06')
static int DATE_DIGITAL_MAX_TOKENS
           
static java.util.regex.Pattern DATE_DIGITAL_PATTERN
          a regular expression capturing group matching any one date given in digits (for instance, '05-09-06')
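A date given in digits, such as '05-09-06', can be recognized along the following lines. This is a minimal sketch under stated assumptions: the class name DigitalDateSketch, the separator set ('-', '/', '.'), and the digit counts are hypothetical and not taken from RegExMatcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitalDateSketch {
    // Assumed separators '-', '/' and '.'; the backreference \2 forces the
    // same separator to appear between both pairs of digit groups.
    static final String DATE_DIGITAL =
            "\\b(\\d{1,2}([-/.])\\d{1,2}\\2(?:\\d{4}|\\d{2}))\\b";
    static final Pattern DATE_DIGITAL_PATTERN = Pattern.compile(DATE_DIGITAL);

    // Returns all digit-only dates found in the text (group 1 is the whole date).
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = DATE_DIGITAL_PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Filed on 05-09-06 and 12/31/1999, not 1-2/3."));
    }
}
```

The sketch checks only the shape of the string. It does not validate that the day and month values are in range, which the separate DAY and MONTH expressions suggest the real class handles more strictly.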
static java.lang.String DATE_FULL
          a regular expression capturing group matching any one full date, optionally including the weekday (for instance, 'Tuesday, May 9th, 2006')
static int DATE_FULL_MAX_TOKENS
           
static java.util.regex.Pattern DATE_FULL_PATTERN
          a regular expression capturing group matching any one full date, optionally including the weekday (for instance, 'Tuesday, May 9th, 2006')
static int DATE_MAX_TOKENS
           
static java.util.regex.Pattern DATE_PATTERN
          a regular expression capturing group matching any one date
static java.lang.String DATE_SEPARATOR
          a regular expression capturing group matching any one character usually used to separate the digit groups in a date (in particular, the characters ',', '
static int DATE_SEPARATOR_MAX_TOKENS
           
static java.util.regex.Pattern DATE_SEPARATOR_PATTERN
          a regular expression capturing group matching any one character usually used to separate the digit groups in a date (in particular, the characters ',', '
static java.lang.String DAY
          a regular expression capturing group matching any one day date (in particular, numbers 1 through 31)
static int DAY_MAX_TOKENS
           
static java.lang.String DAY_MONTH
          a regular expression capturing group matching any one day in a month (e.g.
static int DAY_MONTH_MAX_TOKENS
           
static java.util.regex.Pattern DAY_MONTH_PATTERN
          a regular expression capturing group matching any one day in a month
static java.util.regex.Pattern DAY_PATTERN
          a regular expression capturing group matching any one day date (in particular, numbers 1 through 31)
static java.lang.String DAYS
          a regular expression capturing group matching any one duration given in days (in particular, a number followed by 'day' or 'days')
static int DAYS_MAX_TOKENS
           
static java.util.regex.Pattern DAYS_PATTERN
          a regular expression capturing group matching any one duration given in days (in particular, a number followed by 'day' or 'days')
static java.lang.String DAYS_UNIT
          a regular expression capturing group matching any one day unit (in particular, 'day' or 'days')
static int DAYS_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern DAYS_UNIT_PATTERN
          a regular expression capturing group matching any one day unit (in particular, 'day' or 'days')
static java.lang.String DECADE
          a regular expression capturing group matching any one decade (like '80s', '1920s' or 'sixties')
static int DECADE_MAX_TOKENS
           
static java.util.regex.Pattern DECADE_PATTERN
          a regular expression capturing group matching any one decade (like '80s', '1920s' or 'sixties')
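The three decade forms named above ('80s', '1920s', 'sixties') can be covered by one alternation. The sketch below is illustrative only; the class name DecadeSketch and the exact expression are assumptions, not the definition used by RegExMatcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecadeSketch {
    // Hypothetical expression: two or four digits ending in '0' plus 's',
    // or a spelled-out decade word.
    static final String DECADE =
            "\\b((?:\\d{2})?\\d0s"
            + "|twenties|thirties|forties|fifties|sixties|seventies|eighties|nineties)\\b";
    static final Pattern DECADE_PATTERN =
            Pattern.compile(DECADE, Pattern.CASE_INSENSITIVE);

    // Returns every decade expression found in the text.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = DECADE_PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("In the 1920s, the 80s and the sixties."));
    }
}
```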
private static java.util.HashMap<java.lang.String,HashDictionary> dictionariesByName
           
static java.lang.String DOLLAR_UNIT
          a regular expression capturing group matching any one US Dollar unit (in particular, '$', 'USD', and 'Dollar')
static int DOLLAR_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern DOLLAR_UNIT_PATTERN
          a regular expression capturing group matching any one US Dollar unit (in particular, '$', 'USD', and 'Dollar')
static java.lang.String DURATION
          a regular expression capturing group matching any one duration (in particular, a number followed by a time unit)
static int DURATION_MAX_TOKENS
           
static java.util.regex.Pattern DURATION_PATTERN
          a regular expression capturing group matching any one duration (in particular, a number followed by a time unit)
static java.lang.String DURATION_UNIT
          a regular expression capturing group matching any one duration unit (in particular, a time unit)
static int DURATION_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern DURATION_UNIT_PATTERN
          a regular expression capturing group matching any one duration unit (in particular, a time unit)
static java.lang.String EDUCATIONAL_INSTITUTION
          a regular expression capturing group matching any educational institution (in particular, a proper name followed by one of 'university', 'college', 'high school', or 'elementary', or 'university of' followed by a proper name)
static int EDUCATIONAL_INSTITUTION_MAX_TOKENS
           
static java.util.regex.Pattern EDUCATIONAL_INSTITUTION_PATTERN
          a regular expression capturing group matching any educational institution (in particular, a proper name followed by one of 'University', 'College', 'High (School)', or 'Elementary (School)', or 'University of' followed by a proper name)
static java.lang.String EURO_UNIT
          a regular expression capturing group matching any one Euro unit (in particular, '€', 'EURO', and 'Euro')
static int EURO_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern EURO_UNIT_PATTERN
          a regular expression capturing group matching any one Euro unit (in particular, '€', 'EURO', and 'Euro')
static java.lang.String FEET
           
static int FEET_MAX_TOKENS
           
static java.util.regex.Pattern FEET_PATTERN
           
static java.lang.String FEET_UNIT
           
static int FEET_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern FEET_UNIT_PATTERN
           
static java.lang.String FREQUENCY
          a regular expression capturing group matching any one frequency (like 'once', '88 times', 'twice an hour', '25 times per day', or 'every 37 minutes')
static int FREQUENCY_MAX_TOKENS
           
static java.util.regex.Pattern FREQUENCY_PATTERN
          a regular expression capturing group matching any one frequency (like 'once', '88 times', 'twice an hour', '25 times per day', or 'every 37 minutes')
static java.lang.String GALLONS
           
static int GALLONS_MAX_TOKENS
           
static java.util.regex.Pattern GALLONS_PATTERN
           
static java.lang.String GALLONS_UNIT
           
static int GALLONS_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern GALLONS_UNIT_PATTERN
           
static java.lang.String GRAMS
           
static int GRAMS_MAX_TOKENS
           
static java.util.regex.Pattern GRAMS_PATTERN
           
static java.lang.String GRAMS_UNIT
           
static int GRAMS_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern GRAMS_UNIT_PATTERN
           
static java.lang.String HEIGHT
          a regular expression capturing group matching any one height (this constant is equal to LENGTH; it is provided only for clarity)
static int HEIGHT_MAX_TOKENS
           
static java.util.regex.Pattern HEIGHT_PATTERN
          a regular expression capturing group matching any one height (this constant is equal to LENGTH; it is provided only for clarity)
static java.lang.String HEIGHT_UNIT
          a regular expression capturing group matching any one height unit (this constant is equal to LENGTH_UNIT; it is provided only for clarity)
static int HEIGHT_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern HEIGHT_UNIT_PATTERN
          a regular expression capturing group matching any one height unit (this constant is equal to LENGTH_UNIT; it is provided only for clarity)
static java.lang.String LARGE_AREA_UNIT
          a regular expression capturing group matching any one larger scale area unit (in particular, 'square (kilo)meter', 'square mile', and 'acre')
static int LARGE_AREA_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern LARGE_AREA_UNIT_PATTERN
          a regular expression capturing group matching any one larger scale area unit (in particular, 'square (kilo)meter', 'square mile', and 'acre')
static java.lang.String LEGAL_SENTENCE
          a regular expression capturing group matching any legal sentence ('not guilty' and 'guilty')
static int LEGAL_SENTENCE_MAX_TOKENS
           
static java.util.regex.Pattern LEGAL_SENTENCE_PATTERN
          a regular expression capturing group matching any legal sentence ('not guilty' and 'guilty')
static java.lang.String LENGTH
          a regular expression capturing group matching any one length (in particular, a number followed by a length unit)
static int LENGTH_MAX_TOKENS
           
static java.util.regex.Pattern LENGTH_PATTERN
          a regular expression capturing group matching any one length (in particular, a number followed by a length unit)
static java.lang.String LENGTH_UNIT
          a regular expression capturing group matching any one length unit (in particular, '(kilo|deci|cent|milli|micro|nano)meter', 'foot', 'inch', and 'yard')
static int LENGTH_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern LENGTH_UNIT_PATTERN
          a regular expression capturing group matching any one length unit (in particular, '(kilo|deci|cent|milli|micro|nano)meter', 'foot', 'inch', and 'yard')
static java.lang.String LITERS
           
static int LITERS_MAX_TOKENS
           
static java.util.regex.Pattern LITERS_PATTERN
           
static java.lang.String LITERS_UNIT
           
static int LITERS_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern LITERS_UNIT_PATTERN
           
private static int MAX_TOKENS
           
static java.lang.String MILES
           
static int MILES_MAX_TOKENS
           
static java.util.regex.Pattern MILES_PATTERN
           
static java.lang.String MILES_UNIT
           
static int MILES_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern MILES_UNIT_PATTERN
           
static java.lang.String MIXED_CASE_ACRONYM
          a regular expression capturing group matching any sequence of two or more upper case letters with intermediate or trailing lower case ones (probably acronyms)
static int MIXED_CASE_ACRONYM_MAX_TOKENS
           
static java.util.regex.Pattern MIXED_CASE_ACRONYM_PATTERN
          a regular expression capturing group matching any sequence of two or more upper case letters with intermediate or trailing lower case ones (probably acronyms)
static java.lang.String MONEY
          a regular expression capturing group matching any one monetary amount (in particular, a number followed by a monetary unit)
static int MONEY_MAX_TOKENS
           
static java.util.regex.Pattern MONEY_PATTERN
          a regular expression capturing group matching any one monetary amount (in particular, a number followed by a monetary unit)
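The descriptions suggest that larger expressions are assembled from smaller String constants, for example a number followed by a monetary unit. The sketch below shows one way such a composition might look. The class name MoneySketch, the NUMBER expression, and the optional whitespace between number and unit are assumptions; only the unit spellings are taken from the field descriptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MoneySketch {
    // Hypothetical number expression: digits with optional '.' or ',' groups.
    static final String NUMBER = "\\d+(?:[.,]\\d+)*";
    // Unit spellings as listed in the field descriptions.
    static final String DOLLAR_UNIT = "(?:\\$|USD|Dollar)";
    static final String EURO_UNIT = "(?:\u20AC|EURO|Euro)";
    static final String MONEY_UNIT =
            "(?:" + DOLLAR_UNIT + "|" + EURO_UNIT + "|YEN|Yen)";
    // A number followed by a monetary unit, wrapped in one capturing group.
    static final String MONEY = "\\b(" + NUMBER + "\\s*" + MONEY_UNIT + ")";
    static final Pattern MONEY_PATTERN = Pattern.compile(MONEY);

    // Returns every monetary amount found in the text.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = MONEY_PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("It cost 250 USD or 3,500 Yen."));
    }
}
```

Keeping each unit as its own String constant means a compound expression such as MONEY can be rebuilt by concatenation, while the matching Pattern constant is compiled once for direct use.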
static java.lang.String MONEY_UNIT
          a regular expression capturing group matching any one monetary unit (in particular, DOLLAR_UNIT, EURO_UNIT, 'YEN', and 'Yen')
static int MONEY_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern MONEY_UNIT_PATTERN
          a regular expression capturing group matching any one monetary unit (in particular, DOLLAR_UNIT, EURO_UNIT, 'YEN', and 'Yen')
static java.lang.String MONTH
          a regular expression capturing group matching any one month date (in particular, numbers 1 through 12)
static int MONTH_MAX_TOKENS
           
static java.lang.String MONTH_NAME
          a regular expression capturing group matching any one month name
static int MONTH_NAME_MAX_TOKENS
           
static java.util.regex.Pattern MONTH_NAME_PATTERN
          a regular expression capturing group matching any one month name
static java.util.regex.Pattern MONTH_PATTERN
          a regular expression capturing group matching any one month date (in particular, numbers 1 through 12)
static java.lang.String MPH
           
static int MPH_MAX_TOKENS
           
static java.util.regex.Pattern MPH_PATTERN
           
static java.lang.String MPH_UNIT
           
static int MPH_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern MPH_UNIT_PATTERN
           
static java.lang.String MULTI_SCORE
          a regular expression capturing group matching any probable multi score from sports events (like tennis, '6:3, 6:1, 6:7, 7:5')
static int MULTI_SCORE_MAX_TOKENS
           
static java.util.regex.Pattern MULTI_SCORE_PATTERN
          a regular expression capturing group matching any probable multi score from sports events (like tennis, '6:3, 6:1, 6:7, 7:5')
static java.lang.String NUMBER
          a regular expression capturing group matching all cardinal numbers given in form of digits
static java.lang.String NUMBER_HUNDRED
          a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero
static int NUMBER_HUNDRED_MAX_TOKENS
           
static java.util.regex.Pattern NUMBER_HUNDRED_PATTERN
          a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero
static java.lang.String NUMBER_HUNDRED_WITH_Xteen
          a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero, and all four digit cardinal numbers given in form of words whose first digit is 1 and whose last two digits are zero
static int NUMBER_HUNDRED_WITH_Xteen_MAX_TOKENS
           
static java.util.regex.Pattern NUMBER_HUNDRED_WITH_Xteen_PATTERN
          a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero, and all four digit cardinal numbers given in form of words whose first digit is 1 and whose last two digits are zero
static int NUMBER_MAX_TOKENS
           
static java.lang.String NUMBER_ONE
          a regular expression capturing group matching all one digit cardinal numbers given in form of words
static int NUMBER_ONE_MAX_TOKENS
           
static java.util.regex.Pattern NUMBER_ONE_PATTERN
          a regular expression capturing group matching all one digit cardinal numbers given in form of words
static java.util.regex.Pattern NUMBER_PATTERN
          a regular expression capturing group matching all cardinal numbers given in form of digits
static java.lang.String NUMBER_TEN
          a regular expression capturing group matching all two digit cardinal numbers given in form of words
static int NUMBER_TEN_MAX_TOKENS
           
static java.util.regex.Pattern NUMBER_TEN_PATTERN
          a regular expression capturing group matching all two digit cardinal numbers given in form of words
static java.lang.String NUMBER_THOUSAND
          a regular expression capturing group matching all four digit cardinal numbers given in form of words whose last three digits are zero
static int NUMBER_THOUSAND_MAX_TOKENS
           
static java.util.regex.Pattern NUMBER_THOUSAND_PATTERN
          a regular expression capturing group matching all four digit cardinal numbers given in form of words whose last three digits are zero
static java.lang.String NUMBER_TO_HUNDRED
          a regular expression capturing group matching all three digit cardinal numbers given in form of words
static int NUMBER_TO_HUNDRED_MAX_TOKENS
           
static java.util.regex.Pattern NUMBER_TO_HUNDRED_PATTERN
          a regular expression capturing group matching all three digit cardinal numbers given in form of words
static java.lang.String NUMBER_TO_THOUSAND
          a regular expression capturing group matching all up to six digit cardinal numbers given in form of words
static int NUMBER_TO_THOUSAND_MAX_TOKENS
           
static java.util.regex.Pattern NUMBER_TO_THOUSAND_PATTERN
          a regular expression capturing group matching all up to six digit cardinal numbers given in form of words
static java.lang.String NUMBER_Xillion
          a regular expression capturing group matching all cardinal numbers involving 'million', 'billion' or 'trillion'
static int NUMBER_Xillion_MAX_TOKENS
           
static java.util.regex.Pattern NUMBER_Xillion_PATTERN
          a regular expression capturing group matching all cardinal numbers involving 'million', 'billion' or 'trillion'
static java.lang.String NUMBER_Xteen
          a regular expression capturing group matching all two digit cardinal numbers given in form of words whose first digit is one
static int NUMBER_Xteen_MAX_TOKENS
           
static java.util.regex.Pattern NUMBER_Xteen_PATTERN
          a regular expression capturing group matching all two digit cardinal numbers given in form of words whose first digit is one
static java.lang.String ORDINAL
          a regular expression capturing group matching all ordinal numbers given in form of digits
static java.lang.String ORDINAL_DAY
          a regular expression capturing group matching any one ordinal number representing a day date (in particular, numbers 1st through 31st)
static java.lang.String ORDINAL_HUNDRED
          a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero
static int ORDINAL_HUNDRED_MAX_TOKENS
           
static java.util.regex.Pattern ORDINAL_HUNDRED_PATTERN
          a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero
static java.lang.String ORDINAL_HUNDRED_WITH_Xteen
          a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero, and all four digit ordinal numbers given in form of words whose first digit is 1 and whose last two digits are zero
static int ORDINAL_HUNDRED_WITH_Xteen_MAX_TOKENS
           
static java.util.regex.Pattern ORDINAL_HUNDRED_WITH_Xteen_PATTERN
          a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero, and all four digit ordinal numbers given in form of words whose first digit is 1 and whose last two digits are zero
static int ORDINAL_MAX_TOKENS
           
static java.lang.String ORDINAL_ONE
          a regular expression capturing group matching all one digit ordinal numbers given in form of words
static int ORDINAL_ONE_MAX_TOKENS
           
static java.util.regex.Pattern ORDINAL_ONE_PATTERN
          a regular expression capturing group matching all one digit ordinal numbers given in form of words
static java.util.regex.Pattern ORDINAL_PATTERN
          a regular expression capturing group matching all ordinal numbers given in form of digits
static java.lang.String ORDINAL_TEN
          a regular expression capturing group matching all two digit ordinal numbers given in form of words
static int ORDINAL_TEN_MAX_TOKENS
           
static java.util.regex.Pattern ORDINAL_TEN_PATTERN
          a regular expression capturing group matching all two digit ordinal numbers given in form of words
static java.lang.String ORDINAL_THOUSAND
          a regular expression capturing group matching all four digit ordinal numbers given in form of words whose last three digits are zero
static int ORDINAL_THOUSAND_MAX_TOKENS
           
static java.util.regex.Pattern ORDINAL_THOUSAND_PATTERN
          a regular expression capturing group matching all four digit ordinal numbers given in form of words whose last three digits are zero
static java.lang.String ORDINAL_TO_HUNDRED
          a regular expression capturing group matching all three digit ordinal numbers given in form of words
static int ORDINAL_TO_HUNDRED_MAX_TOKENS
           
static java.util.regex.Pattern ORDINAL_TO_HUNDRED_PATTERN
          a regular expression capturing group matching all three digit ordinal numbers given in form of words
static java.lang.String ORDINAL_TO_THOUSAND
          a regular expression capturing group matching all up to six digit ordinal numbers given in form of words
static int ORDINAL_TO_THOUSAND_MAX_TOKENS
           
static java.util.regex.Pattern ORDINAL_TO_THOUSAND_PATTERN
          a regular expression capturing group matching all up to six digit ordinal numbers given in form of words
static java.lang.String ORDINAL_Xteen
          a regular expression capturing group matching all two digit ordinal numbers given in form of words whose first digit is one
static int ORDINAL_Xteen_MAX_TOKENS
           
static java.util.regex.Pattern ORDINAL_Xteen_PATTERN
          a regular expression capturing group matching all two digit ordinal numbers given in form of words whose first digit is one
static java.lang.String OTHER
           
static java.lang.String OUNCES
           
static int OUNCES_MAX_TOKENS
           
static java.util.regex.Pattern OUNCES_PATTERN
           
static java.lang.String OUNCES_UNIT
           
static int OUNCES_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern OUNCES_UNIT_PATTERN
           
static boolean PATTERNS_CASE_INSENSITIVE
          case sensitivity default for regular expression patterns
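A case sensitivity default of this kind is typically applied when the Pattern objects are compiled. The sketch below shows the idea with the standard Pattern.CASE_INSENSITIVE flag. The class name CaseFlagSketch, the helper methods, and the value true are assumptions; the actual value and use of PATTERNS_CASE_INSENSITIVE are not stated in this summary.

```java
import java.util.regex.Pattern;

class CaseFlagSketch {
    // Hypothetical default; the real value in RegExMatcher is not given here.
    static final boolean PATTERNS_CASE_INSENSITIVE = true;

    // Compiles a pattern with or without the case-insensitive flag.
    static Pattern compile(String regex, boolean caseInsensitive) {
        int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(regex, flags);
    }

    // Compiles a pattern using the class-wide default.
    static Pattern compile(String regex) {
        return compile(regex, PATTERNS_CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        System.out.println(compile("county", true).matcher("Orange County").find());
        System.out.println(compile("county", false).matcher("Orange County").find());
    }
}
```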
static java.lang.String PERCENTAGE
          a regular expression capturing group matching any one percentage (like '20 out of hundred', '34 percent', or '100 %')
static int PERCENTAGE_MAX_TOKENS
           
static java.util.regex.Pattern PERCENTAGE_PATTERN
          a regular expression capturing group matching any one percentage (like '20 out of hundred', '34 percent', or '100 %')
static java.lang.String PHONE_NUMBER
           
static int PHONE_NUMBER_MAX_TOKENS
           
static java.util.regex.Pattern PHONE_NUMBER_PATTERN
           
static java.lang.String POUNDS
           
static int POUNDS_MAX_TOKENS
           
static java.util.regex.Pattern POUNDS_PATTERN
           
static java.lang.String POUNDS_UNIT
           
static int POUNDS_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern POUNDS_UNIT_PATTERN
           
static java.lang.String PROPER_NAME
          a regular expression capturing group matching any proper name (sequence of capitalized words, with only 'of the', 'of', 'the', 'and' allowed in lower case)
static int PROPER_NAME_MAX_TOKENS
           
static java.util.regex.Pattern PROPER_NAME_PATTERN
          a regular expression capturing group matching any proper name (sequence of capitalized words, with only 'of the', 'of', 'the', 'and' allowed in lower case)
static java.lang.String PUNCTUATED_ALL_UPPER_CASE_ACRONYM
          a regular expression capturing group matching any sequence of two or more upper case letters, intermixed with punctuation marks like dots and ampersands (probably acronyms)
static int PUNCTUATED_ALL_UPPER_CASE_ACRONYM_MAX_TOKENS
           
static java.util.regex.Pattern PUNCTUATED_ALL_UPPER_CASE_ACRONYM_PATTERN
          a regular expression capturing group matching any sequence of two or more upper case letters, intermixed with punctuation marks like dots and ampersands (probably acronyms)
static java.lang.String RANGE
           
static int RANGE_MAX_TOKENS
           
static java.util.regex.Pattern RANGE_PATTERN
           
static java.lang.String RANGE_UNIT
           
static int RANGE_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern RANGE_UNIT_PATTERN
           
static java.lang.String RATE
           
static int RATE_MAX_TOKENS
           
static java.util.regex.Pattern RATE_PATTERN
           
static java.lang.String REEF
          a regular expression capturing group matching any reef name (in particular, a proper name followed by 'Reef')
static int REEF_MAX_TOKENS
           
static java.util.regex.Pattern REEF_PATTERN
          a regular expression capturing group matching any reef name (in particular, a proper name followed by 'Reef')
static java.lang.String SCORE
          a regular expression capturing group matching any probable single or multi score from sports events (like soccer or tennis, '5:3' or '6:3, 6:1, 6:7, 7:5')
static int SCORE_MAX_TOKENS
           
static java.util.regex.Pattern SCORE_PATTERN
          a regular expression capturing group matching any probable single or multi score from sports events (like soccer or tennis, '5:3' or '6:3, 6:1, 6:7, 7:5')
static java.lang.String SINGLE_SCORE
          a regular expression capturing group matching any probable single score from sports events (like soccer or basketball scores, '5:3' or '89:109')
static int SINGLE_SCORE_MAX_TOKENS
           
static java.util.regex.Pattern SINGLE_SCORE_PATTERN
          a regular expression capturing group matching any probable single score from sports events (like soccer or basketball scores, '5:3' or '89:109')
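The relationship between the single, multi, and combined score expressions can be sketched as follows. The class name ScoreSketch, the digit limits, and the separator handling are assumptions for illustration and may not match the expressions in RegExMatcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScoreSketch {
    // Hypothetical single score: up to three digits on each side of ':'.
    static final String SINGLE_SCORE = "\\b\\d{1,3}:\\d{1,3}\\b";
    // Hypothetical multi score: two or more single scores separated by commas.
    static final String MULTI_SCORE =
            SINGLE_SCORE + "(?:,\\s*" + SINGLE_SCORE + ")+";
    // Multi score is tried first so a set of scores is kept together.
    static final String SCORE = "(" + MULTI_SCORE + "|" + SINGLE_SCORE + ")";
    static final Pattern SCORE_PATTERN = Pattern.compile(SCORE);

    // Returns every probable score found in the text.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = SCORE_PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String hit : findAll("Final 5:3; sets 6:3, 6:1, 6:7, 7:5.")) {
            System.out.println(hit);
        }
    }
}
```

A clock time such as '5:30' has the same shape as a single score, which is presumably why the descriptions call these matches probable scores.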
static java.lang.String SIZE
          a regular expression capturing group matching any one size measure (in particular, a number followed by a length, area or volume unit)
static int SIZE_MAX_TOKENS
           
static java.util.regex.Pattern SIZE_PATTERN
          a regular expression capturing group matching any one size measure (in particular, a number followed by a length, area or volume unit)
static java.lang.String SIZE_UNIT
          a regular expression capturing group matching any one size unit (in particular, a length, area or volume unit)
static int SIZE_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern SIZE_UNIT_PATTERN
          a regular expression capturing group matching any one size unit (in particular, a length, area or volume unit)
static java.lang.String SMALL_AREA_UNIT
          a regular expression capturing group matching any one smaller scale area unit (in particular, 'square (deci|cent|milli|micro|nano)meter', and 'square yard')
static int SMALL_AREA_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern SMALL_AREA_UNIT_PATTERN
          a regular expression capturing group matching any one smaller scale area unit (in particular, 'square (deci|cent|milli|micro|nano)meter', and 'square yard')
static java.lang.String SPEED
          a regular expression capturing group matching any one speed (in particular, a number followed by a speed unit)
static int SPEED_MAX_TOKENS
           
static java.util.regex.Pattern SPEED_PATTERN
          a regular expression capturing group matching any one speed (in particular, a number followed by a speed unit)
static java.lang.String SPEED_UNIT
          a regular expression capturing group matching any one speed unit (in particular, a length or distance unit divided by a time unit)
static int SPEED_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern SPEED_UNIT_PATTERN
          a regular expression capturing group matching any one speed unit (in particular, a length or distance unit divided by a time unit)
static java.lang.String SQUARE_MILES
           
static int SQUARE_MILES_MAX_TOKENS
           
static java.util.regex.Pattern SQUARE_MILES_PATTERN
           
static java.lang.String SQUARE_MILES_UNIT
           
static int SQUARE_MILES_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern SQUARE_MILES_UNIT_PATTERN
           
static java.lang.String START
           
static java.lang.String STREET
          a regular expression capturing group matching any street (in particular, a proper name followed by a synonym of 'street', like 'Madison Avenue', 'Capitol Beltway', 'Bourbon Street', etc.)
static int STREET_MAX_TOKENS
           
static java.util.regex.Pattern STREET_PATTERN
          a regular expression capturing group matching any street (in particular, a proper name followed by a synonym of 'street', like 'Madison Avenue', 'Capitol Beltway', 'Bourbon Street', etc.)
static java.lang.String TEMPERATURE
          a regular expression capturing group matching any one temperature (in particular, a number followed by a temperature unit)
static int TEMPERATURE_MAX_TOKENS
           
static java.util.regex.Pattern TEMPERATURE_PATTERN
          a regular expression capturing group matching any one temperature (in particular, a number followed by a temperature unit)
static java.lang.String TEMPERATURE_UNIT
          a regular expression capturing group matching any one temperature unit
static int TEMPERATURE_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern TEMPERATURE_UNIT_PATTERN
          a regular expression capturing group matching any one temperature unit
static java.lang.String TIME
          a regular expression capturing group matching any one time (like '5:30 pm' or 'dawn')
static int TIME_MAX_TOKENS
           
static java.util.regex.Pattern TIME_PATTERN
          a regular expression capturing group matching any one time (like '5:30 pm' or 'dawn')
static java.lang.String TIME_UNIT
          a regular expression capturing group matching any one time unit (in particular, 'hour', 'minute', '(milli|micro|nano)second', 'day', 'week', 'month', 'year', 'decade', and 'century')
static int TIME_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern TIME_UNIT_PATTERN
          a regular expression capturing group matching any one time unit (in particular, 'hour', 'minute', '(milli|micro|nano)second', 'day', 'week', 'month', 'year', 'decade', and 'century')
static java.lang.String TONS
           
static int TONS_MAX_TOKENS
           
static java.util.regex.Pattern TONS_PATTERN
           
static java.lang.String TONS_UNIT
           
static int TONS_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern TONS_UNIT_PATTERN
           
static java.lang.String URL
          a regular expression capturing group matching any URL (like 'http://www.uni-karlsruhe.de:8080/res/index.html')
static java.lang.String URL_AUTHORITY
          a regular expression capturing group matching the authority part of any URL (like 'www.uni-karlsruhe.de')
static java.lang.String URL_FILE
          a regular expression capturing group matching the path part of any URL (like '/res/index.html')
static int URL_MAX_TOKENS
           
static java.util.regex.Pattern URL_PATTERN
          a regular expression capturing group matching any URL (like 'http://www.uni-karlsruhe.de:8080/res/index.html')
static java.lang.String URL_PORT
          a regular expression capturing group matching the port part of any URL (like '8080')
static java.lang.String URL_PROTOCOL
          a regular expression capturing group matching the protocol part of any URL (like 'http://')
static java.lang.String VOLUME
          a regular expression capturing group matching any one volume measure (in particular, a number followed by a volume unit)
static int VOLUME_MAX_TOKENS
           
static java.util.regex.Pattern VOLUME_PATTERN
          a regular expression capturing group matching any one volume measure (in particular, a number followed by a volume unit)
static java.lang.String VOLUME_UNIT
          a regular expression capturing group matching any one volume unit
static int VOLUME_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern VOLUME_UNIT_PATTERN
          a regular expression capturing group matching any one volume unit
static java.lang.String WEEKDAY
          a regular expression capturing group matching any one weekday
static int WEEKDAY_MAX_TOKENS
           
static java.util.regex.Pattern WEEKDAY_PATTERN
          a regular expression capturing group matching any one weekday
static java.lang.String WEIGHT
          a regular expression capturing group matching any one weight measure (in particular, a number followed by a weight unit)
static int WEIGHT_MAX_TOKENS
           
static java.util.regex.Pattern WEIGHT_PATTERN
          a regular expression capturing group matching any one weight measure (in particular, a number followed by a weight unit)
static java.lang.String WEIGHT_UNIT
          a regular expression capturing group matching any one weight unit
static int WEIGHT_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern WEIGHT_UNIT_PATTERN
          a regular expression capturing group matching any one weight unit
static java.lang.String YEAR
          a regular expression capturing group matching any one year date (in particular, numbers 1 through 2999 and their two digit counterparts, the latter optionally preceded by a single quote)
static int YEAR_MAX_TOKENS
           
static java.util.regex.Pattern YEAR_PATTERN
          a regular expression capturing group matching any one year date (in particular, numbers 1 through 2999 and their two digit counterparts, the latter optionally preceded by a single quote)
static java.lang.String YEARS
          a regular expression capturing group matching any one duration given in years (in particular, a number followed by 'year' or 'years')
static int YEARS_MAX_TOKENS
           
static java.util.regex.Pattern YEARS_PATTERN
          a regular expression capturing group matching any one duration given in years (in particular, a number followed by 'year' or 'years')
static java.lang.String YEARS_UNIT
          a regular expression capturing group matching the unit of a duration given in years (in particular, 'year' or 'years')
static int YEARS_UNIT_MAX_TOKENS
           
static java.util.regex.Pattern YEARS_UNIT_PATTERN
          a regular expression capturing group matching the unit of a duration given in years (in particular, 'year' or 'years')
static java.lang.String ZIPCODE
           
static int ZIPCODE_MAX_TOKENS
           
static java.util.regex.Pattern ZIPCODE_PATTERN
           
 
Constructor Summary
RegExMatcher()
           
 
Method Summary
static java.util.regex.Pattern compile(java.lang.String regEx)
          create a Pattern from a regular expression String, using default case sensitivity
static java.util.regex.Pattern compile(java.lang.String regEx, boolean caseSensitive)
          create a Pattern from a regular expression String
static java.lang.String[] extractAllContained(java.lang.String[] tokens, HashDictionary dictionary)
          mark all parts of a String that are contained in a list of Strings
static java.lang.String[] extractAllContained(java.lang.String[] tokens, HashDictionary dictionary, int threshold)
          mark all parts of a String that are fuzzy-contained in a list of Strings
static java.lang.String[] extractAllMatches(java.lang.String text, java.util.regex.Pattern pattern)
          extract all parts from a token sequence that match a regular expression
static java.lang.String[] extractAllMatches(java.lang.String text, java.lang.String regEx)
          extract all parts from a token sequence that match a regular expression
static java.lang.String[] extractNumbers(java.lang.String[] tokens)
          mark all numbers in a token sequence
static java.lang.String[] extractOrdinalNumbers(java.lang.String[] tokens)
          mark all ordinal numbers in a token sequence
static java.lang.String[] extractQuantities(java.lang.String[] tokens, java.lang.String[] numberMarkers, java.util.regex.Pattern dimensionPattern, int maxTokens)
          mark all parts from a token sequence that match a regular expression
static HashDictionary getDictionary(java.lang.String name)
          load a gazetteer
static java.lang.String[] markAllContained(java.lang.String[] tokens, HashDictionary dictionary)
          mark all parts of a String that are contained in a list of Strings
static java.lang.String[] markAllContained(java.lang.String[] tokens, HashDictionary dictionary, int threshold)
          mark all parts of a String that are fuzzy-contained in a list of Strings
static java.lang.String[] markAllMatches(java.lang.String[] tokens, java.util.regex.Pattern pattern)
          mark all parts of a token sequence that match a regular expression
static java.lang.String[] markAllMatches(java.lang.String[] tokens, java.util.regex.Pattern pattern, int maxTokens)
          mark all parts of a token sequence that match a regular expression
static java.lang.String[] markAllMatches(java.lang.String[] tokens, java.lang.String regEx)
          mark all parts of a token sequence that match a regular expression
static java.lang.String[] markAllMatches(java.lang.String[] tokens, java.lang.String regEx, int maxTokens)
          mark all parts of a token sequence that match a regular expression
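
The markAllMatches variants above return one marker per token. The following is a minimal sketch of one plausible way such marking could work, not the library's actual implementation: the class name TokenMarkerSketch, the marker values "S", "C" and "O", and the longest-window-first strategy are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class TokenMarkerSketch {
    // Hypothetical marker values; the real START, CONTINUE and OTHER constants may differ.
    static final String START = "S";
    static final String CONTINUE = "C";
    static final String OTHER = "O";

    // Marks every window of up to maxTokens tokens whose space-joined text matches the pattern.
    static String[] markAllMatches(String[] tokens, Pattern pattern, int maxTokens) {
        String[] markers = new String[tokens.length];
        Arrays.fill(markers, OTHER);
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            // Try the longest window first so that multi-token entities win.
            for (int len = Math.min(maxTokens, tokens.length - i); len >= 1; len--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                if (pattern.matcher(candidate).matches()) {
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                i++;
                continue;
            }
            markers[i] = START;
            for (int j = i + 1; j < i + matched; j++) {
                markers[j] = CONTINUE;
            }
            i += matched;
        }
        return markers;
    }

    public static void main(String[] args) {
        // Illustrative time pattern, not the library's TIME constant.
        Pattern time = Pattern.compile("\\d{1,2}:\\d{2}( (am|pm))?");
        String[] tokens = {"Meet", "at", "5:30", "pm", "sharp"};
        System.out.println(Arrays.toString(markAllMatches(tokens, time, 2)));
    }
}
```

In this sketch the maxTokens argument bounds how many tokens are joined before matching, which is one way to read the role of the *_MAX_TOKENS constants listed in the field summary.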
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

OTHER

public static final java.lang.String OTHER
See Also:
Constant Field Values

START

public static final java.lang.String START
See Also:
Constant Field Values

CONTINUE

public static final java.lang.String CONTINUE
See Also:
Constant Field Values

MAX_TOKENS

private static final int MAX_TOKENS
See Also:
Constant Field Values

dictionariesByName

private static java.util.HashMap<java.lang.String,HashDictionary> dictionariesByName

NUMBER

public static final java.lang.String NUMBER
a regular expression capturing group matching all cardinal numbers given in form of digits

See Also:
Constant Field Values

ORDINAL

public static final java.lang.String ORDINAL
a regular expression capturing group matching all ordinal numbers given in form of digits

See Also:
Constant Field Values

NUMBER_ONE

public static final java.lang.String NUMBER_ONE
a regular expression capturing group matching all one digit cardinal numbers given in form of words

See Also:
Constant Field Values

NUMBER_Xteen

public static final java.lang.String NUMBER_Xteen
a regular expression capturing group matching all two digit cardinal numbers given in form of words whose first digit is one

See Also:
Constant Field Values

NUMBER_TEN

public static final java.lang.String NUMBER_TEN
a regular expression capturing group matching all two digit cardinal numbers given in form of words

See Also:
Constant Field Values
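
The number-word constants appear to build on one another. As a rough sketch of that composition, assuming illustrative regular expressions (the actual constant values are not reproduced in this listing, and the class and helper names are hypothetical), a two digit number in words could be assembled from smaller groups like this:

```java
import java.util.regex.Pattern;

class NumberWordsSketch {
    // Illustrative building blocks; the library's actual constant values may differ.
    static final String ONE = "(?:one|two|three|four|five|six|seven|eight|nine)";
    static final String XTEEN =
        "(?:ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen)";
    static final String TENS = "(?:twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)";

    // Two digit numbers: 10 to 19, or a multiple of ten optionally followed by a unit word.
    static final String TEN = "(" + XTEEN + "|" + TENS + "(?:[- ]" + ONE + ")?)";

    static final Pattern TEN_PATTERN = Pattern.compile(TEN, Pattern.CASE_INSENSITIVE);

    static boolean isTwoDigitNumberWord(String s) {
        return TEN_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTwoDigitNumberWord("forty-two"));
        System.out.println(isTwoDigitNumberWord("Seventeen"));
        System.out.println(isTwoDigitNumberWord("seven"));
    }
}
```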

NUMBER_HUNDRED

public static final java.lang.String NUMBER_HUNDRED
a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero

See Also:
Constant Field Values

NUMBER_HUNDRED_WITH_Xteen

public static final java.lang.String NUMBER_HUNDRED_WITH_Xteen
a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero, and all four digit cardinal numbers given in form of words whose first digit is 1 and whose last two digits are zero

See Also:
Constant Field Values

NUMBER_TO_HUNDRED

public static final java.lang.String NUMBER_TO_HUNDRED
a regular expression capturing group matching all three digit cardinal numbers given in form of words

See Also:
Constant Field Values

NUMBER_THOUSAND

public static final java.lang.String NUMBER_THOUSAND
a regular expression capturing group matching all four digit cardinal numbers given in form of words whose last three digits are zero

See Also:
Constant Field Values

NUMBER_TO_THOUSAND

public static final java.lang.String NUMBER_TO_THOUSAND
a regular expression capturing group matching all up to six digit cardinal numbers given in form of words

See Also:
Constant Field Values

NUMBER_Xillion

public static final java.lang.String NUMBER_Xillion
a regular expression capturing group matching all cardinal numbers involving 'million', 'billion' or 'trillion'

See Also:
Constant Field Values

ORDINAL_ONE

public static final java.lang.String ORDINAL_ONE
a regular expression capturing group matching all one digit ordinal numbers given in form of words

See Also:
Constant Field Values

ORDINAL_Xteen

public static final java.lang.String ORDINAL_Xteen
a regular expression capturing group matching all two digit ordinal numbers given in form of words whose first digit is one

See Also:
Constant Field Values

ORDINAL_TEN

public static final java.lang.String ORDINAL_TEN
a regular expression capturing group matching all two digit ordinal numbers given in form of words

See Also:
Constant Field Values

ORDINAL_HUNDRED

public static final java.lang.String ORDINAL_HUNDRED
a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero

See Also:
Constant Field Values

ORDINAL_HUNDRED_WITH_Xteen

public static final java.lang.String ORDINAL_HUNDRED_WITH_Xteen
a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero, and all four digit ordinal numbers given in form of words whose first digit is 1 and whose last two digits are zero

See Also:
Constant Field Values

ORDINAL_TO_HUNDRED

public static final java.lang.String ORDINAL_TO_HUNDRED
a regular expression capturing group matching all three digit ordinal numbers given in form of words

See Also:
Constant Field Values

ORDINAL_THOUSAND

public static final java.lang.String ORDINAL_THOUSAND
a regular expression capturing group matching all four digit ordinal numbers given in form of words whose last three digits are zero

See Also:
Constant Field Values

ORDINAL_TO_THOUSAND

public static final java.lang.String ORDINAL_TO_THOUSAND
a regular expression capturing group matching all up to six digit ordinal numbers given in form of words

See Also:
Constant Field Values

LENGTH_UNIT

public static final java.lang.String LENGTH_UNIT
a regular expression capturing group matching any one length unit (in particular, '(kilo|deci|cent|milli|micro|nano)meter', 'foot', 'inch', and 'yard')

See Also:
Constant Field Values

HEIGHT_UNIT

public static final java.lang.String HEIGHT_UNIT
a regular expression capturing group matching any one smaller scale length unit (this constant is equal to LENGTH_UNIT, it is only provided for clarity)

See Also:
Constant Field Values

LENGTH

public static final java.lang.String LENGTH
a regular expression capturing group matching any one length (in particular, a number followed by a length unit)

See Also:
Constant Field Values

HEIGHT

public static final java.lang.String HEIGHT
a regular expression capturing group matching any one height (this constant is equal to LENGTH, it is only provided for clarity)

See Also:
Constant Field Values

DOLLAR_UNIT

public static final java.lang.String DOLLAR_UNIT
a regular expression capturing group matching any one US Dollar unit (in particular, '$', 'USD', and 'Dollar')

See Also:
Constant Field Values

EURO_UNIT

public static final java.lang.String EURO_UNIT
a regular expression capturing group matching any one Euro unit (in particular, '€', 'EURO', and 'Euro')

See Also:
Constant Field Values

MONEY_UNIT

public static final java.lang.String MONEY_UNIT
a regular expression capturing group matching any one monetary unit (in particular, DOLLAR_UNIT, EURO_UNIT, 'YEN', and 'Yen')

See Also:
Constant Field Values

MONEY

public static final java.lang.String MONEY
a regular expression capturing group matching any one monetary amount (in particular, a number followed by a monetary unit)

See Also:
Constant Field Values

TIME_UNIT

public static final java.lang.String TIME_UNIT
a regular expression capturing group matching any one time unit (in particular, 'hour', 'minute', '(milli|micro|nano)second', 'day', 'week', 'month', 'year', 'decade', and 'century')

See Also:
Constant Field Values

TIME

public static final java.lang.String TIME
a regular expression capturing group matching any one time (like '5:30 pm' or 'dawn')

See Also:
Constant Field Values

DURATION_UNIT

public static final java.lang.String DURATION_UNIT
a regular expression capturing group matching any one duration unit (in particular, a time unit)

See Also:
Constant Field Values

DURATION

public static final java.lang.String DURATION
a regular expression capturing group matching any one duration (in particular, a number followed by a time unit)

See Also:
Constant Field Values

DAYS_UNIT

public static final java.lang.String DAYS_UNIT
a regular expression capturing group matching the unit of a duration given in days (in particular, 'day' or 'days')

See Also:
Constant Field Values

DAYS

public static final java.lang.String DAYS
a regular expression capturing group matching any one duration given in days (in particular, a number followed by 'day' or 'days')

See Also:
Constant Field Values

YEARS_UNIT

public static final java.lang.String YEARS_UNIT
a regular expression capturing group matching the unit of a duration given in years (in particular, 'year' or 'years')

See Also:
Constant Field Values

YEARS

public static final java.lang.String YEARS
a regular expression capturing group matching any one duration given in years (in particular, a number followed by 'year' or 'years')

See Also:
Constant Field Values

FREQUENCY

public static final java.lang.String FREQUENCY
a regular expression capturing group matching any one frequency (like 'once', '88 times', 'twice an hour', '25 times per day', or 'every 37 minutes')

See Also:
Constant Field Values

PERCENTAGE

public static final java.lang.String PERCENTAGE
a regular expression capturing group matching any one percentage

See Also:
Constant Field Values

LARGE_AREA_UNIT

public static final java.lang.String LARGE_AREA_UNIT
a regular expression capturing group matching any one larger scale area unit (in particular, 'square (kilo)meter', 'square mile', and 'acre')

See Also:
Constant Field Values

SMALL_AREA_UNIT

public static final java.lang.String SMALL_AREA_UNIT
a regular expression capturing group matching any one smaller scale area unit (in particular, 'square (deci|cent|milli|micro|nano)meter', and 'square yard')

See Also:
Constant Field Values

AREA_UNIT

public static final java.lang.String AREA_UNIT
a regular expression capturing group matching any one area unit (in particular, 'square (kilo|deci|cent|milli|micro|nano)meter', 'square yard', 'square mile', and 'acre')

See Also:
Constant Field Values

AREA_LARGE

public static final java.lang.String AREA_LARGE
a regular expression capturing group matching any one larger scale area measure (in particular, a number followed by a larger scale area unit)

See Also:
Constant Field Values

AREA_SMALL

public static final java.lang.String AREA_SMALL
a regular expression capturing group matching any one smaller scale area measure (in particular, a number followed by a smaller scale area unit)

See Also:
Constant Field Values

AREA

public static final java.lang.String AREA
a regular expression capturing group matching any one area measure (in particular, a number followed by an area unit)

See Also:
Constant Field Values

VOLUME_UNIT

public static final java.lang.String VOLUME_UNIT
a regular expression capturing group matching any one volume unit

See Also:
Constant Field Values

VOLUME

public static final java.lang.String VOLUME
a regular expression capturing group matching any one volume measure (in particular, a number followed by a volume unit)

See Also:
Constant Field Values

SIZE_UNIT

public static final java.lang.String SIZE_UNIT
a regular expression capturing group matching any one size unit (in particular, a length, area or volume unit)

See Also:
Constant Field Values

SIZE

public static final java.lang.String SIZE
a regular expression capturing group matching any one size measure (in particular, a number followed by a length, area or volume unit)

See Also:
Constant Field Values

WEIGHT_UNIT

public static final java.lang.String WEIGHT_UNIT
a regular expression capturing group matching any one weight unit

See Also:
Constant Field Values

WEIGHT

public static final java.lang.String WEIGHT
a regular expression capturing group matching any one weight measure (in particular, a number followed by a weight unit)

See Also:
Constant Field Values

SPEED_UNIT

public static final java.lang.String SPEED_UNIT
a regular expression capturing group matching any one speed unit (in particular, a length or distance unit divided by a time unit)

See Also:
Constant Field Values

SPEED

public static final java.lang.String SPEED
a regular expression capturing group matching any one speed (in particular, a number followed by a speed unit)

See Also:
Constant Field Values

TEMPERATURE_UNIT

public static final java.lang.String TEMPERATURE_UNIT
a regular expression capturing group matching any one temperature unit

See Also:
Constant Field Values

TEMPERATURE

public static final java.lang.String TEMPERATURE
a regular expression capturing group matching any one temperature (in particular, a number followed by a temperature unit)

See Also:
Constant Field Values

ANGLE_UNIT

public static final java.lang.String ANGLE_UNIT
a regular expression capturing group matching any one angle unit (in particular, 'degree' or °)

See Also:
Constant Field Values

ANGLE

public static final java.lang.String ANGLE
a regular expression capturing group matching any one angle (in particular, a number followed by 'degree' or °)

See Also:
Constant Field Values

MONTH_NAME

public static final java.lang.String MONTH_NAME
a regular expression capturing group matching any one month name

See Also:
Constant Field Values

WEEKDAY

public static final java.lang.String WEEKDAY
a regular expression capturing group matching any one weekday

See Also:
Constant Field Values

DAY

public static final java.lang.String DAY
a regular expression capturing group matching any one day date (in particular, numbers 1 through 31)

See Also:
Constant Field Values

ORDINAL_DAY

public static final java.lang.String ORDINAL_DAY
a regular expression capturing group matching any one ordinal number representing a day date (in particular, numbers 1st through 31st)

See Also:
Constant Field Values

MONTH

public static final java.lang.String MONTH
a regular expression capturing group matching any one month date (in particular, numbers 1 through 12)

See Also:
Constant Field Values

YEAR

public static final java.lang.String YEAR
a regular expression capturing group matching any one year date (in particular, numbers 1 through 2999 and their two digit counterparts, the latter optionally preceded by a single quote)

See Also:
Constant Field Values
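
To make the YEAR description concrete, here is a small sketch of a regular expression with the stated coverage (1 through 2999, plus two digit years with an optional leading single quote). The expression and the names YearSketch and isYear are assumptions for illustration; the library's actual constant value is not shown in this listing.

```java
import java.util.regex.Pattern;

class YearSketch {
    // Assumed regex: 1000 to 2999, then 1 to 999, then a two digit year
    // with an optional leading single quote.
    static final String YEAR = "([12]\\d{3}|[1-9]\\d{0,2}|'?\\d{2})";
    static final Pattern YEAR_PATTERN = Pattern.compile(YEAR);

    static boolean isYear(String token) {
        return YEAR_PATTERN.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"1999", "476", "'06", "06", "3000"}) {
            System.out.println(token + " -> " + isYear(token));
        }
    }
}
```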

DATE_SEPARATOR

public static final java.lang.String DATE_SEPARATOR
a regular expression capturing group matching any one character usually used to separate the digit groups in a date (in particular, the characters ',', '.', '-', '/', and '|')

See Also:
Constant Field Values

DATE_FULL

public static final java.lang.String DATE_FULL
a regular expression capturing group matching any one full date, optionally including the weekday (for instance, 'Tuesday, May 9th, 2006')

See Also:
Constant Field Values

DATE_DIGITAL

public static final java.lang.String DATE_DIGITAL
a regular expression capturing group matching any one date given in digits (for instance, '05-09-06')

See Also:
Constant Field Values
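
The digital date can be read as a composition of the DAY, MONTH, YEAR and DATE_SEPARATOR groups described above. The sketch below assumes a month-day-year order and illustrative sub-expressions; neither the field order nor the exact expressions are confirmed by this listing, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

class DigitalDateSketch {
    // Illustrative sub-expressions; the library's constant values may differ.
    static final String DAY = "(?:0?[1-9]|[12]\\d|3[01])";      // 1 through 31
    static final String MONTH = "(?:0?[1-9]|1[0-2])";           // 1 through 12
    static final String YEAR = "(?:\\d{4}|'?\\d{2})";           // simplified year
    static final String DATE_SEPARATOR = "[,.\\-/|]";           // ',', '.', '-', '/', '|'

    // Assumption: month-day-year order, as in '05-09-06'.
    static final String DATE_DIGITAL =
        "(" + MONTH + DATE_SEPARATOR + DAY + DATE_SEPARATOR + YEAR + ")";

    static final Pattern DATE_DIGITAL_PATTERN = Pattern.compile(DATE_DIGITAL);

    static boolean isDigitalDate(String s) {
        return DATE_DIGITAL_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDigitalDate("05-09-06"));
        System.out.println(isDigitalDate("5/9/2006"));
        System.out.println(isDigitalDate("13-09-06"));
    }
}
```

A stricter variant could require the same separator in both positions with a backreference; the sketch accepts mixed separators for brevity.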

DATE

public static final java.lang.String DATE
a regular expression capturing group matching any one date

See Also:
Constant Field Values

DAY_MONTH

public static final java.lang.String DAY_MONTH
a regular expression capturing group matching any one day in a month (e.g. 'June 12' or 'May 11th')

See Also:
Constant Field Values

DECADE

public static final java.lang.String DECADE
a regular expression capturing group matching any one decade (like '80s', '1920s' or 'sixties')

See Also:
Constant Field Values

CENTURY

public static final java.lang.String CENTURY
a regular expression capturing group matching any one century (like 'eighteenth century', or '21st century')

See Also:
Constant Field Values

ALL_UPPER_CASE_ACRONYM

public static final java.lang.String ALL_UPPER_CASE_ACRONYM
a regular expression capturing group matching any sequence of two or more upper case letters (probably acronyms)

See Also:
Constant Field Values

PUNCTUATED_ALL_UPPER_CASE_ACRONYM

public static final java.lang.String PUNCTUATED_ALL_UPPER_CASE_ACRONYM
a regular expression capturing group matching any sequence of two or more upper case letters, intermixed with punctuation marks like dots and ampersands (probably acronyms)

See Also:
Constant Field Values

MIXED_CASE_ACRONYM

public static final java.lang.String MIXED_CASE_ACRONYM
a regular expression capturing group matching any sequence of two or more upper case letters with intermediate or trailing lower case ones (probably acronyms)

See Also:
Constant Field Values

ACRONYM

public static final java.lang.String ACRONYM
a regular expression capturing group matching any probable acronym (disjunctive combination of MIXED_CASE_ACRONYM and PUNCTUATED_ALL_UPPER_CASE_ACRONYM)

See Also:
Constant Field Values

SINGLE_SCORE

public static final java.lang.String SINGLE_SCORE
a regular expression capturing group matching any probable single score from sports events (like soccer or basketball scores, '5:3' or '89:109')

See Also:
Constant Field Values

MULTI_SCORE

public static final java.lang.String MULTI_SCORE
a regular expression capturing group matching any probable multi score from sports events (like tennis, '6:3, 6:1, 6:7, 7:5')

See Also:
Constant Field Values

SCORE

public static final java.lang.String SCORE
a regular expression capturing group matching any probable single or multi score from sports events (like soccer or tennis, '5:3' or '6:3, 6:1, 6:7, 7:5')

See Also:
Constant Field Values
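
SCORE is described as covering both single and multi scores. A minimal sketch of that disjunction follows, with assumed expressions and hypothetical names (ScoreSketch, extractScores). The multi score alternative is placed first so that a comma-separated series is captured as one match rather than as several single scores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScoreSketch {
    // Illustrative expressions; the library's constant values may differ.
    static final String SINGLE_SCORE = "(?:\\d{1,3}:\\d{1,3})";
    static final String MULTI_SCORE =
        "(?:" + SINGLE_SCORE + "(?:,\\s*" + SINGLE_SCORE + ")+)";
    // Multi score first so that the longer alternative wins.
    static final String SCORE = "(" + MULTI_SCORE + "|" + SINGLE_SCORE + ")";

    static final Pattern SCORE_PATTERN = Pattern.compile(SCORE);

    // Collects every score found in the text, in order of appearance.
    static List<String> extractScores(String text) {
        List<String> scores = new ArrayList<>();
        Matcher matcher = SCORE_PATTERN.matcher(text);
        while (matcher.find()) {
            scores.add(matcher.group(1));
        }
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(extractScores("She won 6:3, 6:1, 6:7, 7:5 after the 89:109 game"));
    }
}
```

Note that a pattern of this shape also matches clock times such as '5:30', which is consistent with the description calling these matches "probable" scores.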

URL_PROTOCOL

public static final java.lang.String URL_PROTOCOL
a regular expression capturing group matching the protocol part of any URL (like 'http://')

See Also:
Constant Field Values

URL_AUTHORITY

public static final java.lang.String URL_AUTHORITY
a regular expression capturing group matching the authority part of any URL (like 'www.uni-karlsruhe.de')

See Also:
Constant Field Values

URL_PORT

public static final java.lang.String URL_PORT
a regular expression capturing group matching the port part of any URL (like '8080')

See Also:
Constant Field Values

URL_FILE

public static final java.lang.String URL_FILE
a regular expression capturing group matching the path part of any URL (like '/res/index.html')

See Also:
Constant Field Values

URL

public static final java.lang.String URL
a regular expression capturing group matching any URL (like 'http://www.uni-karlsruhe.de:8080/res/index.html')

See Also:
Constant Field Values
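
The URL constant reads as a concatenation of the protocol, authority, port and file parts. The sketch below shows one way to compose such an expression and read the parts back. The sub-expressions, the use of named groups, and the names UrlSketch and part are assumptions for illustration, not the library's definitions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlSketch {
    // Illustrative sub-expressions with named groups; the library's values may differ.
    static final String URL_PROTOCOL = "(?<protocol>[A-Za-z][A-Za-z0-9+.-]*://)";
    static final String URL_AUTHORITY = "(?<authority>[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*)";
    static final String URL_PORT = "(?<port>\\d{1,5})";
    static final String URL_FILE = "(?<file>/\\S*)";

    // Port (with its leading colon) and file are optional.
    static final String URL =
        "(" + URL_PROTOCOL + URL_AUTHORITY + "(?::" + URL_PORT + ")?" + URL_FILE + "?)";

    static final Pattern URL_PATTERN = Pattern.compile(URL);

    // Returns the named part of the URL, or null if the URL does not match
    // or the part is absent.
    static String part(String url, String groupName) {
        Matcher matcher = URL_PATTERN.matcher(url);
        return matcher.matches() ? matcher.group(groupName) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.uni-karlsruhe.de:8080/res/index.html";
        for (String name : new String[] {"protocol", "authority", "port", "file"}) {
            System.out.println(name + " = " + part(url, name));
        }
    }
}
```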

LEGAL_SENTENCE

public static final java.lang.String LEGAL_SENTENCE
a regular expression capturing group matching any legal sentence ('not guilty' and 'guilty')

See Also:
Constant Field Values

PROPER_NAME

public static final java.lang.String PROPER_NAME
a regular expression capturing group matching any proper name (sequence of capitalized words, with only 'of the', 'of', 'the', 'and' allowed in lower case)

See Also:
Constant Field Values

STREET

public static final java.lang.String STREET
a regular expression capturing group matching any street (in particular, a proper name followed by a synonym of 'street', like 'Madison Avenue', 'Capitol Beltway', 'Bourbon Street', etc.)

See Also:
Constant Field Values

COUNTY

public static final java.lang.String COUNTY
a regular expression capturing group matching any county name (in particular, a proper name followed by 'County')

See Also:
Constant Field Values

REEF

public static final java.lang.String REEF
a regular expression capturing group matching any reef name (in particular, a proper name followed by 'Reef')

See Also:
Constant Field Values

EDUCATIONAL_INSTITUTION

public static final java.lang.String EDUCATIONAL_INSTITUTION
a regular expression capturing group matching any educational institution (in particular, a proper name followed by one of 'university', 'college', 'high school', or 'elementary', or 'university of' followed by a proper name)

See Also:
Constant Field Values

PATTERNS_CASE_INSENSITIVE

public static final boolean PATTERNS_CASE_INSENSITIVE
case sensitivity default for regular expression patterns

See Also:
Constant Field Values
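
The two compile methods in the method summary differ only in whether case sensitivity is given explicitly. A minimal sketch of how such a pair could wrap java.util.regex.Pattern is shown below. The class name CompileSketch and the default value are assumptions; the actual value of PATTERNS_CASE_INSENSITIVE is not shown in this listing.

```java
import java.util.regex.Pattern;

class CompileSketch {
    // Assumed default for illustration only.
    static final boolean CASE_SENSITIVE_DEFAULT = false;

    // Compiles with the default case sensitivity.
    static Pattern compile(String regEx) {
        return compile(regEx, CASE_SENSITIVE_DEFAULT);
    }

    // Compiles with explicit case sensitivity.
    static Pattern compile(String regEx, boolean caseSensitive) {
        return Pattern.compile(regEx, caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        System.out.println(compile("monday", false).matcher("Monday").matches());
        System.out.println(compile("monday", true).matcher("Monday").matches());
    }
}
```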

NUMBER_PATTERN

public static final java.util.regex.Pattern NUMBER_PATTERN
a regular expression capturing group matching all cardinal numbers given in form of digits


NUMBER_MAX_TOKENS

public static final int NUMBER_MAX_TOKENS
See Also:
Constant Field Values

ORDINAL_PATTERN

public static final java.util.regex.Pattern ORDINAL_PATTERN
a regular expression capturing group matching all ordinal numbers given in form of digits


ORDINAL_MAX_TOKENS

public static final int ORDINAL_MAX_TOKENS
See Also:
Constant Field Values

NUMBER_ONE_PATTERN

public static final java.util.regex.Pattern NUMBER_ONE_PATTERN
a regular expression capturing group matching all one digit cardinal numbers given in form of words


NUMBER_ONE_MAX_TOKENS

public static final int NUMBER_ONE_MAX_TOKENS
See Also:
Constant Field Values

NUMBER_Xteen_PATTERN

public static final java.util.regex.Pattern NUMBER_Xteen_PATTERN
a regular expression capturing group matching all two digit cardinal numbers given in form of words whose first digit is one


NUMBER_Xteen_MAX_TOKENS

public static final int NUMBER_Xteen_MAX_TOKENS
See Also:
Constant Field Values

NUMBER_TEN_PATTERN

public static final java.util.regex.Pattern NUMBER_TEN_PATTERN
a regular expression capturing group matching all two digit cardinal numbers given in form of words


NUMBER_TEN_MAX_TOKENS

public static final int NUMBER_TEN_MAX_TOKENS
See Also:
Constant Field Values

NUMBER_HUNDRED_PATTERN

public static final java.util.regex.Pattern NUMBER_HUNDRED_PATTERN
a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero


NUMBER_HUNDRED_MAX_TOKENS

public static final int NUMBER_HUNDRED_MAX_TOKENS
See Also:
Constant Field Values

NUMBER_HUNDRED_WITH_Xteen_PATTERN

public static final java.util.regex.Pattern NUMBER_HUNDRED_WITH_Xteen_PATTERN
a regular expression capturing group matching all three digit cardinal numbers given in form of words whose last two digits are zero, and all four digit cardinal numbers given in form of words whose first digit is 1 and whose last two digits are zero


NUMBER_HUNDRED_WITH_Xteen_MAX_TOKENS

public static final int NUMBER_HUNDRED_WITH_Xteen_MAX_TOKENS
See Also:
Constant Field Values

NUMBER_TO_HUNDRED_PATTERN

public static final java.util.regex.Pattern NUMBER_TO_HUNDRED_PATTERN
a regular expression capturing group matching all three digit cardinal numbers given in form of words


NUMBER_TO_HUNDRED_MAX_TOKENS

public static final int NUMBER_TO_HUNDRED_MAX_TOKENS
See Also:
Constant Field Values

NUMBER_THOUSAND_PATTERN

public static final java.util.regex.Pattern NUMBER_THOUSAND_PATTERN
a regular expression capturing group matching all four digit cardinal numbers given in form of words whose last three digits are zero


NUMBER_THOUSAND_MAX_TOKENS

public static final int NUMBER_THOUSAND_MAX_TOKENS
See Also:
Constant Field Values

NUMBER_TO_THOUSAND_PATTERN

public static final java.util.regex.Pattern NUMBER_TO_THOUSAND_PATTERN
a regular expression capturing group matching all up to six digit cardinal numbers given in form of words


NUMBER_TO_THOUSAND_MAX_TOKENS

public static final int NUMBER_TO_THOUSAND_MAX_TOKENS
See Also:
Constant Field Values

NUMBER_Xillion_PATTERN

public static final java.util.regex.Pattern NUMBER_Xillion_PATTERN
a regular expression capturing group matching all cardinal numbers involving 'million', 'billion' or 'trillion'


NUMBER_Xillion_MAX_TOKENS

public static final int NUMBER_Xillion_MAX_TOKENS
See Also:
Constant Field Values

ORDINAL_ONE_PATTERN

public static final java.util.regex.Pattern ORDINAL_ONE_PATTERN
a regular expression capturing group matching all one digit ordinal numbers given in form of words


ORDINAL_ONE_MAX_TOKENS

public static final int ORDINAL_ONE_MAX_TOKENS
See Also:
Constant Field Values

ORDINAL_Xteen_PATTERN

public static final java.util.regex.Pattern ORDINAL_Xteen_PATTERN
a regular expression capturing group matching all two digit ordinal numbers given in form of words whose first digit is one


ORDINAL_Xteen_MAX_TOKENS

public static final int ORDINAL_Xteen_MAX_TOKENS
See Also:
Constant Field Values

ORDINAL_TEN_PATTERN

public static final java.util.regex.Pattern ORDINAL_TEN_PATTERN
a regular expression capturing group matching all two digit ordinal numbers given in form of words


ORDINAL_TEN_MAX_TOKENS

public static final int ORDINAL_TEN_MAX_TOKENS
See Also:
Constant Field Values

ORDINAL_HUNDRED_PATTERN

public static final java.util.regex.Pattern ORDINAL_HUNDRED_PATTERN
a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero


ORDINAL_HUNDRED_MAX_TOKENS

public static final int ORDINAL_HUNDRED_MAX_TOKENS
See Also:
Constant Field Values

ORDINAL_HUNDRED_WITH_Xteen_PATTERN

public static final java.util.regex.Pattern ORDINAL_HUNDRED_WITH_Xteen_PATTERN
a regular expression capturing group matching all three digit ordinal numbers given in form of words whose last two digits are zero, and all four digit cardinal numbers given in form of words whose first digit is 1 and whose last two digits are zero


ORDINAL_HUNDRED_WITH_Xteen_MAX_TOKENS

public static final int ORDINAL_HUNDRED_WITH_Xteen_MAX_TOKENS
See Also:
Constant Field Values

ORDINAL_TO_HUNDRED_PATTERN

public static final java.util.regex.Pattern ORDINAL_TO_HUNDRED_PATTERN
a regular expression capturing group matching all three digit ordinal numbers given in form of words


ORDINAL_TO_HUNDRED_MAX_TOKENS

public static final int ORDINAL_TO_HUNDRED_MAX_TOKENS
See Also:
Constant Field Values

ORDINAL_THOUSAND_PATTERN

public static final java.util.regex.Pattern ORDINAL_THOUSAND_PATTERN
a regular expression capturing group matching all four digit ordinal numbers given in form of words whose last three digits are zero


ORDINAL_THOUSAND_MAX_TOKENS

public static final int ORDINAL_THOUSAND_MAX_TOKENS
See Also:
Constant Field Values

ORDINAL_TO_THOUSAND_PATTERN

public static final java.util.regex.Pattern ORDINAL_TO_THOUSAND_PATTERN
a regular expression capturing group matching all up to six digit ordinal numbers given in form of words


ORDINAL_TO_THOUSAND_MAX_TOKENS

public static final int ORDINAL_TO_THOUSAND_MAX_TOKENS
See Also:
Constant Field Values

LENGTH_UNIT_PATTERN

public static final java.util.regex.Pattern LENGTH_UNIT_PATTERN
a regular expression capturing group matching any one length unit (in particular, '(kilo|deci|cent|milli|micro|nano)meter', 'foot', 'inch', and 'yard')


LENGTH_UNIT_MAX_TOKENS

public static final int LENGTH_UNIT_MAX_TOKENS
See Also:
Constant Field Values

HEIGHT_UNIT_PATTERN

public static final java.util.regex.Pattern HEIGHT_UNIT_PATTERN
a regular expression capturing group matching any one smaller scale length unit (this constant is equal to LENGTH_UNIT; it is provided only for clarity)


HEIGHT_UNIT_MAX_TOKENS

public static final int HEIGHT_UNIT_MAX_TOKENS
See Also:
Constant Field Values

LENGTH_PATTERN

public static final java.util.regex.Pattern LENGTH_PATTERN
a regular expression capturing group matching any one length (in particular, a number followed by a length unit)


LENGTH_MAX_TOKENS

public static final int LENGTH_MAX_TOKENS
See Also:
Constant Field Values
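
As a rough illustration of how a "number followed by a length unit" expression can be composed from smaller capturing groups, the sketch below builds one with java.util.regex. The class LengthSketch and the strings NUMBER and LENGTH_UNIT are hypothetical stand-ins; the actual expressions behind LENGTH_PATTERN are not reproduced in this documentation and may differ.

```java
import java.util.regex.Pattern;

final class LengthSketch {
    // Hypothetical building blocks, not the real RegExMatcher constants.
    static final String NUMBER = "(\\d+(?:\\.\\d+)?)";
    static final String LENGTH_UNIT =
            "((?:kilo|deci|centi|milli|micro|nano)?meters?|foot|feet|inch(?:es)?|yards?)";

    // Assumption: a length is a number, a single space, and a length unit.
    static final Pattern LENGTH =
            Pattern.compile("(" + NUMBER + " " + LENGTH_UNIT + ")");

    // Returns true if the whole candidate string is a length expression.
    static boolean isLength(String candidate) {
        return LENGTH.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLength("3 meters"));
        System.out.println(isLength("2.5 kilometers"));
        System.out.println(isLength("three apples"));
    }
}
```

Building the larger expression by string concatenation mirrors the way the documented constants refer to one another (a measure is described in terms of its unit).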

HEIGHT_PATTERN

public static final java.util.regex.Pattern HEIGHT_PATTERN
a regular expression capturing group matching any one height (this constant is equal to LENGTH; it is provided only for clarity)


HEIGHT_MAX_TOKENS

public static final int HEIGHT_MAX_TOKENS
See Also:
Constant Field Values

DOLLAR_UNIT_PATTERN

public static final java.util.regex.Pattern DOLLAR_UNIT_PATTERN
a regular expression capturing group matching any one US Dollar unit (in particular, '$', 'USD', and 'Dollar')


DOLLAR_UNIT_MAX_TOKENS

public static final int DOLLAR_UNIT_MAX_TOKENS
See Also:
Constant Field Values

EURO_UNIT_PATTERN

public static final java.util.regex.Pattern EURO_UNIT_PATTERN
a regular expression capturing group matching any one Euro unit (in particular, '€', 'EURO', and 'Euro')


EURO_UNIT_MAX_TOKENS

public static final int EURO_UNIT_MAX_TOKENS
See Also:
Constant Field Values

MONEY_UNIT_PATTERN

public static final java.util.regex.Pattern MONEY_UNIT_PATTERN
a regular expression capturing group matching any one monetary unit (in particular, DOLLAR_UNIT, EURO_UNIT, 'YEN', and 'Yen')


MONEY_UNIT_MAX_TOKENS

public static final int MONEY_UNIT_MAX_TOKENS
See Also:
Constant Field Values

MONEY_PATTERN

public static final java.util.regex.Pattern MONEY_PATTERN
a regular expression capturing group matching any one monetary amount (in particular, a number followed by a monetary unit)


MONEY_MAX_TOKENS

public static final int MONEY_MAX_TOKENS
See Also:
Constant Field Values

TIME_UNIT_PATTERN

public static final java.util.regex.Pattern TIME_UNIT_PATTERN
a regular expression capturing group matching any one time unit (in particular, 'hour', 'minute', '(milli|micro|nano)second', 'day', 'week', 'month', 'year', 'decade', and 'century')


TIME_UNIT_MAX_TOKENS

public static final int TIME_UNIT_MAX_TOKENS
See Also:
Constant Field Values

TIME_PATTERN

public static final java.util.regex.Pattern TIME_PATTERN
a regular expression capturing group matching any one time (like '5:30 pm' or 'dawn')


TIME_MAX_TOKENS

public static final int TIME_MAX_TOKENS
See Also:
Constant Field Values

DURATION_UNIT_PATTERN

public static final java.util.regex.Pattern DURATION_UNIT_PATTERN
a regular expression capturing group matching any one duration unit (in particular, a number followed by a time unit)


DURATION_UNIT_MAX_TOKENS

public static final int DURATION_UNIT_MAX_TOKENS
See Also:
Constant Field Values

DURATION_PATTERN

public static final java.util.regex.Pattern DURATION_PATTERN
a regular expression capturing group matching any one duration (in particular, a number followed by a time unit)


DURATION_MAX_TOKENS

public static final int DURATION_MAX_TOKENS
See Also:
Constant Field Values

DAYS_UNIT_PATTERN

public static final java.util.regex.Pattern DAYS_UNIT_PATTERN
a regular expression capturing group matching any one duration given in days (in particular, a number followed by a time unit)


DAYS_UNIT_MAX_TOKENS

public static final int DAYS_UNIT_MAX_TOKENS
See Also:
Constant Field Values

DAYS_PATTERN

public static final java.util.regex.Pattern DAYS_PATTERN
a regular expression capturing group matching any one duration given in days (in particular, a number followed by a time unit)


DAYS_MAX_TOKENS

public static final int DAYS_MAX_TOKENS
See Also:
Constant Field Values

YEARS_UNIT_PATTERN

public static final java.util.regex.Pattern YEARS_UNIT_PATTERN
a regular expression capturing group matching any one duration (in particular, a number followed by a time unit)


YEARS_UNIT_MAX_TOKENS

public static final int YEARS_UNIT_MAX_TOKENS
See Also:
Constant Field Values

YEARS_PATTERN

public static final java.util.regex.Pattern YEARS_PATTERN
a regular expression capturing group matching any one duration given in years (in particular, a number followed by a time unit)


YEARS_MAX_TOKENS

public static final int YEARS_MAX_TOKENS
See Also:
Constant Field Values

FREQUENCY_PATTERN

public static final java.util.regex.Pattern FREQUENCY_PATTERN
a regular expression capturing group matching any one frequency (like 'once', '88 times', 'twice an hour', '25 times per day', or 'every 37 minutes')


FREQUENCY_MAX_TOKENS

public static final int FREQUENCY_MAX_TOKENS
See Also:
Constant Field Values

PERCENTAGE_PATTERN

public static final java.util.regex.Pattern PERCENTAGE_PATTERN
a regular expression capturing group matching any one percentage (like '20 out of hundred', '34 percent', or '100 %')


PERCENTAGE_MAX_TOKENS

public static final int PERCENTAGE_MAX_TOKENS
See Also:
Constant Field Values

LARGE_AREA_UNIT_PATTERN

public static final java.util.regex.Pattern LARGE_AREA_UNIT_PATTERN
a regular expression capturing group matching any one larger scale area unit (in particular, 'square (kilo)meter', 'square mile', and 'acre')


LARGE_AREA_UNIT_MAX_TOKENS

public static final int LARGE_AREA_UNIT_MAX_TOKENS
See Also:
Constant Field Values

SMALL_AREA_UNIT_PATTERN

public static final java.util.regex.Pattern SMALL_AREA_UNIT_PATTERN
a regular expression capturing group matching any one smaller scale area unit (in particular, 'square (deci|cent|milli|micro|nano)meter', and 'square yard')


SMALL_AREA_UNIT_MAX_TOKENS

public static final int SMALL_AREA_UNIT_MAX_TOKENS
See Also:
Constant Field Values

AREA_UNIT_PATTERN

public static final java.util.regex.Pattern AREA_UNIT_PATTERN
a regular expression capturing group matching any one area unit


AREA_UNIT_MAX_TOKENS

public static final int AREA_UNIT_MAX_TOKENS
See Also:
Constant Field Values

AREA_PATTERN

public static final java.util.regex.Pattern AREA_PATTERN
a regular expression capturing group matching any one area measure (in particular, a number followed by an area unit)


AREA_MAX_TOKENS

public static final int AREA_MAX_TOKENS
See Also:
Constant Field Values

VOLUME_UNIT_PATTERN

public static final java.util.regex.Pattern VOLUME_UNIT_PATTERN
a regular expression capturing group matching any one volume unit


VOLUME_UNIT_MAX_TOKENS

public static final int VOLUME_UNIT_MAX_TOKENS
See Also:
Constant Field Values

VOLUME_PATTERN

public static final java.util.regex.Pattern VOLUME_PATTERN
a regular expression capturing group matching any one volume measure (in particular, a number followed by a volume unit)


VOLUME_MAX_TOKENS

public static final int VOLUME_MAX_TOKENS
See Also:
Constant Field Values

SIZE_UNIT_PATTERN

public static final java.util.regex.Pattern SIZE_UNIT_PATTERN
a regular expression capturing group matching any one size unit (in particular, a length, area or volume unit)


SIZE_UNIT_MAX_TOKENS

public static final int SIZE_UNIT_MAX_TOKENS
See Also:
Constant Field Values

SIZE_PATTERN

public static final java.util.regex.Pattern SIZE_PATTERN
a regular expression capturing group matching any one size measure (in particular, a number followed by a length, area or volume unit)


SIZE_MAX_TOKENS

public static final int SIZE_MAX_TOKENS
See Also:
Constant Field Values

WEIGHT_UNIT_PATTERN

public static final java.util.regex.Pattern WEIGHT_UNIT_PATTERN
a regular expression capturing group matching any one weight unit


WEIGHT_UNIT_MAX_TOKENS

public static final int WEIGHT_UNIT_MAX_TOKENS
See Also:
Constant Field Values

WEIGHT_PATTERN

public static final java.util.regex.Pattern WEIGHT_PATTERN
a regular expression capturing group matching any one weight measure (in particular, a number followed by a weight unit)


WEIGHT_MAX_TOKENS

public static final int WEIGHT_MAX_TOKENS
See Also:
Constant Field Values

SPEED_UNIT_PATTERN

public static final java.util.regex.Pattern SPEED_UNIT_PATTERN
a regular expression capturing group matching any one speed unit (in particular, a length or distance unit divided by a time unit)


SPEED_UNIT_MAX_TOKENS

public static final int SPEED_UNIT_MAX_TOKENS
See Also:
Constant Field Values

SPEED_PATTERN

public static final java.util.regex.Pattern SPEED_PATTERN
a regular expression capturing group matching any one speed (in particular, a number followed by a speed unit)


SPEED_MAX_TOKENS

public static final int SPEED_MAX_TOKENS
See Also:
Constant Field Values

TEMPERATURE_UNIT_PATTERN

public static final java.util.regex.Pattern TEMPERATURE_UNIT_PATTERN
a regular expression capturing group matching any one temperature unit


TEMPERATURE_UNIT_MAX_TOKENS

public static final int TEMPERATURE_UNIT_MAX_TOKENS
See Also:
Constant Field Values

TEMPERATURE_PATTERN

public static final java.util.regex.Pattern TEMPERATURE_PATTERN
a regular expression capturing group matching any one temperature (in particular, a number followed by a temperature unit)


TEMPERATURE_MAX_TOKENS

public static final int TEMPERATURE_MAX_TOKENS
See Also:
Constant Field Values

ANGLE_UNIT_PATTERN

public static final java.util.regex.Pattern ANGLE_UNIT_PATTERN
a regular expression capturing group matching any one angle unit (in particular, 'degree' or °)


ANGLE_UNIT_MAX_TOKENS

public static final int ANGLE_UNIT_MAX_TOKENS
See Also:
Constant Field Values

ANGLE_PATTERN

public static final java.util.regex.Pattern ANGLE_PATTERN
a regular expression capturing group matching any one angle (in particular, a number followed by 'degree' or °)


ANGLE_MAX_TOKENS

public static final int ANGLE_MAX_TOKENS
See Also:
Constant Field Values

MONTH_NAME_PATTERN

public static final java.util.regex.Pattern MONTH_NAME_PATTERN
a regular expression capturing group matching any one month name


MONTH_NAME_MAX_TOKENS

public static final int MONTH_NAME_MAX_TOKENS
See Also:
Constant Field Values

WEEKDAY_PATTERN

public static final java.util.regex.Pattern WEEKDAY_PATTERN
a regular expression capturing group matching any one weekday


WEEKDAY_MAX_TOKENS

public static final int WEEKDAY_MAX_TOKENS
See Also:
Constant Field Values

DAY_PATTERN

public static final java.util.regex.Pattern DAY_PATTERN
a regular expression capturing group matching any one day date (in particular, numbers 1 through 31)


DAY_MAX_TOKENS

public static final int DAY_MAX_TOKENS
See Also:
Constant Field Values

MONTH_PATTERN

public static final java.util.regex.Pattern MONTH_PATTERN
a regular expression capturing group matching any one month date (in particular, numbers 1 through 12)


MONTH_MAX_TOKENS

public static final int MONTH_MAX_TOKENS
See Also:
Constant Field Values

YEAR_PATTERN

public static final java.util.regex.Pattern YEAR_PATTERN
a regular expression capturing group matching any one year date (in particular, numbers 1 through 2999 and their two digit counterparts, the latter optionally preceded by a single quote)


YEAR_MAX_TOKENS

public static final int YEAR_MAX_TOKENS
See Also:
Constant Field Values
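
The year description above (numbers 1 through 2999, plus two digit forms with an optional leading single quote) can be approximated by a three-way alternation. This is a minimal sketch under that reading; YearSketch and its expression are hypothetical and are not the expression used by YEAR_PATTERN.

```java
import java.util.regex.Pattern;

final class YearSketch {
    // Hypothetical expression with three alternatives:
    //   1. a two digit year, optionally preceded by a single quote
    //   2. a one to three digit year without leading zeros (1 to 999)
    //   3. a four digit year starting with 1 or 2 (1000 to 2999)
    static final Pattern YEAR =
            Pattern.compile("('?[0-9]{2}|[1-9][0-9]{0,2}|[12][0-9]{3})");

    // Returns true if the whole token is a year under the assumptions above.
    static boolean isYear(String token) {
        return YEAR.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"2006", "'06", "476", "3000"}) {
            System.out.println(token + " -> " + isYear(token));
        }
    }
}
```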

DATE_SEPARATOR_PATTERN

public static final java.util.regex.Pattern DATE_SEPARATOR_PATTERN
a regular expression capturing group matching any one character usually used to separate the digit groups in a date (in particular, the characters ',', '.', '-', '/', and '|')


DATE_SEPARATOR_MAX_TOKENS

public static final int DATE_SEPARATOR_MAX_TOKENS
See Also:
Constant Field Values

DATE_FULL_PATTERN

public static final java.util.regex.Pattern DATE_FULL_PATTERN
a regular expression capturing group matching any one full date, optionally including the weekday (for instance, 'Tuesday, May 9th, 2006')


DATE_FULL_MAX_TOKENS

public static final int DATE_FULL_MAX_TOKENS
See Also:
Constant Field Values

DATE_DIGITAL_PATTERN

public static final java.util.regex.Pattern DATE_DIGITAL_PATTERN
a regular expression capturing group matching any one date given in digits (for instance, '05-09-06')


DATE_DIGITAL_MAX_TOKENS

public static final int DATE_DIGITAL_MAX_TOKENS
See Also:
Constant Field Values
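
A date given in digits, such as '05-09-06', can be sketched as digit groups joined by one of the separator characters listed for DATE_SEPARATOR. The class DigitalDateSketch is hypothetical. Requiring the same separator in both positions (via a backreference) and allowing a four digit year are assumptions of this sketch, not documented behavior of DATE_DIGITAL_PATTERN.

```java
import java.util.regex.Pattern;

final class DigitalDateSketch {
    // Hypothetical separator class: ',', '.', '-', '/' and '|'.
    static final String SEPARATOR = "([,.\\-/|])";

    // Group 1: first digit group, group 2: separator, group 3: second digit group,
    // then the same separator again (backreference to group 2), group 4: the year.
    static final Pattern DIGITAL_DATE = Pattern.compile(
            "(\\d{1,2})" + SEPARATOR + "(\\d{1,2})\\2(\\d{2}|\\d{4})");

    // Returns true if the whole candidate string is a date given in digits.
    static boolean isDigitalDate(String candidate) {
        return DIGITAL_DATE.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDigitalDate("05-09-06"));
        System.out.println(isDigitalDate("05/09/2006"));
        System.out.println(isDigitalDate("05-09/06"));
    }
}
```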

DATE_PATTERN

public static final java.util.regex.Pattern DATE_PATTERN
a regular expression capturing group matching any one date


DATE_MAX_TOKENS

public static final int DATE_MAX_TOKENS
See Also:
Constant Field Values

DAY_MONTH_PATTERN

public static final java.util.regex.Pattern DAY_MONTH_PATTERN
a regular expression capturing group matching any one day in a month


DAY_MONTH_MAX_TOKENS

public static final int DAY_MONTH_MAX_TOKENS
See Also:
Constant Field Values

DECADE_PATTERN

public static final java.util.regex.Pattern DECADE_PATTERN
a regular expression capturing group matching any one decade (like '80s', '1920s' or 'sixties')


DECADE_MAX_TOKENS

public static final int DECADE_MAX_TOKENS
See Also:
Constant Field Values

CENTURY_PATTERN

public static final java.util.regex.Pattern CENTURY_PATTERN
a regular expression capturing group matching any one century (like 'eighteenth century', or '21st century')


CENTURY_MAX_TOKENS

public static final int CENTURY_MAX_TOKENS
See Also:
Constant Field Values

ALL_UPPER_CASE_ACRONYM_PATTERN

public static final java.util.regex.Pattern ALL_UPPER_CASE_ACRONYM_PATTERN
a regular expression capturing group matching any sequence of two or more upper case letters (probably acronyms)


ALL_UPPER_CASE_ACRONYM_MAX_TOKENS

public static final int ALL_UPPER_CASE_ACRONYM_MAX_TOKENS
See Also:
Constant Field Values

PUNCTUATED_ALL_UPPER_CASE_ACRONYM_PATTERN

public static final java.util.regex.Pattern PUNCTUATED_ALL_UPPER_CASE_ACRONYM_PATTERN
a regular expression capturing group matching any sequence of two or more upper case letters, intermixed with punctuation marks like dots and ampersands (probably acronyms)


PUNCTUATED_ALL_UPPER_CASE_ACRONYM_MAX_TOKENS

public static final int PUNCTUATED_ALL_UPPER_CASE_ACRONYM_MAX_TOKENS
See Also:
Constant Field Values

MIXED_CASE_ACRONYM_PATTERN

public static final java.util.regex.Pattern MIXED_CASE_ACRONYM_PATTERN
a regular expression capturing group matching any sequence of two or more upper case letters with intermediate or trailing lower case ones (probably acronyms)


MIXED_CASE_ACRONYM_MAX_TOKENS

public static final int MIXED_CASE_ACRONYM_MAX_TOKENS
See Also:
Constant Field Values

ACRONYM_PATTERN

public static final java.util.regex.Pattern ACRONYM_PATTERN
a regular expression capturing group matching any probable acronym (disjunctive combination of MIXED_CASE_ACRONYM and PUNCTUATED_ALL_UPPER_CASE_ACRONYM)


ACRONYM_MAX_TOKENS

public static final int ACRONYM_MAX_TOKENS
See Also:
Constant Field Values
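
ACRONYM is described as a disjunctive combination of two other expressions. The sketch below shows what such a combination could look like. MIXED_CASE and PUNCTUATED_UPPER are hypothetical approximations written for this example; the real constants may be stricter or looser.

```java
import java.util.regex.Pattern;

final class AcronymSketch {
    // Hypothetical: two or more upper case letters, each optionally followed
    // by lower case letters (e.g. "DoD", "NATO").
    static final String MIXED_CASE = "(?:[A-Z][a-z]*){2,}";

    // Hypothetical: two or more upper case letters, optionally separated by a
    // dot or an ampersand, with an optional trailing dot (e.g. "U.S.A.", "AT&T").
    static final String PUNCTUATED_UPPER = "[A-Z](?:[.&]?[A-Z])+\\.?";

    // The disjunction of both expressions, wrapped in one capturing group.
    static final Pattern ACRONYM =
            Pattern.compile("(" + MIXED_CASE + "|" + PUNCTUATED_UPPER + ")");

    // Returns true if the whole token looks like an acronym.
    static boolean isProbableAcronym(String token) {
        return ACRONYM.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"U.S.A.", "AT&T", "DoD", "Hello"}) {
            System.out.println(token + " -> " + isProbableAcronym(token));
        }
    }
}
```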

SINGLE_SCORE_PATTERN

public static final java.util.regex.Pattern SINGLE_SCORE_PATTERN
a regular expression capturing group matching any probable single score from sports events (like soccer or basketball scores, '5:3' or '89:109')


SINGLE_SCORE_MAX_TOKENS

public static final int SINGLE_SCORE_MAX_TOKENS
See Also:
Constant Field Values

MULTI_SCORE_PATTERN

public static final java.util.regex.Pattern MULTI_SCORE_PATTERN
a regular expression capturing group matching any probable multi score from sports events (like tennis, '6:3, 6:1, 6:7, 7:5')


MULTI_SCORE_MAX_TOKENS

public static final int MULTI_SCORE_MAX_TOKENS
See Also:
Constant Field Values

SCORE_PATTERN

public static final java.util.regex.Pattern SCORE_PATTERN
a regular expression capturing group matching any probable single or multi score from sports events (like soccer or tennis, '5:3' or '6:3, 6:1, 6:7, 7:5')


SCORE_MAX_TOKENS

public static final int SCORE_MAX_TOKENS
See Also:
Constant Field Values
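
The relation between single scores, multi scores, and their combination can be sketched as follows. The expressions are hypothetical and only follow the examples given above ('5:3', '89:109', '6:3, 6:1, 6:7, 7:5'); they are not the library's own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ScoreSketch {
    // Hypothetical: two numbers of up to three digits separated by a colon.
    static final String SINGLE_SCORE = "\\d{1,3}:\\d{1,3}";

    // Hypothetical: two or more single scores separated by ", ".
    static final String MULTI_SCORE = SINGLE_SCORE + "(?:, " + SINGLE_SCORE + ")+";

    // The multi score alternative comes first, so a complete list of sets
    // is preferred over its first element.
    static final Pattern SCORE =
            Pattern.compile("(" + MULTI_SCORE + "|" + SINGLE_SCORE + ")");

    // Collects every score found in the text, scanning left to right.
    static List<String> findScores(String text) {
        List<String> scores = new ArrayList<>();
        Matcher matcher = SCORE.matcher(text);
        while (matcher.find()) {
            scores.add(matcher.group(1));
        }
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(findScores("She won 6:3, 6:1, 6:7, 7:5 after a 2:1 lead"));
    }
}
```

Note that an expression this loose also accepts clock times such as '5:30', which is consistent with the documentation calling these matches "probable" scores.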

URL_PATTERN

public static final java.util.regex.Pattern URL_PATTERN
a regular expression capturing group matching any URL (like 'http://www.uni-karlsruhe.de:8080/res/index.html')


URL_MAX_TOKENS

public static final int URL_MAX_TOKENS
See Also:
Constant Field Values

LEGAL_SENTENCE_PATTERN

public static final java.util.regex.Pattern LEGAL_SENTENCE_PATTERN
a regular expression capturing group matching any legal sentence ('not guilty' and 'guilty')


LEGAL_SENTENCE_MAX_TOKENS

public static final int LEGAL_SENTENCE_MAX_TOKENS
See Also:
Constant Field Values

PROPER_NAME_PATTERN

public static final java.util.regex.Pattern PROPER_NAME_PATTERN
a regular expression capturing group matching any proper name (sequence of capitalized words, with only 'of the', 'of', 'the', 'and' allowed in lower case)


PROPER_NAME_MAX_TOKENS

public static final int PROPER_NAME_MAX_TOKENS
See Also:
Constant Field Values
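
The proper name rule above (capitalized words, with a few lower case connectors allowed in between) can be written as a short expression. This is a minimal sketch; ProperNameSketch, CAPITALIZED, and CONNECTOR are hypothetical names, and the sketch assumes that words are separated by single spaces and that a name starts and ends with a capitalized word.

```java
import java.util.regex.Pattern;

final class ProperNameSketch {
    // Hypothetical: one capitalized word of at least two letters.
    static final String CAPITALIZED = "[A-Z][a-z]+";

    // The lower case connectors named in the description; "of the" is listed
    // before "of" so that the longer connector is tried first.
    static final String CONNECTOR = "(?:of the|of|the|and)";

    // A capitalized word, followed by any number of further capitalized words,
    // each optionally preceded by one connector.
    static final Pattern PROPER_NAME = Pattern.compile(
            "(" + CAPITALIZED + "(?: (?:" + CONNECTOR + " )?" + CAPITALIZED + ")*)");

    // Returns true if the whole candidate string is a proper name.
    static boolean isProperName(String candidate) {
        return PROPER_NAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isProperName("University of Karlsruhe"));
        System.out.println(isProperName("Bank of the West"));
        System.out.println(isProperName("the quick fox"));
    }
}
```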

STREET_PATTERN

public static final java.util.regex.Pattern STREET_PATTERN
a regular expression capturing group matching any street (in particular, a proper name followed by a synonym of 'street', like 'Madison Avenue', 'Capitol Beltway', 'Bourbon Street', etc.)


STREET_MAX_TOKENS

public static final int STREET_MAX_TOKENS
See Also:
Constant Field Values

COUNTY_PATTERN

public static final java.util.regex.Pattern COUNTY_PATTERN
a regular expression capturing group matching any county name (in particular, a proper name followed by 'County')


COUNTY_MAX_TOKENS

public static final int COUNTY_MAX_TOKENS
See Also:
Constant Field Values

REEF_PATTERN

public static final java.util.regex.Pattern REEF_PATTERN
a regular expression capturing group matching any reef name (in particular, a proper name followed by 'Reef')


REEF_MAX_TOKENS

public static final int REEF_MAX_TOKENS
See Also:
Constant Field Values

EDUCATIONAL_INSTITUTION_PATTERN

public static final java.util.regex.Pattern EDUCATIONAL_INSTITUTION_PATTERN
a regular expression capturing group matching any educational institution (in particular, a proper name followed by one of 'University', 'College', 'High (School)', or 'Elementary (School)', or 'University of' followed by a proper name)


EDUCATIONAL_INSTITUTION_MAX_TOKENS

public static final int EDUCATIONAL_INSTITUTION_MAX_TOKENS
See Also:
Constant Field Values

FEET_UNIT

public static final java.lang.String FEET_UNIT
See Also:
Constant Field Values

FEET_UNIT_MAX_TOKENS

public static final int FEET_UNIT_MAX_TOKENS
See Also:
Constant Field Values

FEET_UNIT_PATTERN

public static final java.util.regex.Pattern FEET_UNIT_PATTERN

FEET

public static final java.lang.String FEET
See Also:
Constant Field Values

FEET_MAX_TOKENS

public static final int FEET_MAX_TOKENS
See Also:
Constant Field Values

FEET_PATTERN

public static final java.util.regex.Pattern FEET_PATTERN

GALLONS_UNIT

public static final java.lang.String GALLONS_UNIT
See Also:
Constant Field Values

GALLONS_UNIT_MAX_TOKENS

public static final int GALLONS_UNIT_MAX_TOKENS
See Also:
Constant Field Values

GALLONS_UNIT_PATTERN

public static final java.util.regex.Pattern GALLONS_UNIT_PATTERN

GALLONS

public static final java.lang.String GALLONS
See Also:
Constant Field Values

GALLONS_MAX_TOKENS

public static final int GALLONS_MAX_TOKENS
See Also:
Constant Field Values

GALLONS_PATTERN

public static final java.util.regex.Pattern GALLONS_PATTERN

GRAMS_UNIT

public static final java.lang.String GRAMS_UNIT
See Also:
Constant Field Values

GRAMS_UNIT_MAX_TOKENS

public static final int GRAMS_UNIT_MAX_TOKENS
See Also:
Constant Field Values

GRAMS_UNIT_PATTERN

public static final java.util.regex.Pattern GRAMS_UNIT_PATTERN

GRAMS

public static final java.lang.String GRAMS
See Also:
Constant Field Values

GRAMS_MAX_TOKENS

public static final int GRAMS_MAX_TOKENS
See Also:
Constant Field Values

GRAMS_PATTERN

public static final java.util.regex.Pattern GRAMS_PATTERN

LITERS_UNIT

public static final java.lang.String LITERS_UNIT
See Also:
Constant Field Values

LITERS_UNIT_MAX_TOKENS

public static final int LITERS_UNIT_MAX_TOKENS
See Also:
Constant Field Values

LITERS_UNIT_PATTERN

public static final java.util.regex.Pattern LITERS_UNIT_PATTERN

LITERS

public static final java.lang.String LITERS
See Also:
Constant Field Values

LITERS_MAX_TOKENS

public static final int LITERS_MAX_TOKENS
See Also:
Constant Field Values

LITERS_PATTERN

public static final java.util.regex.Pattern LITERS_PATTERN

MILES_UNIT

public static final java.lang.String MILES_UNIT
See Also:
Constant Field Values

MILES_UNIT_MAX_TOKENS

public static final int MILES_UNIT_MAX_TOKENS
See Also:
Constant Field Values

MILES_UNIT_PATTERN

public static final java.util.regex.Pattern MILES_UNIT_PATTERN

MILES

public static final java.lang.String MILES
See Also:
Constant Field Values

MILES_MAX_TOKENS

public static final int MILES_MAX_TOKENS
See Also:
Constant Field Values

MILES_PATTERN

public static final java.util.regex.Pattern MILES_PATTERN

MPH_UNIT

public static final java.lang.String MPH_UNIT
See Also:
Constant Field Values

MPH_UNIT_MAX_TOKENS

public static final int MPH_UNIT_MAX_TOKENS
See Also:
Constant Field Values

MPH_UNIT_PATTERN

public static final java.util.regex.Pattern MPH_UNIT_PATTERN

MPH

public static final java.lang.String MPH
See Also:
Constant Field Values

MPH_MAX_TOKENS

public static final int MPH_MAX_TOKENS
See Also:
Constant Field Values

MPH_PATTERN

public static final java.util.regex.Pattern MPH_PATTERN

OUNCES_UNIT

public static final java.lang.String OUNCES_UNIT
See Also:
Constant Field Values

OUNCES_UNIT_MAX_TOKENS

public static final int OUNCES_UNIT_MAX_TOKENS
See Also:
Constant Field Values

OUNCES_UNIT_PATTERN

public static final java.util.regex.Pattern OUNCES_UNIT_PATTERN

OUNCES

public static final java.lang.String OUNCES
See Also:
Constant Field Values

OUNCES_MAX_TOKENS

public static final int OUNCES_MAX_TOKENS
See Also:
Constant Field Values

OUNCES_PATTERN

public static final java.util.regex.Pattern OUNCES_PATTERN

POUNDS_UNIT

public static final java.lang.String POUNDS_UNIT
See Also:
Constant Field Values

POUNDS_UNIT_MAX_TOKENS

public static final int POUNDS_UNIT_MAX_TOKENS
See Also:
Constant Field Values

POUNDS_UNIT_PATTERN

public static final java.util.regex.Pattern POUNDS_UNIT_PATTERN

POUNDS

public static final java.lang.String POUNDS
See Also:
Constant Field Values

POUNDS_MAX_TOKENS

public static final int POUNDS_MAX_TOKENS
See Also:
Constant Field Values

POUNDS_PATTERN

public static final java.util.regex.Pattern POUNDS_PATTERN

RANGE_UNIT

public static final java.lang.String RANGE_UNIT
See Also:
Constant Field Values

RANGE_UNIT_MAX_TOKENS

public static final int RANGE_UNIT_MAX_TOKENS
See Also:
Constant Field Values

RANGE_UNIT_PATTERN

public static final java.util.regex.Pattern RANGE_UNIT_PATTERN

RANGE

public static final java.lang.String RANGE
See Also:
Constant Field Values

RANGE_MAX_TOKENS

public static final int RANGE_MAX_TOKENS
See Also:
Constant Field Values

RANGE_PATTERN

public static final java.util.regex.Pattern RANGE_PATTERN

RATE

public static final java.lang.String RATE
See Also:
Constant Field Values

RATE_MAX_TOKENS

public static final int RATE_MAX_TOKENS
See Also:
Constant Field Values

RATE_PATTERN

public static final java.util.regex.Pattern RATE_PATTERN

SQUARE_MILES_UNIT

public static final java.lang.String SQUARE_MILES_UNIT
See Also:
Constant Field Values

SQUARE_MILES_UNIT_MAX_TOKENS

public static final int SQUARE_MILES_UNIT_MAX_TOKENS
See Also:
Constant Field Values

SQUARE_MILES_UNIT_PATTERN

public static final java.util.regex.Pattern SQUARE_MILES_UNIT_PATTERN

SQUARE_MILES

public static final java.lang.String SQUARE_MILES
See Also:
Constant Field Values

SQUARE_MILES_MAX_TOKENS

public static final int SQUARE_MILES_MAX_TOKENS
See Also:
Constant Field Values

SQUARE_MILES_PATTERN

public static final java.util.regex.Pattern SQUARE_MILES_PATTERN

TONS_UNIT

public static final java.lang.String TONS_UNIT
See Also:
Constant Field Values

TONS_UNIT_MAX_TOKENS

public static final int TONS_UNIT_MAX_TOKENS
See Also:
Constant Field Values

TONS_UNIT_PATTERN

public static final java.util.regex.Pattern TONS_UNIT_PATTERN

TONS

public static final java.lang.String TONS
See Also:
Constant Field Values

TONS_MAX_TOKENS

public static final int TONS_MAX_TOKENS
See Also:
Constant Field Values

TONS_PATTERN

public static final java.util.regex.Pattern TONS_PATTERN

ZIPCODE

public static final java.lang.String ZIPCODE
See Also:
Constant Field Values

ZIPCODE_MAX_TOKENS

public static final int ZIPCODE_MAX_TOKENS
See Also:
Constant Field Values

ZIPCODE_PATTERN

public static final java.util.regex.Pattern ZIPCODE_PATTERN

PHONE_NUMBER

public static final java.lang.String PHONE_NUMBER
See Also:
Constant Field Values

PHONE_NUMBER_MAX_TOKENS

public static final int PHONE_NUMBER_MAX_TOKENS
See Also:
Constant Field Values

PHONE_NUMBER_PATTERN

public static final java.util.regex.Pattern PHONE_NUMBER_PATTERN
Constructor Detail

RegExMatcher

public RegExMatcher()
Method Detail

markAllMatches

public static java.lang.String[] markAllMatches(java.lang.String[] tokens,
                                                java.lang.String regEx)
mark all parts of a token sequence that match a regular expression

Parameters:
tokens - the token sequence to be searched
regEx - the regular expression whose matches are to be extracted
Returns:
an array of marker Strings marking all subsequences of the specified token sequence that match the specified regular expression

markAllMatches

public static java.lang.String[] markAllMatches(java.lang.String[] tokens,
                                                java.util.regex.Pattern pattern)
mark all parts of a token sequence that match a regular expression

Parameters:
tokens - the token sequence to be searched
pattern - the regular expression whose matches are to be extracted
Returns:
an array of marker Strings marking all subsequences of the specified token sequence that match the specified regular expression

markAllMatches

public static java.lang.String[] markAllMatches(java.lang.String[] tokens,
                                                java.lang.String regEx,
                                                int maxTokens)
mark all parts of a token sequence that match a regular expression

Parameters:
tokens - the token sequence to be searched
regEx - the regular expression whose matches are to be extracted
maxTokens - the maximum number of tokens a matching part may contain (0 means no limit; note that this is computationally expensive)
Returns:
an array of marker Strings marking all subsequences of the specified token sequence that match the specified regular expression

markAllMatches

public static java.lang.String[] markAllMatches(java.lang.String[] tokens,
                                                java.util.regex.Pattern pattern,
                                                int maxTokens)
mark all parts of a token sequence that match a regular expression

Parameters:
tokens - the token sequence to be searched
pattern - the pattern whose matches are to be extracted
maxTokens - the maximum number of tokens a matching part may contain (0 means no limit; note that this is computationally expensive)
Returns:
an array of marker Strings marking all subsequences of the specified token sequence that match the specified regular expression
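
To see why an unlimited maxTokens is costly, consider one plausible way to match a pattern against token subsequences: test every window of up to maxTokens consecutive tokens. The sketch below is an assumption about the general technique, not the actual implementation of markAllMatches. The format of the marker Strings is not documented here, so the hypothetical method matchingSpans simply returns the matching windows as space-joined strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class TokenWindowSketch {
    // Returns every window of consecutive tokens (joined by single spaces)
    // that matches the pattern as a whole. maxTokens <= 0 means no limit.
    static List<String> matchingSpans(String[] tokens, Pattern pattern, int maxTokens) {
        List<String> spans = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            int remaining = tokens.length - start;
            int limit = (maxTokens <= 0) ? remaining : Math.min(maxTokens, remaining);
            StringBuilder window = new StringBuilder();
            // Grow the window one token at a time and test each length.
            for (int len = 1; len <= limit; len++) {
                if (len > 1) {
                    window.append(' ');
                }
                window.append(tokens[start + len - 1]);
                if (pattern.matcher(window).matches()) {
                    spans.add(window.toString());
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"It", "is", "25", "percent", "or", "100", "%"};
        Pattern percentage = Pattern.compile("\\d+ (?:percent|%)");
        System.out.println(matchingSpans(tokens, percentage, 2));
    }
}
```

With a limit of k tokens, a sequence of n tokens yields at most n * k windows; without a limit the number of windows grows quadratically with n, which is in line with the warning attached to maxTokens = 0.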

extractAllMatches

public static java.lang.String[] extractAllMatches(java.lang.String text,
                                                   java.lang.String regEx)
extract all parts of a text that match a regular expression

Parameters:
text - the text to be searched
regEx - the regular expression whose matches are to be extracted
Returns:
an array containing all parts of the specified text that match the specified regular expression

extractAllMatches

public static java.lang.String[] extractAllMatches(java.lang.String text,
                                                   java.util.regex.Pattern pattern)
extract all parts of a text that match a regular expression

Parameters:
text - the text to be searched
pattern - the regular expression Pattern whose matches are to be extracted
Returns:
an array containing all parts of the specified text that match the specified regular expression
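
Extracting all matches from a plain String is commonly done with a Matcher.find() loop. The sketch below shows that standard java.util.regex idiom; the class ExtractSketch and the method extractAll are hypothetical and only mimic the documented signature, so the behavior of extractAllMatches itself (for example, how overlapping matches are handled) may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtractSketch {
    // Returns all non-overlapping matches of the pattern, in order of appearance.
    static String[] extractAll(String text, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern clockTime = Pattern.compile("\\d{1,2}:\\d{2}");
        for (String match : extractAll("Open 9:00 to 17:30", clockTime)) {
            System.out.println(match);
        }
    }
}
```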

getDictionary

public static HashDictionary getDictionary(java.lang.String name)
load a gazetteer

Parameters:
name - the name of the list to be loaded
Returns:
the gazetteer with the specified name, packed in a HashSet for fast lookup

markAllContained

public static java.lang.String[] markAllContained(java.lang.String[] tokens,
                                                  HashDictionary dictionary)
mark all parts of a String that are contained in a list of Strings

Parameters:
tokens - the token sequence to be searched
dictionary - the gazetteer containing the Strings to be found
Returns:
an array of marker Strings marking all subsequences of the specified token sequence whose String representation is contained in the specified list

markAllContained

public static java.lang.String[] markAllContained(java.lang.String[] tokens,
                                                  HashDictionary dictionary,
                                                  int threshold)
mark all parts of a String that are fuzzy-contained in a list of Strings

Parameters:
tokens - the token sequence to be searched
dictionary - the gazetteer containing the Strings to be found
threshold - the maximum edit distance for which a fuzzy lookup shall return true
Returns:
an array of marker Strings marking all subsequences of the specified token sequence whose String representation is contained in the specified list
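
A fuzzy lookup with an edit distance threshold can be sketched with the classic Levenshtein distance. This is only an illustration of the idea. The HashDictionary API is not described in this section, so a plain java.util.Set stands in for the gazetteer, and the names FuzzyLookupSketch, editDistance, and fuzzyContains are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class FuzzyLookupSketch {
    // Levenshtein distance: the minimum number of single character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns true if some entry lies within the given edit distance of the candidate.
    static boolean fuzzyContains(Set<String> entries, String candidate, int threshold) {
        for (String entry : entries) {
            if (editDistance(entry, candidate) <= threshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> cities = new HashSet<>(Arrays.asList("Karlsruhe", "Pittsburgh"));
        System.out.println(fuzzyContains(cities, "Pittsburg", 1));
        System.out.println(fuzzyContains(cities, "Pittsburg", 0));
    }
}
```

A linear scan over all entries is the simplest approach but gives up the constant-time lookup of a hash-based dictionary; a real implementation may use a different strategy.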

extractAllContained

public static java.lang.String[] extractAllContained(java.lang.String[] tokens,
                                                     HashDictionary dictionary)
mark all parts of a String that are contained in a list of Strings

Parameters:
tokens - the token sequence to be searched
dictionary - the gazetteer containing the Strings to be found
Returns:
an array of marker Strings marking all subsequences of the specified token sequence whose String representation is contained in the specified list

extractAllContained

public static java.lang.String[] extractAllContained(java.lang.String[] tokens,
                                                     HashDictionary dictionary,
                                                     int threshold)
mark all parts of a String that are fuzzy-contained in a list of Strings

Parameters:
tokens - the token sequence to be searched
dictionary - the gazetteer containing the Strings to be found
threshold - the maximum editing distance for which a fuzzy lookup shall return true
Returns:
an array of marker Strings marking all subsequences of the specified token sequence whose String representation is contained in the specified list

extractNumbers

public static java.lang.String[] extractNumbers(java.lang.String[] tokens)
mark all numbers in a token sequence

Parameters:
tokens - the token sequence
Returns:
an array of marker Strings marking all numbers in the specified token sequence

extractQuantities

public static java.lang.String[] extractQuantities(java.lang.String[] tokens,
                                                   java.lang.String[] numberMarkers,
                                                   java.util.regex.Pattern dimensionPattern,
                                                   int maxTokens)
mark all parts from a token sequence that match a regular expression

Parameters:
tokens - the token sequence to be searched
numberMarkers - the marker Strings marking the numbers in the token sequence
dimensionPattern - the pattern whose matches are to be extracted
maxTokens - the maximum number of tokens a matching part may contain (0 means no limit; caution: this is computationally expensive)
Returns:
an array of marker Strings marking all subsequences of the specified token sequence that match the specified regular expression
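
The effect of maxTokens can be sketched as a bounded window over the token sequence. The code below is an illustration only: it ignores numberMarkers, joins tokens with single spaces, and returns the matching subsequences rather than marker Strings, since the marker format is not documented on this page. The names TokenWindowSketch and matchingSubsequences are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the Ephyra implementation.
class TokenWindowSketch {

    // Tests every contiguous token subsequence of at most maxTokens tokens
    // against the pattern; maxTokens <= 0 is treated as "no limit".
    static List<String> matchingSubsequences(String[] tokens, Pattern pattern, int maxTokens) {
        int limit = (maxTokens <= 0) ? tokens.length : maxTokens;
        List<String> found = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            StringBuilder candidate = new StringBuilder();
            for (int len = 1; len <= limit && start + len <= tokens.length; len++) {
                if (len > 1) {
                    candidate.append(' ');
                }
                candidate.append(tokens[start + len - 1]);
                // The whole candidate must match, not just a part of it.
                if (pattern.matcher(candidate).matches()) {
                    found.add(candidate.toString());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed example pattern for an area measure.
        Pattern area = Pattern.compile("\\d+ square (miles|meters)");
        String[] tokens = {"about", "25", "square", "miles", "wide"};
        System.out.println(matchingSubsequences(tokens, area, 3));
    }
}
```

With a limit of k tokens the sketch tests at most n * k candidates for n tokens; without a limit that grows to roughly n * n / 2, which is why an unlimited window is costly on long inputs.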

extractOrdinalNumbers

public static java.lang.String[] extractOrdinalNumbers(java.lang.String[] tokens)
mark all ordinal numbers in a token sequence

Parameters:
tokens - the token sequence
Returns:
an array of marker Strings marking all ordinal numbers in the specified token sequence

compile

public static java.util.regex.Pattern compile(java.lang.String regEx)
create a Pattern from a regular expression String, using default case sensitivity

Parameters:
regEx - the regular expression String
Returns:
a Pattern compiled from the specified regular expression String

compile

public static java.util.regex.Pattern compile(java.lang.String regEx,
                                              boolean caseSensitive)
create a Pattern from a regular expression String

Parameters:
regEx - the regular expression String
caseSensitive - whether to create a case-sensitive Pattern
Returns:
a Pattern compiled from the specified regular expression String
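
A sketch of how such a helper could be written on top of java.util.regex.Pattern. The class CompileSketch is a hypothetical name, and mapping caseSensitive = false to the Pattern.CASE_INSENSITIVE flag is an assumption; this page does not say which flags RegExMatcher actually sets or what its default case sensitivity is.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the Ephyra implementation.
class CompileSketch {

    // Compiles the expression, adding CASE_INSENSITIVE when case should be ignored.
    static Pattern compile(String regEx, boolean caseSensitive) {
        return caseSensitive
                ? Pattern.compile(regEx)
                : Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        System.out.println(compile("degree", false).matcher("DEGREE").matches());
        System.out.println(compile("degree", true).matcher("DEGREE").matches());
    }
}
```

Note that Pattern.CASE_INSENSITIVE on its own only covers US-ASCII characters; matching non-ASCII letters case-insensitively also needs Pattern.UNICODE_CASE.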