Parses documents in NIST's TREC format. Folds case for words that are not in the acronym list. Strips contraction and possessive suffixes. Converts U.S.A., USA's, and USAs to USA. Does not recognize acronyms that contain numbers. Parses the following fields: TEXT, HL, HEAD, HEADLINE, LP, TTL.
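The token rules described above can be illustrated with a minimal Python sketch. It is not the parser's actual implementation: the function name `normalize`, the `ACRONYMS` set, and the exact suffix list are assumptions made for illustration, and the real parser uses its own acronym list and suffix handling.

```python
import re

# Hypothetical acronym list; the real parser is configured with its own.
ACRONYMS = {"USA", "NATO"}

# Assumed contraction and possessive suffixes (illustrative, not exhaustive).
SUFFIX_RE = re.compile(r"(?:n't|'s|'re|'ve|'ll|'d|'m|')$", re.IGNORECASE)


def normalize(token, acronyms=ACRONYMS):
    """Normalize one token following the rules in the description."""
    # Strip a trailing contraction or possessive suffix: USA's -> USA.
    stem = SUFFIX_RE.sub("", token)
    # Drop periods so that a dotted form matches the list: U.S.A. -> USA.
    candidate = stem.replace(".", "")
    # Only all-letter candidates are considered, so acronyms
    # containing digits are never recognized.
    if candidate.isalpha():
        if candidate in acronyms:
            return candidate
        # Treat a trailing "s" on a known acronym as a plural: USAs -> USA.
        if candidate.endswith("s") and candidate[:-1] in acronyms:
            return candidate[:-1]
    # Everything else is case folded.
    return stem.lower()
```

In this sketch, `normalize("U.S.A.")`, `normalize("USA's")`, and `normalize("USAs")` all yield the same acronym, while an ordinary word such as `John's` is stripped of its suffix and lowercased. Matching against the acronym list is case-sensitive here, which is a design assumption rather than documented behavior.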