Spring 2006


The following software packages are available. Please suggest any additional relevant software.

WIT - A collection of Java classes for accessing web pages, either from the command line or through direct calls from Java. It supports (a) retrieving a single page given its URL, (b) retrieving the pages that match a specified search query, and (c) crawling and caching entire websites.

UIMA - A package for combining the outputs of multiple text annotators into an efficient processing pipeline, interfaced to a database for storing large annotated datasets (details coming soon; meanwhile, contact Eric Nyberg).
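To make the idea of an annotator pipeline concrete, here is a minimal sketch, assuming a simple model in which each annotator reads the document text plus the annotations produced so far and adds its own labeled character spans. Every name in it (`AnnotatorPipeline`, `Annotator`, `Annotation`, `TokenAnnotator`, `CapitalizedAnnotator`, `run`) is hypothetical and is not UIMA's API; it only illustrates how later annotators can build on earlier ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of chaining text annotators; NOT the UIMA API. */
public class AnnotatorPipeline {

    /** A labeled character span [begin, end) over the document text. */
    record Annotation(String type, int begin, int end) {}

    /** Each annotator reads the text plus earlier annotations and appends its own. */
    interface Annotator {
        void process(String text, List<Annotation> annotations);
    }

    /** Marks every run of letters as a "Token". */
    static class TokenAnnotator implements Annotator {
        private static final Pattern WORD = Pattern.compile("\\p{L}+");

        public void process(String text, List<Annotation> annotations) {
            Matcher m = WORD.matcher(text);
            while (m.find()) {
                annotations.add(new Annotation("Token", m.start(), m.end()));
            }
        }
    }

    /** Marks Tokens that start with an uppercase letter, reusing the earlier annotator's output. */
    static class CapitalizedAnnotator implements Annotator {
        public void process(String text, List<Annotation> annotations) {
            List<Annotation> added = new ArrayList<>();
            for (Annotation a : annotations) {
                if (a.type().equals("Token") && Character.isUpperCase(text.charAt(a.begin()))) {
                    added.add(new Annotation("CapitalizedWord", a.begin(), a.end()));
                }
            }
            annotations.addAll(added); // add after iterating to avoid concurrent modification
        }
    }

    /** Runs the annotators in order over one document and returns the combined annotations. */
    static List<Annotation> run(String text, List<Annotator> pipeline) {
        List<Annotation> annotations = new ArrayList<>();
        for (Annotator annotator : pipeline) {
            annotator.process(text, annotations);
        }
        return annotations;
    }

    public static void main(String[] args) {
        String text = "Read the Web today";
        List<Annotator> pipeline = List.of(new TokenAnnotator(), new CapitalizedAnnotator());
        for (Annotation a : run(text, pipeline)) {
            System.out.println(a.type() + " \"" + text.substring(a.begin(), a.end()) + "\"");
        }
    }
}
```

For "Read the Web today" this yields four `Token` spans and two `CapitalizedWord` spans ("Read" and "Web"). A real framework would add typed annotation schemas, persistence (such as the database mentioned above), and scaling across many documents, none of which this sketch attempts.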

Minorthird - A collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text.

Scone - A knowledge base system which we'll use as a repository for facts and beliefs in the ReadTheWeb system.

SecondString - A collection of Java classes for approximate string matching.
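As a rough illustration of what approximate string matching computes, the sketch below implements the classic Levenshtein edit distance and a length-normalized similarity score in Java. The class and method names (`EditDistance`, `levenshtein`, `similarity`) are hypothetical and are not SecondString's API; consult SecondString's own documentation for the distance metrics it actually provides.

```java
/**
 * Hypothetical illustration of approximate string matching; NOT SecondString's API.
 * Levenshtein edit distance plus a similarity score normalized to [0, 1].
 */
public class EditDistance {

    /** Minimum number of single-character insertions, deletions, or substitutions turning a into b. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1]; // row for the previous prefix of a
        int[] curr = new int[b.length() + 1]; // row being filled for the current prefix of a
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insertion
                                 prev[j] + 1),      // deletion
                        prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; // swap rows so prev always holds the last completed row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 1.0 for identical strings, approaching 0.0 as they differ in every position. */
    public static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));
        System.out.println(similarity("William Cohen", "Wiliam Cohen"));
    }
}
```

For example, `levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion). Keeping only two rows of the dynamic-programming table makes memory linear in the length of the second string rather than quadratic.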

