Integrating Scope into Learning with Application to Information Extraction

NIPS 2001 by Drew Bagnell (CMU RI), David Blei (Berkeley CS), Andrew McCallum (WhizBang Labs)

Presented by Drew Bagnell

Abstract

A fascinating aspect of classification and extraction from the Web is the richness of the information in its formatting, layout, directory structure, and linkage. Within a particular web site, these structural regularities are often powerfully indicative features for common classification and extraction tasks. For example, to extract the titles of all the books on Amazon.com, one can rely on the fact that the book title appears in the same location and in the same font on each book's home page. The difficulty with using this information is that each web site has its own structural regularities, so models tuned for one site cannot be applied successfully to extraction from another. In response, many researchers have built tools to facilitate hand-tuning of site-specific extractors.

Most statistical models assume that the modeled data are independent and identically distributed. However, as in the above example, certain proper subsets of the data often share identifiable regularities that do not hold across the entire dataset. Other examples of subsets that may have local regularity include patients from a particular hospital, voice sounds from a particular speaker, or vibration data from a particular airplane. In other words, certain patterns may exhibit degrees of scope.

Our thesis is that leveraging these local regularities can significantly improve the performance of a learner because local features are often both simple and highly indicative. The central difficulty is that in practical problems our trained algorithm will be applied to novel locales not encountered in our labeled dataset. The only knowledge directly applicable to the new data comes from the traditionally used global regularities, i.e., those features that are independent and identically distributed across locales.

We will discuss a generative probabilistic model of features with varying scope, along with appropriate algorithms to leverage those features. Finally, we will demonstrate the effectiveness of the approach by showing dramatically improved performance on an information extraction problem.



Charles Rosenberg
Last modified: Tue Mar 12 18:00:47 EST 2002