Active Learning for Information Extraction via Bootstrap Learning

Andrew Carlson (acarlson)
Kevin Killourhy (ksk)
Sue Ann Hong (sahong)
Sophie Wang (

Read the Web, Spring 2006
School of Computer Science
Carnegie Mellon University


One of the greatest weaknesses of information extraction through bootstrap learning is the tendency of bootstrapping to diverge over time, degrading the "correctness" of the extracted facts. Some form of quality control is therefore crucial to sustaining such a bootstrapping system. Our project addresses this issue in two tiers: first, we plan to develop metrics for scoring the correctness of extractions; second, we plan to develop corresponding active learning methods for those scoring metrics in order to aid quality control.


Project Proposal [pdf]

  1. Introduction
  2. Objectives
  3. Methods
  4. Evaluation

Meeting Notes


Date RtW Goals Our Goals
  • Module code ready.
  • Integration on code stubs.
  • Finish coding our interface functions. ("proj code")
  • Divide up the probability estimation derivation and the coding of different versions of the core function.
  • Integrated system based on proj code.
  • Code different versions of the core function. Test with simple bootstrapper.
  • Experiments w/ integrated code.
  • Extensions
  • Merge with J&J?
  • Final evaluation (5/4)
  • Touch up whatever still needs to be done (algorithm, merging with J&J).
  • Entire system write-up due.
  • Write up about our module. Help merge.