3/31/2006 AL Group Meeting

 

* Agenda

- API

- Evaluation

 

 

1. Evaluation

a. Metrics

   - Overall bootstrapping extraction accuracy (also dependent on how the
     overall system uses our probabilities)
   - Compare rule precision values returned by different probability/score
     schemes. (In some domains, can compare to "ground truth".)
   - Compare/evaluate example precision values returned by each
     probability/score scheme versus "ground truth" that is us labeling.
     (Probably faster with some threshold: Score -> {0,1}; see the sketch
     after this list.)
   - Given a rule-picking mechanism for the bootstrapper, compare extraction
     volume across different prob/score schemes.
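
   A minimal sketch of the precision comparison, assuming extractions arrive as
   (rule, entity) pairs and "ground truth" is a hand-labeled dict; the function
   names and the 0.5 cutoff are placeholders:

       def rule_precision(extractions, labels):
           # extractions: list of (rule_id, entity) pairs from bootstrapping
           # labels: dict entity -> True/False ("ground truth" from hand labeling)
           correct, total = {}, {}
           for rule, entity in extractions:
               if entity not in labels:
                   continue                  # skip extractions we never labeled
               total[rule] = total.get(rule, 0) + 1
               correct[rule] = correct.get(rule, 0) + (1 if labels[entity] else 0)
           # precision of each rule over its labeled extractions
           return {rule: correct.get(rule, 0) / total[rule] for rule in total}

       def threshold(score, cutoff=0.5):
           # "Score -> {0,1}": collapse a score to a binary call so the
           # comparison against hand labels is a simple match count
           return 1 if score >= cutoff else 0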

b. Probability Estimation

   - No prob estimation
   - Co-EM-ish thing (Jon & Jaime, Rosie Jones)
   - PMI
   - Noisy-or Model (see the sketch after this list)
   - Pollution Network
   - URNs
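
   A minimal noisy-or sketch: an extracted entity is wrong only if every rule
   that produced it fired incorrectly, with rules treated as independent.

       def noisy_or(rule_precisions):
           # P(correct) = 1 - prod_i (1 - p_i)
           p_all_wrong = 1.0
           for p in rule_precisions:
               p_all_wrong *= (1.0 - p)
           return 1.0 - p_all_wrong

       # e.g. an entity extracted by rules with precision 0.7 and 0.6:
       # noisy_or([0.7, 0.6]) = 1 - 0.3 * 0.4 = 0.88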

c. Active Learning Algorithms

   Ideally, we would like to minimize the error between estimated and true rule
   precision (e.g. sum_i |Pi - P^i|), where Pi is the real precision of rule i
   and P^i is the estimated precision of the rule. Since we don't know what Pi
   should be, here are some other measures that may be reasonable:

 

   - Random (rather passive learning)
   - Constant rule confidence value (e.g. 90%, all rules are good)
   - e.g. overall precision given the example label (see the sketch after
     this list)
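
   A hedged sketch of how the last option might be read: pick the occurrence
   whose label would move overall rule precision the most. The bookkeeping
   (rules_for, and rule_stats as rule -> (correct, labeled) counts) is assumed
   purely for illustration.

       import random

       def pick_random(unlabeled):
           # Baseline: passive learning, label a uniformly random occurrence
           return random.choice(unlabeled)

       def pick_by_precision_impact(unlabeled, rules_for, rule_stats):
           def overall_precision(stats):
               correct = sum(c for c, n in stats.values())
               labeled = sum(n for c, n in stats.values())
               return correct / labeled if labeled else 0.0

           best, best_swing = None, -1.0
           for occ in unlabeled:
               outcomes = []
               for label in (1, 0):       # hypothetical correct / incorrect label
                   stats = dict(rule_stats)
                   for rule in rules_for[occ]:
                       c, n = stats.get(rule, (0, 0))
                       stats[rule] = (c + label, n + 1)
                   outcomes.append(overall_precision(stats))
               swing = abs(outcomes[0] - outcomes[1])
               if swing > best_swing:
                   best, best_swing = occ, swing
           return best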

d. Relations

   - IsCity()
   - IsNation()

 

 

2. API

RegisterRelations(r1, r2, ...) -> AL_object
AL_object->AddExtractor(e, p_r1, p_r2, ...)
AL_object->AddOccurrence(o, e)
AL_object->GetExtractorProbability(e) -> (p1, p2, ...)
AL_object->GetEntityProbability(ent) -> (p1, p2, ...)
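
A possible skeleton of this API in Python; every class/method name, and the
noisy-or placeholder inside GetEntityProbability, is an assumption rather than
the agreed design.

    class ALObject:
        def __init__(self, relations):
            self.relations = list(relations)   # r1, r2, ...
            self.extractors = {}               # extractor e -> [p_r1, p_r2, ...]
            self.occurrences = {}              # extractor e -> list of occurrences o

        def add_extractor(self, e, *per_relation_probs):
            # AddExtractor(e, p_r1, p_r2, ...)
            self.extractors[e] = list(per_relation_probs)

        def add_occurrence(self, o, e):
            # AddOccurrence(o, e)
            self.occurrences.setdefault(e, []).append(o)

        def get_extractor_probability(self, e):
            # GetExtractorProbability(e) -> (p1, p2, ...), one value per relation
            return tuple(self.extractors[e])

        def get_entity_probability(self, ent):
            # GetEntityProbability(ent) -> (p1, p2, ...): placeholder noisy-or
            # over extractors with an occurrence of ent; assumes an occurrence
            # is the extracted entity string itself.
            probs = []
            for i in range(len(self.relations)):
                p_wrong = 1.0
                for e, occs in self.occurrences.items():
                    if ent in occs:
                        p_wrong *= 1.0 - self.extractors[e][i]
                probs.append(1.0 - p_wrong)
            return tuple(probs)

    def register_relations(*relations):
        # RegisterRelations(r1, r2, ...) -> AL_object
        return ALObject(relations)

    # Usage sketch:
    al = register_relations("IsCity", "IsNation")
    al.add_extractor("cities such as X", 0.8, 0.1)
    al.add_occurrence("Pittsburgh", "cities such as X")
    al.get_extractor_probability("cities such as X")   # -> (0.8, 0.1)
    al.get_entity_probability("Pittsburgh")            # ~ (0.8, 0.1)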

 

 

3. Terminology/Nomenclature

1. Relations/Predicate ~strings

2. Rules/Extraction Rule/Patterns/Extractors/Contexts ~string, left/right-hand side

3. Claim/Belief/Assertion

4. Occurrence/Extraction/Instance/Span ~string, spans

5. Entities/Concepts/Entity Pairs ~string
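
One way this nomenclature might map onto data structures, purely as a sketch;
every field name here is an assumption.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Relation:            # Relations/Predicates: named by a string, e.g. "IsCity"
        name: str

    @dataclass
    class Rule:                # Rules/Extraction Rules/Patterns/Extractors/Contexts
        left: str              # left-hand-side context string
        right: str             # right-hand-side context string

    @dataclass
    class Occurrence:          # Occurrences/Extractions/Instances/Spans
        text: str
        span: Tuple[int, int]  # character offsets of the extracted span

    @dataclass
    class Claim:               # Claims/Beliefs/Assertions: a relation asserted of an entity
        relation: Relation
        entity: str            # Entities/Concepts/Entity Pairs kept as strings
        occurrences: List[Occurrence]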