3/31/2006 AL Group Meeting

 

* Agenda

- API

- Evaluation

 

 

1. Evaluation

a. Metrics

   - Overall bootstrapping extraction accuracy (also dependent on how the
     overall system uses our probabilities)
   - Compare rule precision values returned by different probability/score
     schemes. (In some domains, can compare to "ground truth".)
   - Compare/evaluate example precision values returned by each
     probability/score scheme versus "ground truth" that is us labeling.
     (Probably faster with some threshold: Score -> {0,1}; see the sketch
     after this list.)
   - Given a rule-picking mechanism for the bootstrapper, compare extraction
     volume across different prob/score schemes.
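
   A minimal sketch of the precision comparison, assuming extractions arrive as
   (rule, entity) pairs and "ground truth" is a hand-labeled dict; the function
   names and the 0.5 cutoff are placeholders:

       def rule_precision(extractions, labels):
           # extractions: list of (rule_id, entity) pairs from bootstrapping
           # labels: dict entity -> True/False ("ground truth" from hand labeling)
           correct, total = {}, {}
           for rule, entity in extractions:
               if entity not in labels:
                   continue                  # skip extractions we never labeled
               total[rule] = total.get(rule, 0) + 1
               correct[rule] = correct.get(rule, 0) + (1 if labels[entity] else 0)
           # precision of each rule over its labeled extractions
           return {rule: correct.get(rule, 0) / total[rule] for rule in total}

       def threshold(score, cutoff=0.5):
           # "Score -> {0,1}": collapse a score to a binary call so the
           # comparison against hand labels is a simple match count
           return 1 if score >= cutoff else 0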

b. Probability Estimation

   - No prob estimation
   - Co-EM-ish thing (Jon & Jaime, Rosie Jones)
   - PMI
   - Noisy-or Model (see the sketch after this list)
   - Pollution Network
   - URNs
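
   A minimal noisy-or sketch: an extracted entity is wrong only if every rule
   that produced it fired incorrectly, with rules treated as independent.

       def noisy_or(rule_precisions):
           # P(correct) = 1 - prod_i (1 - p_i)
           p_all_wrong = 1.0
           for p in rule_precisions:
               p_all_wrong *= (1.0 - p)
           return 1.0 - p_all_wrong

       # e.g. an entity extracted by rules with precision 0.7 and 0.6:
       # noisy_or([0.7, 0.6]) = 1 - 0.3 * 0.4 = 0.88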

c. Active Learning Algorithms

   Ideally, we would like to minimize the error between estimated and true rule
   precision (e.g. sum_i |Pi - P^i|), where Pi is the real precision of rule i
   and P^i is the estimated precision of the rule. Since we don't know what Pi
   should be, here are some other measures that may be reasonable:

 

   - Random (rather passive learning)
   - Constant rule confidence value (e.g. 90%, all rules are good)
   - e.g. overall precision given the example label (see the sketch after
     this list)
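
   A hedged sketch of how the last option might be read: pick the occurrence
   whose label would move overall rule precision the most. The bookkeeping
   (rules_for, and rule_stats as rule -> (correct, labeled) counts) is assumed
   purely for illustration.

       import random

       def pick_random(unlabeled):
           # Baseline: passive learning, label a uniformly random occurrence
           return random.choice(unlabeled)

       def pick_by_precision_impact(unlabeled, rules_for, rule_stats):
           def overall_precision(stats):
               correct = sum(c for c, n in stats.values())
               labeled = sum(n for c, n in stats.values())
               return correct / labeled if labeled else 0.0

           best, best_swing = None, -1.0
           for occ in unlabeled:
               outcomes = []
               for label in (1, 0):       # hypothetical correct / incorrect label
                   stats = dict(rule_stats)
                   for rule in rules_for[occ]:
                       c, n = stats.get(rule, (0, 0))
                       stats[rule] = (c + label, n + 1)
                   outcomes.append(overall_precision(stats))
               swing = abs(outcomes[0] - outcomes[1])
               if swing > best_swing:
                   best, best_swing = occ, swing
           return best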

d. Relations

   - IsCity()
   - IsNation()

 

 

2. API

RegisterRelations(r1, r2, ...) -> AL_object
AL_object->AddExtractor(e, p_r1, p_r2, ...)
AL_object->AddOccurrence(o, e)
AL_object->GetExtractorProbability(e) -> (p1, p2, ...)
AL_object->GetEntityProbability(ent) -> (p1, p2, ...)
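
A possible skeleton of this API in Python; every class/method name, and the
noisy-or placeholder inside GetEntityProbability, is an assumption rather than
the agreed design.

    class ALObject:
        def __init__(self, relations):
            self.relations = list(relations)   # r1, r2, ...
            self.extractors = {}               # extractor e -> [p_r1, p_r2, ...]
            self.occurrences = {}              # extractor e -> list of occurrences o

        def add_extractor(self, e, *per_relation_probs):
            # AddExtractor(e, p_r1, p_r2, ...)
            self.extractors[e] = list(per_relation_probs)

        def add_occurrence(self, o, e):
            # AddOccurrence(o, e)
            self.occurrences.setdefault(e, []).append(o)

        def get_extractor_probability(self, e):
            # GetExtractorProbability(e) -> (p1, p2, ...), one value per relation
            return tuple(self.extractors[e])

        def get_entity_probability(self, ent):
            # GetEntityProbability(ent) -> (p1, p2, ...): placeholder noisy-or
            # over extractors with an occurrence of ent; assumes an occurrence
            # is the extracted entity string itself.
            probs = []
            for i in range(len(self.relations)):
                p_wrong = 1.0
                for e, occs in self.occurrences.items():
                    if ent in occs:
                        p_wrong *= 1.0 - self.extractors[e][i]
                probs.append(1.0 - p_wrong)
            return tuple(probs)

    def register_relations(*relations):
        # RegisterRelations(r1, r2, ...) -> AL_object
        return ALObject(relations)

    # Usage sketch:
    al = register_relations("IsCity", "IsNation")
    al.add_extractor("cities such as X", 0.8, 0.1)
    al.add_occurrence("Pittsburgh", "cities such as X")
    al.get_extractor_probability("cities such as X")   # -> (0.8, 0.1)
    al.get_entity_probability("Pittsburgh")            # ~ (0.8, 0.1)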

 

 

3. Terminology/Nomenclature

1. Relations/Predicate ~strings

2. Rules/Extraction Rule/Patterns/Extractors/Contexts ~string, left/right-hand side

3. Claim/Belief/Assertion

4. Occurrence/Extraction/Instance/Span ~string, spans

5. Entities/Concepts/Entity Pairs ~string
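
One way this nomenclature might map onto data structures, purely as a sketch;
every field name here is an assumption.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Relation:            # Relations/Predicates: named by a string, e.g. "IsCity"
        name: str

    @dataclass
    class Rule:                # Rules/Extraction Rules/Patterns/Extractors/Contexts
        left: str              # left-hand-side context string
        right: str             # right-hand-side context string

    @dataclass
    class Occurrence:          # Occurrences/Extractions/Instances/Spans
        text: str
        span: Tuple[int, int]  # character offsets of the extracted span

    @dataclass
    class Claim:               # Claims/Beliefs/Assertions: a relation asserted of an entity
        relation: Relation
        entity: str            # Entities/Concepts/Entity Pairs kept as strings
        occurrences: List[Occurrence]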