3/31/2006 AL Group Meeting

 

* Agenda

 - API

 - Evaluation

 

 

1. Evaluation

 a. Metrics

   - Overall bootstrapping extraction accuracy (also dependent on how the

     overall system uses our probabilities)

   - Compare rule precision values returned by different probability/score

     schemes. (In some domains, can compare to "ground truth".)

   - Compare/evaluate example precision values returned by each

     probability/score scheme against "ground truth" obtained by labeling

     the examples ourselves. (Probably faster if scores are first thresholded

     to {0, 1}; see the sketch after this list.)

   - Given a rule-picking mechanism for the bootstrapper, compare extraction

     volume across different prob/score schemes.
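
A minimal sketch of the precision-vs-ground-truth comparison above, assuming we
keep, per rule, the examples it extracted plus our hand labels, and that scores
are thresholded to {0, 1} to speed up labeling (all function and variable names
here are hypothetical):

    def rule_precision(extractions, hand_labels):
        """Fraction of a rule's extractions that we hand-labeled as correct."""
        if not extractions:
            return 0.0
        correct = sum(1 for x in extractions if hand_labels.get(x, False))
        return correct / float(len(extractions))

    def threshold_scores(scored_examples, cutoff=0.5):
        """Map a scheme's scores to {0, 1} so the comparison is cheaper."""
        return dict((x, 1 if score >= cutoff else 0)
                    for x, score in scored_examples.items())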

 b. Probability Estimation

   - No probability estimation

   - Co-EM-ish thing (Jon & Jaime, Rosie Jones)

   - PMI

   - Noisy-or model (see the sketch after this list)

   - Pollution Network

   - URNs
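
A minimal sketch of the noisy-or option, assuming an extraction is treated as
correct unless every rule that produced it fired spuriously, so the estimated
rule precisions combine multiplicatively (names are illustrative only):

    def noisy_or(rule_precisions):
        """Combine the estimated precisions of the rules that extracted an
        entity: P(correct) = 1 - product_i (1 - p_i)."""
        p_all_wrong = 1.0
        for p in rule_precisions:
            p_all_wrong *= (1.0 - p)
        return 1.0 - p_all_wrong

    # e.g. noisy_or([0.9, 0.6, 0.7]) -> 1 - 0.1*0.4*0.3 = 0.988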

 c. Active Learning Algorithms

      Ideally, we would like to minimize the difference between P_i, the real

      precision of rule i, and P^i, the estimated precision of the rule. Since

      we don't know what P_i should be, here are some other measures that may

      be reasonable (see the sketch after this list):

 

   - Random (i.e., passive learning rather than active)

   - Constant rule confidence value (e.g. 90%, all rules are good)

   - E[overall precision given the example label]
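
A minimal sketch of the evaluation idea above, using mean absolute error as one
reasonable way to instantiate "minimize the difference between P_i and P^i",
plus the random (passive) baseline; the loss choice and all names here are
assumptions, not decisions from the meeting:

    import random

    def precision_error(true_precisions, estimated_precisions):
        """Mean absolute error (1/n) * sum_i |P_i - P^i| over all rules,
        where both arguments map rule -> precision."""
        n = len(true_precisions)
        return sum(abs(true_precisions[r] - estimated_precisions[r])
                   for r in true_precisions) / float(n)

    def random_query(unlabeled_examples):
        """Random baseline: pick the next example to label uniformly at
        random (passive rather than active learning)."""
        return random.choice(unlabeled_examples)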

 d. Relations

   - IsCity()

   - IsNation()

 

 

2. API

RegisterRelations(r1, r2, ...) -> AL_object

AL_object -> AddExtractor(e, p_r1, p_r2, ...)

AL_object -> AddOccurrence(o, e)

AL_object -> GetExtractorProbability(e) -> (p1, p2, ...)

AL_object -> GetEntityProbability(ent) -> (p1, p2, ...)
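
A minimal sketch of how this interface might look, assuming probabilities are
returned per relation in registration order; the class shape and the placeholder
probability logic are assumptions, only the call names come from the notes:

    class ALObject:
        """Hypothetical AL module implied by the calls above."""

        def __init__(self, relations):        # RegisterRelations(r1, r2, ...)
            self.relations = list(relations)
            self.extractors = {}              # extractor -> per-relation priors
            self.occurrences = {}             # extractor -> list of occurrences

        def add_extractor(self, e, *priors):  # AddExtractor(e, p_r1, p_r2, ...)
            self.extractors[e] = list(priors)
            self.occurrences.setdefault(e, [])

        def add_occurrence(self, o, e):       # AddOccurrence(o, e)
            self.occurrences[e].append(o)

        def get_extractor_probability(self, e):   # -> (p1, p2, ...)
            # Placeholder: return the priors until a scheme from 1b is plugged in.
            return tuple(self.extractors[e])

        def get_entity_probability(self, ent):    # -> (p1, p2, ...)
            # Placeholder: uniform probability per registered relation.
            return tuple(0.5 for _ in self.relations)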

 

 

3. Terminology/Nomenclature

1. Relation/Predicate: ~string

2. Rule/Extraction Rule/Pattern/Extractor/Context: ~string, left/right-hand side

3. Claim/Belief/Assertion

4. Occurrence/Extraction/Instance/Span: ~string, spans

5. Entity/Concept/Entity Pair: ~string
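
A minimal sketch of the nomenclature as data structures, just to pin down what
each term carries; the specific fields are assumptions, not decisions from the
meeting:

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Relation:             # Relation/Predicate: ~string, e.g. "IsCity"
        name: str

    @dataclass
    class Rule:                 # Rule/Pattern/Extractor/Context
        left: str               # left-hand-side context string
        right: str              # right-hand-side context string

    @dataclass
    class Occurrence:           # Occurrence/Extraction/Instance/Span
        text: str
        span: Tuple[int, int]   # character offsets in the source document

    @dataclass
    class Entity:               # Entity/Concept/Entity Pair: ~string
        strings: List[str] = field(default_factory=list)

    @dataclass
    class Claim:                # Claim/Belief/Assertion
        relation: Relation
        entity: Entity
        probability: float = 0.0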