04/14/2006 AL Group Meeting

=Plan of Attack=

* Probability/Score Estimation
.No probability estimation
- Return the same score for all rules and entities: all entities are equally correct for the given relation (0.5 as is would be fine).
- No AL.
.PMI: Kevin
- Score for entities = GoogleHitCount(entity string, relation string) / GoogleHitCount(entity string)
- Score for rules = GoogleHitCount(rule string, relation string) / GoogleHitCount(rule string), OR infer it from the scores for entities somehow.
--> In this case, we can use AL to maximize some quantity (see below).
.Noisy-or: Andy
- Give all rules a constant score (e.g., 90%): this is the no-AL baseline.
- Use independence assumptions between rules and between extraction occurrences.
- Sue Ann is going to bug Andy and try to extend/relax some of the assumptions. But she doesn't want to touch URNs.
.Co-EM-ish thing: Sophie
- Understand what Rosie Jones's active learning paper does. It should give several probability-estimation-independent criteria, such as "most frequently extracted entities". These should fit with pretty much any PE we implement. (See below for some notes on this.)
- What does Rosie Jones actually do with regard to scoring? It would be nice to implement Rosie Jones's scheme for comparison; it can be used as the PE for looking at these PE-independent AL schemes.

* Possible Active Learning Criteria
.(not AL) Random entity selection.
.Maximize the change of scores over all rules.
.Maximize the sum of the change of each score.
.Rosie Jones
- Most frequent.
- Context disagreement? In our case this would be something like an entity extracted by some rules but not by others. Though one can imagine that something like "it" may be extracted by everything, which supposedly would be handled by "most frequent". Is there a way to combine these two heuristics?
- Feature-set disagreement: what's the difference between this and "context disagreement"? Granted, in our case we don't have classifiers, so to speak.
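The PMI and noisy-or scoring options above can be sketched in a few lines of Python. Everything here is illustrative, not the group's actual implementation: the hit counts and rule probabilities are made-up numbers, and a real PMI score would call an actual hit-count service in place of the plain arguments.

```python
def pmi_score(pair_hits, entity_hits):
    """PMI-style score: hits for (entity string, relation string)
    divided by hits for the entity string alone."""
    return pair_hits / entity_hits if entity_hits else 0.0

def noisy_or(rule_scores):
    """Noisy-or combination, assuming independence between rules:
    P(entity correct) = 1 - prod(1 - p_r) over the rules that extracted it."""
    p_all_wrong = 1.0
    for p_rule in rule_scores:
        p_all_wrong *= (1.0 - p_rule)
    return 1.0 - p_all_wrong

# No-AL baseline: every rule gets a constant score of 0.9.
# An entity extracted by two such rules gets 1 - 0.1*0.1 = 0.99.
print(noisy_or([0.9, 0.9]))

# Toy PMI score with made-up hit counts (120 joint hits, 1000 entity hits).
print(pmi_score(120, 1000))
```

Note how the noisy-or score only goes up as more rules fire, which is exactly why frequently extracted junk like "it" is a worry for the AL criteria below.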
I'm having a little bit of trouble understanding Rosie Jones's problem space.
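For concreteness, the "maximize the sum of the change of each score" criterion from the list above could look roughly like the sketch below. The `rescore` callback is an assumed black box that re-runs whatever PE we settle on as if the candidate had been labeled; the entity names and score values are toy data.

```python
def select_query(candidates, current_scores, rescore):
    """Pick the candidate whose label would most change the rule scores,
    measured as the sum of absolute per-rule score changes."""
    def total_change(candidate):
        new_scores = rescore(candidate)  # hypothetical scores after labeling
        return sum(abs(new_scores[r] - current_scores[r])
                   for r in current_scores)
    return max(candidates, key=total_change)

# Toy usage: two rules, two candidate entities with made-up rescored values.
current = {"rule_a": 0.5, "rule_b": 0.5}
fake_rescore = {"e1": {"rule_a": 0.6, "rule_b": 0.5},   # total change 0.1
                "e2": {"rule_a": 0.9, "rule_b": 0.1}}   # total change 0.8
print(select_query(["e1", "e2"], current, lambda c: fake_rescore[c]))
```

The "maximize the change of scores over all rules" variant would just swap the sum for a max over rules inside `total_change`.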