Ariadna Font Llitjós
March 7, 2006

Examples of Affected Rules for each Action:

Edit: The affected rules are the ones involving the lexical item that has been modified by the user. For example, for the log file in 9 and the parse trace

( ((S,1 (NP,2 (N,7:1 "GAUDÍ") ) (VP,3 (VB,2 (AUX,1:2 "ERA") ) (NP,8 (DET,3:3 "UN") (N,8:5 "ARTISTA") (ADJ,3:4 "GRANDE") ) ) ) ) )

the word “grande” is the one that has been modified by the user, so the rule ID that should be stored in the AffectedRules vector is ADJ,3. This indicates that the refinement needs to take place in the lexicon. In general, simple POS, such as N(oun), V(erb), DET(erminer), ADJ(ective), PREP(osition), ADV(erb), AUX(iliary verb), etc., identify lexical entries, whereas phrasal POS, such as NP (noun phrase), VP (verb phrase), PP (prepositional phrase) and S(entence), identify grammar rules.

If a clue word was identified by the user (and hopefully by your CI class), then the rule that subsumes both the edited word and the clue word should also be pushed into the vector. For example, if the user identified “artista” as the clue word, then rule NP,8, which contains both “artista” and “grande”, should be stored as an AffectedRule.

Change word order: If the word order change is local to a rule, that is, the word was moved next to another word/constituent contained in the same parent node, then store the parent node. In the same example above, “artista gran” becomes “gran artista” after the user has corrected the word order of the edited word, and thus from the tree we can extract that NP,8 is the rule affected by the change.

If the move was not local to a rule, namely it was a long-distance move, then I’m not counting on being able to refine it, precisely because it is impossible to know which rule should be affected.
For example, if “gran” were dragged between “Gaudí” and “era” in the sentence above, then NP,2 and VP,3 could be affected, but maybe also S,1 and NP,8. Since we cannot tell from the information we have at hand, we won’t take any action in such cases. I expect to be able to detect such cases by looking at their vector of RuleIDs and seeing that it is of size 0. Let me know if you decide to do anything different, so that I know when this case occurs in a CI.

Add: As you mentioned, add is a tricky action, but we can still make some reasonable hypotheses about which rules could be affected by the CI at hand.

If a word is inserted in the middle of a rule, as shown in the parse trace, namely surrounded by constituents that have the same parent node, then that parent node should be added to the affected rule IDs vector.

If a word is inserted between two rules, then the possibilities are more varied. It could be affecting the rule before it or the rule after it, or it could just require a lexical refinement. Since the only way to know this is by checking alignment information (which, the way CIs are implemented right now, constitutes a different action), we can just add both rules into the AffectedRules vector (and maybe we should indeed change the name of the vector to reflect the true nature of these IDs --> PotentiallyAffectedRules). For this case, the alignments, or lack thereof, give us more clues about what refinement is really required, so an alternative way to parse log files would be to consider alignment changes affecting a word that has just been added as part of the same correction action.

For the log file in 5 (tl: TÚ VISTE LA MUJER) and its parse trace

( ((S,1 (NP,1 (PRON,2:1 "TÚ") ) (VP,46 (V,4:2 "VISTE") (NP,3 (DET,2:3 "LA") (N,4:4 "MUJER") )) ) ) )

the word “a” is added between “viste” and “la”, and thus between rules VP,46 and NP,3.
The rule that we want to refine in this case is NP,3, but we will not know this a priori for any given language pair, so a more general approach needs to be taken for such cases. If “a” were added between “la” and “mujer” instead, then the potentially affected rule would clearly be NP,3. If there were alignment information indicating that “a” should be aligned to “the” in English, say, then this would be evidence that a lexical refinement might be required, where “the” is translated as “a la” and not just “la”, and possibly no grammar rule would need to be refined.

Delete: Similarly to the add action, when the user deletes a word that is within a rule, that rule might be affected by the refinement, or it might just be a lexical refinement not involving the grammar. So for now, just adding the parent node of the two immediately adjacent words/constituents should do. If the word is deleted at the beginning or end of the sentence, then the first or last rule would be the potential candidate.

Unfortunately, the example I had in log file 7

((((S,0 (VP,46 (V,10:2 "WOULD") (NP,2 (N,11:3 "LIKE") ) ) ) )> <((PP,2 (PREP,1:4 "QUE") (VP,1 (V,8:5 "IR") ) ) ))

is not good to illustrate this, since untranslated words should really be edited as opposed to dragged to the trash and then added. But let’s say that “would” had been edited to “quiero” and then “like” was moved to the trash; then VP,46 might be affected by later refinements.

I think a better example is rule NP,10: [NP]-->[“a” NP] for the SLS: John read the book and the TLS: A Juan leyo el libro --> CTLS: Juan leyo el libro. The parse tree would indicate something like this:

(S,1 (NP,10 (“A”) (N,x, JUAN)) (VP,46 (V,j "LEYO") (NP,3 (DET,1 "EL") (N,y "LIBRO") )) )

not taking into account alignment info, and so the potentially affected rule in this case would be NP,10. Several other examples that come to mind would only require lexical entry refinements, and thus no grammar rule would need to be refined.

Alignment changes: For now, alignment changes just help me figure out which refinement might be more appropriate given other correction actions, such as add. So the vector for actions concerning alignments will also be empty.
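
To make the cases in this note concrete, here is a minimal Python sketch of how the affected rule IDs could be read off a parse trace for the edit, change word order and add actions. It is only an illustration under stated assumptions, not the CI class itself: all names (Node, parse_trace, affected_by_edit, affected_by_move, affected_by_add) are hypothetical, the suffix after ":" in a lexical ID is simply dropped, and a word is looked up by its first match in the sentence.

```python
import re

# Tokens of a trace: brackets, quoted words, and IDs such as S,1 or N,7:1.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


class Node:
    """One labeled bracket of the trace (hypothetical helper class)."""

    def __init__(self, rule_id):
        self.rule_id = rule_id   # e.g. "NP,8" (grammar) or "ADJ,3" (lexicon)
        self.word = None         # set only on lexical nodes
        self.children = []
        self.parent = None


def parse_trace(trace):
    """Build a tree from a bracketed trace, skipping unlabeled outer brackets."""
    stack, root = [], None       # None on the stack marks an unlabeled bracket
    for tok in TOKEN.findall(trace):
        if tok == "(":
            stack.append(None)
        elif tok == ")":
            node = stack.pop()
            if node is None:
                continue
            parent = next((n for n in reversed(stack) if n is not None), None)
            if parent is None:
                root = node
            else:
                node.parent = parent
                parent.children.append(node)
        elif tok.startswith('"'):
            stack[-1].word = tok.strip('"')
        else:
            # Assumption: the part after ":" is not part of the rule ID.
            stack[-1] = Node(tok.split(":")[0])
    return root


def find_word(node, word):
    """First lexical node carrying `word` (case-insensitive), or None."""
    if node.word is not None and node.word.upper() == word.upper():
        return node
    for child in node.children:
        hit = find_word(child, word)
        if hit is not None:
            return hit
    return None


def ancestors(node):
    """Nodes above `node`, nearest first."""
    out = []
    while node.parent is not None:
        node = node.parent
        out.append(node)
    return out


def affected_by_edit(root, edited, clue=None):
    """Lexical entry of the edited word, plus the lowest rule that also covers the clue word."""
    leaf = find_word(root, edited)
    rules = [leaf.rule_id]
    if clue is not None:
        above_clue = ancestors(find_word(root, clue))
        rules.append(next(a for a in ancestors(leaf) if a in above_clue).rule_id)
    return rules


def affected_by_move(root, moved, neighbor):
    """Parent rule for a move local to one rule; empty vector for a long-distance move."""
    a, b = find_word(root, moved), find_word(root, neighbor)
    return [a.parent.rule_id] if a.parent is b.parent else []


def affected_by_add(root, left, right):
    """Shared parent for an insert inside a rule; otherwise both neighboring rules."""
    a, b = find_word(root, left), find_word(root, right)
    if a.parent is b.parent:
        return [a.parent.rule_id]
    return [a.parent.rule_id, b.parent.rule_id]


GAUDI = parse_trace('( ((S,1 (NP,2 (N,7:1 "GAUDÍ") ) (VP,3 (VB,2 (AUX,1:2 "ERA") ) '
                    '(NP,8 (DET,3:3 "UN") (N,8:5 "ARTISTA") (ADJ,3:4 "GRANDE") ) ) ) ) )')
MUJER = parse_trace('( ((S,1 (NP,1 (PRON,2:1 "TÚ") ) (VP,46 (V,4:2 "VISTE") '
                    '(NP,3 (DET,2:3 "LA") (N,4:4 "MUJER") )) ) ) )')

print(affected_by_edit(GAUDI, "grande"))             # ['ADJ,3']
print(affected_by_edit(GAUDI, "grande", "artista"))  # ['ADJ,3', 'NP,8']
print(affected_by_move(GAUDI, "grande", "artista"))  # ['NP,8']
print(affected_by_move(GAUDI, "grande", "era"))      # []
print(affected_by_add(MUJER, "viste", "la"))         # ['VP,46', 'NP,3']
```

The delete action could probably reuse the same neighbor logic as add, applied to the words on either side of the deleted word, and alignment actions would simply return an empty vector. Sentences that contain the same word twice would need positions from the log file rather than the first-match lookup used here.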