************************
Current version: V0.03

RuleRefinement.cpp
make RR or make RRclean
./RuleRefinement.exe
************************

updating files in temuco and afs backups:
cp -uR /avenue/usr0/aria/RuleRefinement/* .

*******************************************************
--- Previous notes from research-diary (collected Jan 12, 2005) -----------

Aug 10-12 2004
check black research diary for notes of what I did with Kathrin

April 3, 2003
- To run a specific grammar (Kathrin's learned grammar) on a test corpus, I can use Kathrin's code (probably will need to modify it):
  temuco: /usr0/kathrin/RuleLearning/RunTimeSystem/RunTimeSystem.cpp
  (Make RunTimeSystem)

Aug 12, 2003
- created a CVS repository on avenue:/usr0/aria/RuleRefinement/CVS-Repository

August 13, 2003
- Created a working directory under /usr0/aria/RuleRefinement called work. Re-initialized the repository so that everything is under v0.0, this time adding the RuleLearning files in the right way and then committing the changes.
  Now everything is in avenue:/usr0/aria/RuleRefinement/CVS-Repository/v0.0
  and my working directory is avenue:/usr0/aria/RuleRefinement/work/v0.0
  -> import a file and then commit
  pushd /path
  popd / path
- Looking at TrRule.hpp... first need to modify the 2nd SetConstraint function, which is the one used in RR.cpp, to have the same format in as out. Right now it takes constraints of the form:
  (x0 num p)
  and it outputs them as
  ((x0 num) = p)
  Also, need to write a SetConstraint function w/o category, so that what comes in is exactly what gets added to the rule.
  in Constraint.hpp:
  int Category; //1=parsing,2=transfer,3=generation,4=featurefilling/constrchecking
  Kathrin modifies the constraint that is entered, so that when you enter (X3 NUM = X1 NUM) (I tried with each of the 4 categories), it outputs (X3 = NUM)!
- the key to a constraint is now X0__VAL__ETC, might want to change it to be the real string
- need to set the SetCSet to be a string with the training set name + id, not just the id number
- need to write a mirror EraseConstraint fnc which given a string does the right thing
- write a Constructor ParseRule which, given a rule (string?), creates a rule of the class TrRule.cpp (i.e. stores everything in the right place). First make sure everything is stored the way I want it to be stored.
  -> created todo file in v0.0

August 11, 2004
- met with Kathrin to start the C++ skeleton for RR
  avenue:/usr0/aria/RuleRefinement
  main: RuleRefinement.cpp
  classes: CTL.hpp
           ParseTree.hpp (K's previous code)
           Lexicon.hpp (K's previous code)

August 12, 2004
- created avenue:/usr0/aria/MTEvaluation/bin
  check README to see where I got some of the scripts from

Monday, August 30, 2004
- further specifying research plan
- copied some code from K's to be able to run the xfer engine from C++
  -> Makefile + ProduceLattice.cpp (avenue:/usr0/aria/RuleRefinement/K-code)
- created avenue:/usr0/aria/eng2spa/corpus/input-simulation with:
  I see the red car
  I saw the woman
  Gaudi is a great artist

Wednesday, November 21, 2005
- met with Freddie to talk about the overall structure and data flow

Thursday, December 1, 2005
- met with Jaime and Alon: will work on CI class and will write down concrete steps that need to happen in my program in order to add a constraint to a lexical entry and to a rule (acting as an oracle)

Friday December 2, 2005
- CTL.hpp -> CorrectionInstance.hpp
  this class will just map what's in the TCTool log file, just store relevant data in a useful way, no manipulation of the data
  -> use structs (+union)
  replaced all the instances of CTL in CI.hpp and in RR.cpp
- added constructor for CorrectionInstance class:
  // constructor, instantiates all the relevant variables from the log file
  void StoreTCToolLogFile(string LogFileName);
- testing it in Test.cpp
  debugged code in CI.hpp and
Rule.hpp, needed to copy over StringUtils.hpp from K's dir
  -> need to debug the Rule.hpp file more thoroughly (look at TrRule.hpp from , much more extensive)
  for now, not included in the Test file
  make Test.o
  make Test.exe
  ./Test.exe
  need to debug now

Sunday, December 4, 2005
... working on CorrectionInstance.hpp
(Freddie helped me to debug my code to read in a file)

Tuesday, December 13, 2005
- met with Erik about the interface with the Xfer engine from my C++ code.
  stored it into CallXferEngine.cpp:

  #include "transfer.hpp"
  vector<string> translations;
  TransferEngine *xfer = new TransferEngine();
  xfer->initFromFile(initfile); // same as usual without "quit" at the end!
  xfer->processCommand("loadgra ...");
  // ...
  // Erik will add a trace to this method, so that I can access that as well
  if (xfer->translate("Some Language Sentence", translations) > 0) { }
  // need to clearall before loading a new grammar/lexicon
  xfer->processCommand("clearall");

  Code is in transfer/stable-linux-2/
  * doc.txt has all the internal methods that I can call if I need the commands to return something (for example, the num of rules loaded)
  * Makefile: look at the first line to see what object files my program needs to link against, and ask Erik if it doesn't work. He took a long time to get it right and he anticipates me needing help with this.
  * transfermain.cpp
  * transfersupport.cpp (Erik wrote it for Kathrin)
- Looking at avenue:/usr0/aria/eng2spa to see what initfile I want to load
  -> will need to load grammar and lexicon separately.
- created an init file in /usr0/aria/eng2spa/auto-init which is meant to be loaded by the RR module, and thus it doesn't load any specific grammar or lexicon and it does not end with "quit".
- working on Makefile to compile CallXferEngine, got stuck, sent email to Erik
- met with Erik and debugged Makefile and CallXferEngine, it's running now and it does the right thing :-)
  Using the most updated version of the files right now (/Transfer), since stable-linux2 is out of date, will change once he updates it.

  [aria@avenue RuleRefinement]$make CallXfer
  [aria@avenue RuleRefinement]$ ./CallXferEngine
  Turning on Latin-1 mode
  Setting normalizecase to UPPER-CASE
  Setting find all translations to ON
  Setting output source text to ON
  Setting showtrace to full trace with src indices
  Loading rule file /usr0/aria/eng2spa/grammars/grammar3.trf with 41 rules added
  Loading lexicon file /usr0/aria/eng2spa/lexicons/lexicon3.trf with 451 lexical entries added
  these are the alterative translations for I saw the red car:
  VI EL AUTO ROJO
  YO VI EL AUTO ROJO
  Deleting all loaded rules.
  clearing all the files loaded to the transfer engine

- updating main so that it reflects the changes to the CorrectionInstance interface (actions and errors stored differently now).
- emailed Freddie: is there a most up-to-date CI class? Nope, but he wants to work on it soon.

Wednesday, December 14, 2005
- debugging and looking at Rule.hpp... maybe I should try to use K's Rule class instead (TrRule.hpp), it looks like it has many of the methods I need.
  cp RuleRefinement.cpp RuleRefinement-old.cpp
  replaced Rule with TrRule
- need to test the methods in main, using old CI for now:
  cp CorrectionInstance-old.hpp CorrectionInstance.hpp
  ... never mind, the old version is gone... working on the most updated version, which I sent to Freddie
  -> adapting code in main.
  -> realized I should probably simplify all the structs and union of structs ... by just having classes so that it's easier for me to access everything. And I should probably work with strings as words instead of a struct...
  need to:
- working on adding the code to main for an end-to-end simulation

Thursday, December 15, 2005
- briefly met with Freddie to talk about embedded structs and unions
  I think it's way too complex... he finally agreed :-)

Friday, December 16, 2005
- copied struct/union version of CorrectionInstance and RuleRefinement.cpp to old-attempts, so that I can simplify my code. I didn't manage to fully debug it, and for some reason it never got inside the if block in line 111 ( if (MyAction.type == "edit") ), so I decided to leave it at that.
- changing Action from struct to class, leaving Word as struct for now, will probably need to move it to a Utils class that gets included in most files.

Saturday, December 17
- can't use Visual C++ without major changes: code that works under unix gives me tons of compile errors when I try to compile it from Visual C++ :(((

Sunday, December 18
- copied everything to cygwin and made all the necessary changes to compile it and have it working locally

Monday, December 19, 2005
- managed to modify a rule from RuleRefinement and load it back to the grammar :-)
  now I only need to plug in the Xfer engine and I'll have the end-to-end super hacky system :-)
- realized I can't run CallXferEngine locally... it needs to look at all the files for GENKIT, TRANSFER, MORPHOLOGY,... including object and some header files
  look in Makefile

Tuesday, December 20, 2005
- Need to remember it's
  [aria@avenue RuleRefinement]$ make CallXfer // and not make CallXferEngine!!!
  and then: ./CallXferEngine
- modified the Makefile so that RuleRefinement now also contains all the paths to call and run the Xfer engine, but now since the Lexicon class is also defined by GenKit (UKernel), which Ben wrote, I get tons of compile errors:

  [aria@avenue RuleRefinement]$ make RR
  /usr/local/bin/g++ -g -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement.o -c RuleRefinement.cpp -I/temuco/usr5/shared/code/antlr/antlr-2.7.1/lib/cpp -I/afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2 -I/shared/Genkit/UKernel -I/shared/Genkit/Toolbox
  In file included from ParseTree.hpp:9,
  from CorrectionInstance.hpp:8,
  from RuleRefinement.cpp:9:
  Lexicon.hpp:14: syntax error before `{' token
  Lexicon.hpp:29: `string' was not declared in this scope
  Lexicon.hpp:29: syntax error before `(' token
  Lexicon.hpp:30: `string' was not declared in this scope
  Lexicon.hpp:30: syntax error before `(' token
  Lexicon.hpp:31: syntax error before `(' token
  Lexicon.hpp:33: `string' was not declared in this scope
  Lexicon.hpp:33: syntax error before `)' token
  ...
  Tokenizer.hpp:38: instantiated from here
  /usr/local/lib/gcc-lib/../../include/c++/3.2.3/bits/basic_string.h:341: `__s' undeclared (first use this function)
  /usr/local/lib/gcc-lib/../../include/c++/3.2.3/bits/stl_vector.h: In copy constructor `std::vector<_Tp, _Alloc>::vector(const std::vector<_Tp, _Alloc>&) [with _Tp = std::string, _Alloc = std::allocator<std::string>]':
  Tokenizer.hpp:42: instantiated from here
  /usr/local/lib/gcc-lib/../../include/c++/3.2.3/bits/stl_vector.h:346: ` uninitialized_copy' undeclared (first use this function)
  /usr/local/lib/gcc-lib/../../include/c++/3.2.3/fstream:358: confused by earlier errors, bailing out
  make: *** [RuleRefinement.o] Error 1

  need to figure out how to set the scope of my code so that it does the right thing. Erik suggested using
  using namespace MyLex {
  // My code defining (and maybe using) Lexicon goes here
  }
  but that doesn't seem to work either.
- went to lexicon.hpp and .cpp in Ben's code /usr2/shared/Genkit/UKernel to see how the namespace is used and did the same in my code.
  In Lexicon.hpp:
  #ifndef ...
  #define ...
  namespace MyLex {
  // code here
  }; // end of MyLex scope
  #endif
  In RuleRefinement:
  MyLex::Lexicon::WhateverMethodINeed
  and it compiled!!! :-)
  Now there seems to be a problem with the grammar file... the outer parens are missing...

  [aria@avenue RuleRefinement]$ make RR
  /usr/local/bin/g++ -g -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement.o -c RuleRefinement.cpp -I/temuco/usr5/shared/code/antlr/antlr-2.7.1/lib/cpp -I/afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2 -I/shared/Genkit/UKernel -I/shared/Genkit/Toolbox
  /usr/local/bin/g++ -g -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement RuleRefinement.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/transfer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/transfer-support.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/TransferGrammarLexer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/TransferGrammarParser.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/UnicodeTools.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/chinese.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/english.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/FStructLexer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/FStructParser.o /shared/Genkit/Toolbox/*.o /shared/Genkit/UKernel/*.o -L/temuco/shared/code/antlr-2.7.5/lib/cpp/src -lantlr
  [aria@avenue RuleRefinement]$ ./RuleRefinement
  ...
  SLSentence is I see the red car
  Cannot open init file /cygdrive/c/mt/eng2spa/auto-init-simulation.txt
  command is loadgra /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf
  Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 11 rules added
  no translations found by the Xfer engine
  Deleting all loaded rules.
  clearing all the files loaded to the transfer engine

- changed path...
  it doesn't translate the SL sentence... looked into it:

  Before running RR module
  [aria@avenue eng2spa]$ transfer -if init-simulation.txt
  Turning on Latin-1 mode
  Setting normalizecase to UPPER-CASE
  Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar.trf with 5 rules added
  Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf with 7 rules added
  Setting find all translations to ON
  Setting output source text to ON
  Setting showtrace to full trace with src indices
  0: VEO EL AUTO ROJA
  1: VEO LA AUTO ROJA
  2: VEO EL AUTO ROJO
  3: VEO LA AUTO ROJO

  After running RR module
  [aria@avenue eng2spa]$ transfer -if init-simulation.txt
  Turning on Latin-1 mode
  Setting normalizecase to UPPER-CASE
  Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 11 rules added
  Setting find all translations to ON
  Setting output source text to ON
  Setting showtrace to full trace with src indices
  * No complete translation found
  LEX 0 GRA 5 UNK 0 MORPH 0 COMP 0
  VEO EL AUTO ROJO
  tree: <((S,0 (VP,1 (V,1:2 "VEO") ) ) )> <((NP,8 (DET,1:3 "EL") (N,1:5 "AUTO") (ADJ,2:4 "ROJO") ) )>
  [aria@avenue eng2spa]$

  But when I call auto-init.txt, which has the same parameters, it doesn't translate it:
  SLSentence is |I see the red car|
  Turning on Latin-1 mode
  Setting normalizecase to UPPER-CASE
  Setting find all translations to ON
  Setting output source text to ON
  Setting showtrace to full trace with src indices
  command is loadgra /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf
  Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 11 rules added
  no translations found by the Xfer engine
  Deleting all loaded rules.
  clearing all the files loaded to the transfer engine

  Erik:
  - it's not a full parse!!!
    -> translate only outputs anything if it finds a full parse; to get partial parse info, need to use bestpartial() (returns a string).
    added: string partialparse = xfer->bestpartial();
    or if using a language model: bestpartiallm();
  - Erik told me there is a Spanish LM and all I need to do is add this line to the init file:
    uselm /temuco/shared/data/Spanish/SpanishLM.index /temuco/shared/data/Spanish/SpanishLM.srilm
    Erik doesn't remember which data he used to build this, but he says it's not too big and he thinks he used some of the same data EBMT uses...

  [aria@avenue RuleRefinement]$ ./CallXferEngine
  Turning on Latin-1 mode
  Setting normalizecase to UPPER-CASE
  Setting find all translations to ON
  Setting output source text to ON
  Setting showtrace to full trace with src indices
  Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 11 rules added
  LEX 0 GRA 5 UNK 0 MORPH 0 COMP 0
  VEO EL AUTO ROJO
  tree: <((S,0 (VP,1 (V,1:2 'VEO') ) ) )> <((NP,8 (DET,1:3 'EL') (N,1:5 'AUTO') (ADJ,2:4 'ROJO') ) )>
  Deleting all loaded rules.
  clearing all the files loaded to the transfer engine

  RR:
  (...)
  SLSentence is |I see the red car|
  initfile is /usr0/aria/eng2spa/auto-init.txt
  Turning on Latin-1 mode
  Setting normalizecase to UPPER-CASE
  Setting find all translations to ON
  Setting output source text to ON
  Setting showtrace to full trace with src indices
  Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 11 rules added
  LEX 0 GRA 5 UNK 0 MORPH 0 COMP 0
  No full parse was found! Partial parse is:
  VEO EL AUTO ROJO
  tree: <((S,0 (VP,1 (V,1:2 'VEO') ) ) )> <((NP,8 (DET,1:3 'EL') (N,1:5 'AUTO') (ADJ,2:4 'ROJO') ) )>
  Deleting all loaded rules.
  clearing all the files loaded to the transfer engine

- For some reason the 2nd call to LoadLexicon to load the Grammar is not loading all the rules in the grammar file (simulation-grammar-REFINED.trf), VP,3 is missing, which is causing the MT system to not find a full parse :-(((
  need to look into this... check K's lexicon.hpp in V0.3 first, maybe she has an improved version...
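
The full-parse versus partial-parse behaviour from this entry can be sketched in isolation. This is only an illustration under assumptions: FakeEngine is a made-up stand-in, not the real TransferEngine from transfer.hpp, and the sketch relies only on what these notes record, namely that translate() returns a positive count when a full parse exists and that bestpartial() returns a string. XferResult and TranslateOrPartial are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of one call: either full translations or the best partial parse.
struct XferResult {
    bool fullParse;
    std::vector<std::string> translations; // filled only on a full parse
    std::string partial;                   // filled only when no full parse
};

// Works with any engine type that offers translate(sentence, out) and
// bestpartial(), the two calls mentioned in the notes above.
template <typename Engine>
XferResult TranslateOrPartial(Engine& xfer, const std::string& slSentence) {
    assert(!slSentence.empty()); // nothing to translate otherwise
    XferResult r;
    r.fullParse = xfer.translate(slSentence, r.translations) > 0;
    if (!r.fullParse) {
        // No full parse: fall back to the partial-parse string.
        r.translations.clear();
        r.partial = xfer.bestpartial();
    }
    return r;
}

// Hypothetical stand-in (NOT the real engine), used only to exercise the helper.
struct FakeEngine {
    bool hasFullParse;
    int translate(const std::string&, std::vector<std::string>& out) {
        if (!hasFullParse) return 0;
        out.push_back("VEO EL AUTO ROJO");
        return static_cast<int>(out.size());
    }
    std::string bestpartial() { return "VEO EL AUTO ROJO (partial)"; }
};
```

Keeping the fallback in one helper means the caller can always tell from fullParse whether it is looking at translations or at a partial parse.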

Wednesday, December 21, 2005
- looking at grammar and lexicon files: it looks like it didn't load the last rule from the simulation-grammar, but it did load all the rules in the lexicon...
- looking at LoadLexicon method: according to the implementation, the only way it doesn't add a rule is if it has the same id as another rule already in the lexicon
  why doesn't it output "sCombWords:" for the grammar rules?
  K's initial implementation stores the lexicon in a different way for each access method...
  -> see if it's possible to store it in one general way, and then move the processing to the methods (what's more efficient?)
- looked at K's Lexicon.hpp in V0.3 and it's much simpler, just one access method:
  static set Lookup(string SLWord, string TLWord);
- tried to add a print method to it and adapt it to map > but didn't manage to make it work...
- looking at LoadLexicon method again, still not sure what
  if (mssMasterLexicon.find(sID) == mssMasterLexicon.end()){
  mssMasterLexicon.insert(map::value_type(sID,AccumRule));
  is doing, the rule hasn't been processed at this point, and this is the only place where it's inserted to the MasterLexicon, which is what gets printed... but it doesn't seem to be getting there... doesn't print the debugging statements :(
- copied Lexicon.hpp into Grammar.hpp, I moved what I had started working on to Grammar-ari.hpp
- after some debugging, it's now working again, but the same problem still occurs.
  -> moved VP,3 before VP,1 and tried again: it's working now!!!
  getting:
  SLSentence is |I see the red car|
  initfile is /usr0/aria/eng2spa/auto-init.txt
  Turning on Latin-1 mode
  Setting normalizecase to UPPER-CASE
  Setting find all translations to ON
  Setting output source text to ON
  Setting showtrace to full trace with src indices
  Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 5 rules added
  Loading lexicon file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED.trf with 7 lexical entries added
  these are the alterative translations for I see the red car:
  VEO EL AUTO ROJO
  VEO LA AUTO ROJO
  Deleting all loaded rules.
  clearing all the files loaded to the transfer engine

- NEED TO BE CAREFUL WITH DATA INCONSISTENCY!!! because I have the data stored in 3 different ways, so that I can access it in three different ways (need to make sure this is what I really need). Now when I replace a rule, it's only replaced in the MasterLexicon, I think, but not in the other data structures
- cleaned all the "cout <<" that are not necessary to illustrate the process and saved the code:
  cp RuleRefinement.cpp RuleRefinement.cpp.simul-end-to-end
  ./RuleRefinement > RR-simul-end-to-end.txt
- started working on adding a new lexical entry (end of RR.cpp)... but LoadLexicon doesn't seem to work the second time I call it, even though it used to work when I was using the same class to load the grammar, need to debug!

Thursday, January 12, 2006
- working on NextStepsAlgorithm-Jan12.doc (Jaime wants it for next Tuesday morning)
- created 00-End2End.txt so that I have a record of what scripts and files need to be run/created in order to run the RuleRefinement module from end-to-end

Tuesday, January 17, 2006
- went over NextStepsAlgorithm-Jan17.doc with Jaime, and he agreed I should start right away with the backend tasks
- also see 00-Add2Thesis for an expansion of Figure 6, keep in mind while implementing!!!

Wednesday, January 18, 2006
- created 2 directories V0.01 and V0.02 and copied all the code files to both of the directories
- leaving V0.01 unchanged, working on V0.02 for the next iteration...
- working on task 1: copied code from the end and debugging... doesn't seem to be printing out the first cout <<... already deleted the .o file, but it's still not showing the print out... weird! what could be preventing it from printing??

Thursday, January 19, 2006
- the problem was that my Makefile was creating an executable that does not have .exe at the end, but was just called RuleRefinement, and I kept trying to run RuleRefinement.exe, which was some old version from December, duh!!!
- modified the Makefile:
  make RR will now create a file called: RuleRefinement.exe
  and RRclean should first remove the object file and then create the exe file
  also added RRnoXfer to the Makefile, now just need to add a XferON flag to main... need to debug, getting lots of errors...
  also modified Makefile in V0.01
- back to the lattices...
- cleaned code in main() a bit
- implemented task 1, however, it's not true that if CTLS = TLS_n, then we want to do nothing... one of the tasks of the RR module is to make the grammar tighter, for example for "I see the red car", and so assume all the sentences are of that kind for now.
- reporting if the incorrect translation has not been generated by the refined system at the end
- backed up my code to AVENUE afs: copied V0.02 to /afs/cs.cmu.edu/project/avenue-1/Avenue/RuleRefinement and moved all the old files to V0.01

Friday, January 20, 2006
- met with Christine about potential help, but we decided it wasn't a good match due to her not knowing C++ already
- trying to figure out how to implement a destructor for the Lexicon class
  looks like clear() from STD should do the trick
- implemented destructor for the lexicon:
  DeleteLexicon() { member.clear(); }
  Task 5 done :-)
- looking into task 7
  -> I should probably reimplement the lexicon class, using the rule class. Maybe leave for later.
  -> debug Lexicon::Print
- extracting POS from parse
- implemented a toupper for a string instead of a char (allcaps), which is working in main
- moved allcaps to Utils.hpp
  -> need to debug Utils, for some reason, I'm getting some errors in Rule as well...

Monday, January 23, 2006
- debugged AllCaps, was including the wrong Rule file and it needed to have "static" before its declaration!
- moving on... parsing parse.tree for POS (can't extract just POS... ask Stephan)
- to concatenate strings (=append), use strcat(S1,S2), which produces S1S2 (this is for char arrays; with std::string, use S1 + S2 or S1.append(S2))
- creating lexical entry: maybe I can add a method later that does this
  4 INparams: POS, nextID (extract from lexicon), tlword, slword
  -> need to keep a counter for each POS, need to add next available ID
  calculate this when I load in the lexicon... // making it up for now
- testing 2: adding a sense, for some reason it doesn't find "plays" in the lexicon, need to debug

Tuesday, January 24, 2006
- the problem with translate(sl,trace) was that I had a transfer.hpp file in my local directory which was the one being called, instead of the one specified by the TRASNFERPATH, I have no idea why...
  but hey, it's working now
- met with Stephan to discuss implementation issues:
  * REUSING EXISTING CODE: ask Erik if I can have direct access to the lexicon and grammar classes so that I can modify them once they are loaded into the xfer engine, and so that I don't have to reload them every time; this would be very time consuming for very large grammars or lexicons (~10K words).
    If I have a base class from Erik, I can then derive an Extended Grammar class from that, which adds the functionality I need for the RR.
    Two different "save" methods, one that saves the basic info, necessary to run the Xfer engine, and one that saves all the info required for RR tracking, history of rules, etc.
    Is Erik using a library (antlr)? his own library? is it stable enough that I can use it?

    Grammar:            ExtendedGrammar
    vector MyGra        flag {active,inactive}
                        vector History
    save();             save();

    ExtendedGrammar *pMyExtGra = new ExtendedGrammar;
    pMyExtGra->save(); // saves all the information, including the base information
                       // and the RR-related info (history, active v. inactive, etc.)
    pMyExtGra->Grammar::save(); // qualifying the call with the base class name uses the base class save method,
                                // to save just the basic info that the Xfer engine needs
                                // (a cast alone would not do this if save() is virtual)
  * TREE: the best thing to do to extract the POS from the trace is to have a tree class or DS which I can load the tree into and then I can look it up in any way I need to (I'll also need to do this for blame assignment)
    Convert string into a tree structure.
  * STATIC variables: Current Lexicon and Grammar class (from K) are declaring them as a static variable, which essentially makes them globally accessible (by using Lexicon:: or Grammar::), and so there isn't an instance called Lexicon/Grammar.
    I think that this is probably fine for utility methods such as AllCaps, don't really want to have to create a Utils object, but for something like the lexicon or the grammar, I think it would make more sense to have a constructor that instantiates the right thing in the object, and so I'd have Lexicon.load("LexFileName") or something like that.
    A good use of static variables could be the POS id counter that the grammar and lexicon need to keep, which should be uniquely accessible from all the rules/entries at any point. And thus for the grammar class, I could have an enum, for example, with NPcount, VPcount, etc. which would tell me what is the next available Rule ID for each POS.
  * using memory in HEAP instead of STACK (and the use of pointers):
    class CLexicon {
    public:
    CLexicon(); // default constructor
    CLexicon(string FileName); // constructor that actually loads in a lexicon = LoadLexicon
    };
    --------
    // creating an object this way stores it in the heap
    // instead of the stack, which is limited, and so this is a good way
    // to avoid running out of stack space (the object then has to be
    // deleted explicitly, otherwise it leaks)
    // Erik does it like this for the Xfer engine, since the Xfer engine
    // needs a lot of memory
    CLexicon *pLexicon = new CLexicon("lexicon1.trf");
    pLexicon->LookupByID("ID"); // = (*pLexicon).LookupByID this is dereferencing the pointer first
    // This would create an object and store it in the stack
    CLexicon MyLexicon;
    MyLexicon.LookupByID("ID");
  * NOTATION:
    s_name = static variable
    m_name = member variable (to distinguish between local variables and member variables when within a method)
    Cname = class object
    pname = pointer
  * RULEREFINER CLASS: think about what data members and methods it needs (grammar, lexicon, correction instance, parse tree, etc.)
    this would have all the operation types defined in methods, nice way to organize code + keep track of what's missing
  * PARAMETERS: method(in,in, in/out) vs.
    out method(in, in)
    in/out params are useful for cases when we want to modify something, for example populate a list that already contained some info, or expand a grammar/rule. This is the way to modify complex objects, so that I don't have to copy big objects over and over.
  * STRING operations are expensive, so try passing by reference, and by const reference if I know I don't mean to modify them (ever).
    Example of bad use:
    string& method(...) { string NS; return NS; }
    NS is a local variable, so when the function ends it goes out of scope and the returned reference is left dangling.
    A return value should not be a reference unless the object it refers to was declared outside the function:
    string method(...)
    or
    string NS;
    string& method(...) { modify NS; return NS; }
  * CALLING A PERL script from C++: using pstream
    see TextFilter dir in bin/ for an example of how to call an external perl script from C++.
    see: avenue:/usr0/aria/RuleRefinement/bin/TextFilter/README

Wednesday, January 25, 2006
- met with Jaime and Alon
  Jaime wants me to test case one with a different type of agreement constraint (between the subj and the verb, say)
  Time efficiency issues:
  - when having a huge lexicon, instead of loading the whole file once it's modified, I can do two things:
    - load only the entries used in the sentence(s), creating a small working lexicon (Jaime)
    - use the reloadgra / reloadlex commands instead of the loadgra command; they also take a file name, and the argument should be a grammar or a lexicon file containing just the changes. So for example, if a rule has been slightly modified and needs to be replaced, it's added to the grammar file with the original ID, and if a new rule needs to be added, a new ID is created and the rule is added to the modified grammar.
      What the reload method does is check whether the ID is already there: if it is, it replaces it with the newest version; if it's not, it adds it.
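
The replace-or-add behaviour described for the reload commands can be pictured with a std::map keyed by rule ID, next to the insert-only pattern (find, then insert if absent) seen earlier in LoadLexicon. This is a sketch of the idea only, not Erik's implementation; RuleTable, AddIfAbsent and ReloadRule are hypothetical names, and the rule bodies are placeholder strings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

typedef std::map<std::string, std::string> RuleTable; // rule ID -> rule text

// Insert-only: a rule whose ID is already present is silently skipped.
// Returns true if the rule was added.
bool AddIfAbsent(RuleTable& table, const std::string& id, const std::string& rule) {
    return table.insert(std::make_pair(id, rule)).second;
}

// Reload semantics: replace the rule if the ID exists, add it otherwise.
// Returns true if an existing rule was replaced.
bool ReloadRule(RuleTable& table, const std::string& id, const std::string& rule) {
    assert(!id.empty()); // every rule needs an ID to be addressable
    RuleTable::iterator it = table.find(id);
    if (it != table.end()) {
        it->second = rule;
        return true;
    }
    table[id] = rule;
    return false;
}
```

With insert-only loading a refined rule that keeps its original ID never reaches the table, which is why the refined grammar file has to go through the reload path (or a clearall followed by a fresh load).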
      -> need to test
- working on testing 1st case (adding an agreement constraint) but between the subject and the verb... added example int variable (1,2...) and expanding grammar and lexicon as well as running the xfer engine to obtain exact output.
  saved system's output in debugging-subj-obj-example and made a copy of the executable so that if something breaks later, I can show Jaime and Alon:
  cp RuleRefinement.exe RuleRefinement-ex2.exe
- testing 2: adding a sense, for some reason it doesn't find "plays" in the lexicon, need to debug
  -> LoadLexicon is not actually populating SLLexicon! need to modify the method

Thursday, January 26, 2006
- working on modifying the LoadLexicon method to actually populate the SLLexicon as well...
- started working on creating static POSCounters to be able to create new lexical entries and grammatical rules. Stuck when trying to cast a string into an integer, emailed Stephan asking for help.
  check: atoi, atol, strtol
- Lexicon.hpp: Lookup methods don't seem to be working for either SLword or TLword... fully debugging them, making sure I understand how they are loaded and then testing by printing out the second value of the map >
  in LookupSL: it's not going thru the if statement, but this is the same if statement that works fine for LoadLexicon... added cout statements and it looks fine...

Friday, January 27, 2006
- still debugging LookupSL method...
- Stephan found the bug: I was looking up slword instead of SLWORD, duh!!!
  he made a bunch of suggestions, need to look into them and modify my code.
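
Two of the problems from these entries, casting a string to an integer for the POS counters and the slword versus SLWORD lookup miss, can be illustrated together. A sketch under assumptions: ParseRuleNumber, AllCapsCopy and LookupSL are hypothetical helpers, the "POS,number" ID format is inferred from IDs such as NP,8 in the traces, the lexicon is reduced to a plain map, and its keys are assumed to be stored in upper case.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

// Extracts the number from an ID like "NP,8" using strtol.
// Returns -1 if there is no comma or no digits after it.
long ParseRuleNumber(const std::string& id) {
    std::string::size_type comma = id.find(',');
    if (comma == std::string::npos) return -1;
    const char* start = id.c_str() + comma + 1;
    char* end = 0;
    long n = std::strtol(start, &end, 10);
    return (end == start) ? -1 : n; // end == start means nothing was parsed
}

// Returns an upper-cased copy of the word (assumed to match the
// UPPER-CASE normalization the engine reports at start-up).
std::string AllCapsCopy(const std::string& word) {
    std::string out(word);
    for (std::string::size_type i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(out[i])));
    return out;
}

// Looks up a source-language word after normalizing its case.
// Returns an empty string when the word is not in the lexicon.
std::string LookupSL(const std::map<std::string, std::string>& lexicon,
                     const std::string& slword) {
    std::map<std::string, std::string>::const_iterator it =
        lexicon.find(AllCapsCopy(slword));
    return it == lexicon.end() ? std::string() : it->second;
}
```

Unlike atoi, strtol reports where parsing stopped, so an ID with no digits can be told apart from a genuine 0.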

Saturday, January 28, 2006
- fixed bug in main so that the right word is looked up :-)
- finished adding sense case :-)))))

Sunday, January 29, 2006
- debugged case 1 (adding a brand new lex entry, problem with the parens)

Monday, January 30, 2006
- now both lexical entry examples are working and I am printing a file with just that entry to the lexicon directory
- use reloadgra to add to lexicon already loaded in the xfer engine
  Note: reloadlex is not implemented yet, and as long as I use loadgra lexfile the first time, I should have no problems using reloadgra newentry. For huge lexicons, it might get too slow -> let Erik know.
- Wanted to implement a lexical filter that only loads the lexical entries for the words that appear in the SLsentences, need to extract SLSs first and then implement a special loading function that does a lookup first, but Erik says that his code already effectively does that, and it only loads the entries that are needed... so I don't have to worry about that for the Xfer engine, maybe just for my code.
- Lexical examples 2 and 3 are now working (cases 1 and 2)
  2: I see the red unicorn -> * veo el unicorn rojo -> veo el unicornio rojo
  3: Mary plays the guitar -> * María juega la guitarra -> María toca la guitarra
- wrapping up: compiled and ran it to make sure the trace looks good to show Jaime and Alon on Wednesday

***********************************
***** MOVING TO V0.03... **********
- reorganize the logic of the program, checking against lattice before loading lexicon, etc.
- splitting CorrectionInstance into hpp and cpp (otherwise the compiler needs to go over the whole implementation every time, even if it hasn't changed!)
- split all other classes, so that I could get all the dependencies in the Makefile right
- worked on the Makefile with Stephan: added dependencies for each class, so that the Makefile knows which files it needs to update for each object. It's compiling and running as before!!!
:-) - to compile classes one by one: g++ -c ClassName.cpp Tuesday, January 31, 2006 - updated V0.02 into Avenue afs directory and backed up V0.03 - copied Stephan's SimpleTests.cpp into String2Int.cpp, this illustrates how to cast strings to integers (-> POScounters) - using Test.cpp for testing my code ************************************ - created a new object: Refiner ************************************ - Moving chunks of code into Refiner: -> need to debug Refiner + Makefile it can't find transfer.hpp Wednesday, February 1, 2006 - saved TestParam.tar file from Stephan in avenue:/usr0/aria/RuleRefinement/bin - managed to compile Refiner.o without transfer.hpp included still need to debug it to be able to include transfer.hpp (Makefile for Refiner.o looks like RuleRefinement.o, should work...) [aria@avenue V0.03]$ g++ -c Refiner.cpp In file included from Refiner.cpp:11: Refiner.hpp:14:24: transfer.hpp: No such file or directory Refiner.cpp:15:24: transfer.hpp: No such file or directory - working on making RuleRefinement more modular - met with Jaime and Alon, showed them the 4 examples that are working... Jaime wants me to show this to Lori some time. -> try loading the new grammar rule for example 1 (auto rojo) with reloadgra method Thursday, February 2, 2006 - changed GetCTLS into GetCTLSentence for consistency (CI.* and Refiner.cpp) - passing variables in and out methods... 
debugging - moved code from the beginnign of RR.cpp to: * map AccessLogInfo(CorrectionInstance* pCI); // instantiated data members // which I'm currently not using due to a variable access problem * vector DetectAction(CorrectionInstance* pCI, int example); * bool CTLSinXferLattice(string GramFileName, string LexFileName, string SLSentence, string CTLS); * bool CTLSinRefinedLattice(string RefinedGramFileName, string RefinedLexFileName, string SLSentence, string CTLS, string TLS); // try to make this more general so that it can be applied to both // refined and unrefined lex and grams -> have a flag for that // divide in smaller methods - but I can't seem to successfully include transfer.hpp into Refiner.cpp [aria@avenue V0.03]$ g++ -c Refiner.cpp Refiner.cpp:15:24: transfer.hpp: No such file or directory - commented it out for now, moving onto adding a parameter to RR.cpp, so that I don't need to recompile every time I need to test a different example. Looking at bin/TestPassParam/ (didn't have time) - In order to be able to use the same variables in the Refiner class, I can pass the object that contains those variables (CorrectionInstance) as a pointer to an initialization method, and that will actually store the pointer as a data member (m_pCI) into the Refiner class. Friday, February 3, 2006 - the include problem could be the following (Stephan): I need to provide an include path to the compiler, that it knows where to look for additional include files, esp. when they are not in the current directory. I have this include path set correctly in you Makefile, so calling make Refiner.o should work. However, if I just use g++ -c Refiner.cc then this include path is missing. I could add it with the -I option, but it is easier to use the makefile. did: make Refiner.o and it compiled... but then I was still getting: [aria@avenue V0.03]$ make RR ... 
s/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/UnicodeTools.o /shared/Genkit/Toolbox/*.o /shared/Genkit/UKernel/*.o -L/temuco/shared/code/antlr-2.7.5/lib/cpp/src -lantlr /usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14. /usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14. /usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14. ... /usr/bin/ld: Dwarf Error: Could not find abbrev number 343. RuleRefinement.o: In function `main': RuleRefinement.o(.text+0x804): undefined reference to `Refiner::CTLSinXferLattice(std::basic_string, std::allocator >, std::basic_string, std::allocator >, std::basic_string, std::allocator >, std::basic_string, std::allocator >)' collect2: ld returned 1 exit status make: *** [RR] Error 1 What was really going on is that I forgot to add Refiner:: before the CTLSinXferLattice method!!! duh! - thinking about different methods required to manipulate rules (Lex and Gra) wrote a file called /usr0/aria/RuleRefinement/ManipulatingRules.txt added to NextSteps file as well - index constraints by the two y-positions by storing them in a matrix: vector>> Monday, February 5, 2006 - backed up files from V0.03 to Avenue afs directory - adding parameters to main (using TestParam class from Aachen) compiler error: it has to do with the including of ParamDef.hh... In file included from RuleRefinement.cpp:11: Param.hh:126:23: ParamDef.hh: No such file or directory [3:43:26 PM] Ariadna says: I've tried adding <> around the file name, and including ParamDef.hh from the RR.cpp, but I'm still getting error messages... [3:44:39 PM] Ariadna says: actually,if I just try make Param.o, I get tones of errors, even though I just copied the file from your Test directory... 
[3:45:44 PM] Ariadna says: [aria@avenue V0.03]$ make Param.o /usr/local/bin/g++ -g -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o Param.o -c Param.cc Param.cc:14:20: Param.hh: No such file or directory Param.cc:23: syntax error before `::' token Param.cc:24: syntax error before `::' token [3:45:55 PM] Ariadna says: ... - replaced the <..> in the Param files by "..", i.e. #include "ParamDef.hh" The two different ways make a difference as to where the compiler searches. in STTK we use <..> everywhere, but the current directory is then included in the Include path. "..." means that the current directory is search, even if it is not in the Include path (well, this is how I think it is, but no guarantee given;-) So, when you have some time you might test to use the <...> but add to your include path -I.. Notice the fullstop after the I, which stands for current directory. <...> is usually only used for the system includes. - fixing the GramFileName params which are now pointers (Lexicon, Refiner, etc.) - stuck again with Makefile... for some weird reason it wasn't linking right and when Param.o and ParamDef.o were deleted from the OBJS_RR list and then added again, it did... weird!!! But not it's working: [aria@avenue V0.03]$ ./RuleRefinement.exe Compiled on Feb 6 2006 23:10:32 with g++ version: 3.2.3 in debug mode CParam.AddParamDef : Internal warning: the flag 'l' is reserved for special purposes or has been used for another parameter. It should not be redefined. BEWARE OF UNPREDICTIBLE BEHAVIOUR! Parameters are: Debug Level = 1 Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar.trf TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed Example number = 2 0 ... need to debug the rest of the program, make sure I'm passing what I need to. At some point I'm giving the xfer engine a cout comment, instead of the SLSentence, need to debug! Form output: ... 
************************************************** 1. Checking against the existing lattice... ************************************************** initfile is /usr0/aria/eng2spa/auto-init.txt Turning on Latin-1 mode Setting normalizecase to UPPER-CASE Setting find all translations to ON Setting output source text to ON Setting showtrace to full trace with src indices ** NO FULL PARSE FOUND. DOING PARTIAL TRANSFER: LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0 SLATIONS AND THEIR PARSES FOR tree: <(V,0:1 'SLATIONS')> <(V,1:2 'AND')> <(V,2:3 'THEIR')> <(V,3:4 'PARSES')> <(V,4:5 'FOR')> ** NO FULL PARSE FOUND. DOING PARTIAL TRANSFER: LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0 AND THEIR PARSES FOR tree: <(V,1:1 'AND')> <(V,2:2 'THEIR')> <(V,3:3 'PARSES')> <(V,4:4 'FOR')> No parse found. LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0 No full parse was found! Partial parse is: SHE READ tree: <(V,5:1 'SHE')> <(V,6:2 'READ')> Deleting all loaded rules. From MAIN: nope, it's NOT there :-) In LoadLexicon, LexiconFile is:|/usr0/aria/eng2spa/lexicons/simulation-lexicon.trf| FileName is /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf Now looking up Wi LEÍ... ... need to debug this! Tuesday, February 7, 2006 - making a pointer to a CorrectionInstance a private member of the Refiner class so that I can access to all the information I need to. Also making it static and public for now... otherwise I need to write all the set/get methods for it again... - modfying V0.03 from local machine (avenue) and leaving Avenue afs version untouched for now as a back up. Wednesday, February, 2006 - debugging passing pointers as params to Refiner:: methods It turns out I need to declare the static private data member in Refiner.cpp again, with the class scope in front of it, otherwise, linker complains (undefined reference to `Refiner::m_pCI'), thus added this: CorrectionInstance* Refiner::m_pCI; it now compiles AND links and runs! 
-> now passing the CI as a pointer and made it a static private data member of
   the Refiner class, so that I can access all the relevant info from Refiner
   and I don't need to pass it back and forth
-> need to fully debug!!! Make sure I only store what I need once!
- A static member variable means that all objects share this member. In your
  case, all RR objects share the same pointer to the correction instance. This
  does not really matter, as you will only have one RR object. But if you had
  multiple RR objects at the same time, they would probably each have their
  own CorrectionInstance pointer. So I would suggest removing the static.
  There is no difference in efficiency or ease of use.
- With respect to accessing the member variables of the CI: what I typically do
  is assign those variables to local variables the first time they enter the
  current object. In your case, when you give the CI to the RR object, you
  could have something like m_pSLSentence = pCI->pSLSentence. You don't want
  to copy large data structures, just have a local pointer to them. The
  dereferencing operation will cost you no time, so it is only an issue of
  writing pCI-> over and over again, and a matter of style, i.e. whether you
  want to see in your code that these jokers belong to the CI object.
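The suggestion above (drop the static, store the pointer once, and keep a local pointer to the data that is used often) could look roughly like the sketch below. The two classes are stripped-down stand-ins for the real CorrectionInstance and Refiner; Init, SLSentence and the constructor are assumed names for illustration only.

```cpp
#include <string>

// Minimal stand-in for the real class; only what the sketch needs.
class CorrectionInstance {
public:
    explicit CorrectionInstance(const std::string& SLSentence)
        : m_SLSentence(SLSentence) {}
    const std::string& GetSLSentence() const { return m_SLSentence; }
private:
    std::string m_SLSentence;
};

class Refiner {
public:
    Refiner() : m_pCI(0), m_pSLSentence(0) {}

    // Stores the pointer (the CorrectionInstance itself is not copied) and
    // caches a pointer to the sentence, so later code does not have to
    // write m_pCI-> every time. Must be called before SLSentence().
    void Init(CorrectionInstance* pCI) {
        m_pCI = pCI;
        m_pSLSentence = &pCI->GetSLSentence();
    }

    const std::string& SLSentence() const { return *m_pSLSentence; }

private:
    CorrectionInstance* m_pCI;         // non-static: one per Refiner object
    const std::string* m_pSLSentence;  // points into *m_pCI, not a copy
};
```

Because m_pCI is an ordinary member here, no separate definition in Refiner.cpp is needed; the extra line CorrectionInstance* Refiner::m_pCI; is only required for the static version.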
- tested param class with different params * default params: [aria@avenue V0.03]$ ./RuleRefinement.exe Compiled on Feb 8 2006 16:16:07 with g++ version: 3.2.3 in debug mode Parameters are: Debug Level = 1 Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar.trf TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed Example number = 2 * different params for DebugLevel and Example: [aria@avenue V0.03]$ ./RuleRefinement.exe -d 0 -e 1 Compiled on Feb 8 2006 16:16:07 with g++ version: 3.2.3 in debug mode Parameters are: Debug Level = 0 Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar.trf TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed Example number = 1 * different log file (even though main is not doing anything with it yet) [aria@avenue V0.03]$ ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-GaudiExample Compiled on Feb 8 2006 16:16:07 with g++ version: 3.2.3 in debug mode Parameters are: Debug Level = 1 Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar.trf TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-GaudiExample Example number = 2 * lexicon [aria@avenue V0.03]$ ./RuleRefinement.exe -l /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf Compiled on Feb 8 2006 16:16:07 with g++ version: 3.2.3 in debug mode Parameters are: Debug Level = 1 Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar.trf TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed Example number = 2 * grammar: [aria@avenue V0.03]$ ./RuleRefinement.exe -g /usr0/aria/eng2spa/grammars/simulation-grammar1.trf Compiled on Feb 8 2006 16:16:07 with g++ version: 
3.2.3 in debug mode Parameters are: Debug Level = 1 Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar1.trf TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed Example number = 2 * Lexicon and Grammar: [aria@avenue V0.03]$ ./RuleRefinement.exe -l /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf -g /usr0/aria/eng2spa/grammars/simulation-grammar1.trf Compiled on Feb 8 2006 16:16:07 with g++ version: 3.2.3 in debug mode Parameters are: Debug Level = 1 Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar1.trf TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed Example number = 2 * !it instantiates it even if the file doesn't exist [aria@avenue V0.03]$ ./RuleRefinement.exe -l /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf -g /usr0/aria/eng2spa/grammars/simulation-grammar2.trf Compiled on Feb 8 2006 16:16:07 with g++ version: 3.2.3 in debug mode Parameters are: Debug Level = 1 Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar2.trf TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed Example number = 2 ... .In LoadGrammar, GrammarFile is:|/usr0/aria/eng2spa/grammars/simulation-grammar2.trf| Couldn't open grammar file (/usr0/aria/eng2spa/grammars/simulation-grammar2.trf) Thursday, February 10, 2006 - Moving lexical examples to CorrectionInstance::LoadTCToolLogFile 0. She read 1. I see the red car 2. I see the red unicorn 3. 
Mary plays the guitar
- moved action and error info detection from Refiner::DetectAction to
  Refiner::InstantiateLogInfo(CorrectionInstance* pCI)
- Refiner::AccessLogInfo -> InstantiateLogInfo

Friday, February 10, 2006
- since the method CorrectionInstance::LoadTCToolLogFile(char *pLogFileName)
  now takes a pointer instead of a string,
    LogFile.open(LogFileName.c_str());
  becomes just:
    LogFile.open(pLogFileName);
  since c_str() is only needed when you have a string and want to access its
  internal char * buffer.
- and when trying to append a char * to a string constant, I need to
  explicitly turn the string constant ".." into a string object:
    string Command1 = string ("loadgra ") + pGramFileName;
    xfer->processCommand(Command1);
  this actually fixed the problem with the translations, since it wasn't
  loading the grammar and lexicon at all and it was taking part of the cout
  and trying to parse it.

*******************************************************************************
*** Dilemma: experienced C++ programmer vs nice programmer vs no programmer ***
*******************************************************************************
- met with Jaime, we'll try Bill first and then see... Alon will send him an
  email later tonight. Meet with him on Monday, get some feedback by
  Wednesday. Then tell Marina (no).
- when I try to make the string data members in Refiner into string references
  (string & SLSentence) and change GetSLSentence() in CorrectionInstance to
  also return a string reference, the compiler complains:
    ...
    RuleRefinement.cpp:106: no matching function for call to `Refiner::Refiner()'
    Refiner.hpp:32: candidates are: Refiner::Refiner(const Refiner&)
    ...
  -> leave this for later, or ask Bill.

Saturday, February 11, 2006
- avenue went down (problem with the power supply)
- saved a copy of the old afs backup in temuco:/usr4/aria/RuleRefinement

Sunday, February 12, 2006
- looked at old CI.hpp (from afs backup), logfiles, perl script to postprocess
  those files.
Monday, February 13, 2006
- met with Bill: went over the system and in particular the RR module.
  Looked at some log files and the perl script.
- emailed Bill with the most important pieces of information and stored it in
  /usr0/aria/RuleRefinement/bill/email-1-intro
- avenue is back up, it was the power supply cable (they replaced it, + fan)
- creating log files for the examples:
  - the parse trace format changed from "" to '' around the lexical items
  - needed to modify postprocess-xfer.out.debug.pl so that it could extract
    the alignment info correctly again:
      ./bin/postprocess-xfer.out.debug.pl corpus/error-typology-simulation.out.debug >! input-tct
  - modified intro-test.cgi to load input-tct (instead of input-tct-good)
  !!!!!!! Currently intro-test.cgi picks the first 5 tl as stored in %$sl
  (intro(-test).cgi), which means that we need to apply the Spanish LM to
  input-tct before it's given to the TCTool, namely have the LM choose the 5
  best translations from input-tct and pass only those on to the TCTool:

    $c = 0;
    $sl .= "|";
    foreach $tl (keys(%$sl)){
      if ($tl ne "con") {
        if ($c == 5) { last; }   # displays 5 first translations only
        $c++;
        $al = $$sl{$tl};
        $tlname = "tl" . "$c";
        $tlnameal = $tlname . "-al";
        $html .= "
              $tl ";
      }

  For now, I edited the input-tct file and only left the first 5 sentences, so
  those are the ones the TCTool will display. For the first example, however,
  I removed the correct sentence and included the 6th alternative translation,
  so that the correction makes more sense. Namely, even though the Xfer engine
  does produce the correct translation, the user doesn't see it, and in this
  case the user correction can be used to tighten the grammar by adding an
  agreement constraint, so that after refinement the system no longer
  produces the incorrect sentence.
- finally managed to run all the example sentences through the TCTool, emailed
  Bill the link with some explanation (RR/bill/email-2-logfiles)
- updated 00-End2End file in RuleRefinement
- ran processTCToolLogFiles.pl on some of the simple log files I just
  generated:
    bin/processTCToolLogFiles.pl IOFiles/2006-2-13-17-13-55-8336/0
    sl = I see the red car
    tl = veo el auto roja
    al = ((1,1),(2,2),(3,3),(5,4),(4,5))
    ctl =
    cal = ((2,(3,(4,(5)
    action = edit
    Wi = roja
    WiC = rojo
  still needs debugging: ctl and cal are incorrect!!
    rm 2006-2-13-17-13-55-8336/0-processed
- created TestRuleManipulation.cpp to start testing and expanding rule
  manipulation operations, using TrRule and other classes (K's)

Tuesday, February 14, 2006
- getting linker errors for TrRule when compiling TestRuleManipulation.
  It seems that even though it's able to create a string, it doesn't find the
  string destructor (~string())! Looks like a template error.
  I double-checked that using namespace std; is present in all the relevant
  files (TrRule.hpp, TestRuleManipulation.cpp) and I even added it to
  TrRule.cpp, just in case (even though I think having it in TrRule.hpp should
  be enough).
  Made sure that the string format that TrRule::SetTrRuleFromString is
  expecting is the one I'm passing it (\n is used as line separator).
However I realized that the format of the TrRule has extra stuff in it {ADJP,4} ;;SL: B AWRK M@R ;;TL: A METER LONG ;;C-Structure:( ( (DET a-1)(N meter-2))(ADJ long-3)) ;;Score:1 ADJP::ADJP [\"B\" \"ARK\" N] -> [\"A\" N \"LONG\"] (... -> the problem was that I used different compilers for the different targets. Once, CPP and once GXX. GXX leads to older compiler version. So, replacing the $GXX in the TestRuleManipulation and TestRM targets did it. - Need to modify it to just parse the regular rule! doing that while I don't figure out the linker errors... modified TrRule::SetTrRuleFromString and TrRule::Print, need to debug, once I get it to link! - 4:34pm: Alon sent email to Bill with access to Avenue project machines - will need to add a method to access all the constraints between any two indices: // vector<&Constraint*> GetConstraintSet(int Ypos1, int Ypos2); // implement a method that given two indexes, it checks all // the constraints between them, so that if I want to add a cosntraint // between y1 and y2, say, first I make sure such constraint does not exist // for that it would probably be useful to have a matrix // vector>> // pos1 pos2 set of constraints - look at how Constraint is implemented so that I can see if it also forsees =c and NOT, OR constraints Here is the Constraint class data members I think I'll need: int ConstraintType; int EquationType; int FeatureType; ValueTupe Value; int Pos1; // redundant since I want to store them according to the 2 indices int Pos2; where ConstraintType = {agr, value} FeatureType = {tense, ender, feat_1, feat_2...} // problem: this is an infinite list... // these depend on the Featuretype ValueType = {sg, pl, masc, fem, past, +, -...} // but also *NOT* and *OR*, not sure how to encode that... 
EquationType = { =, =c, ...?}
- Look at UseTrRule to help me test my rule manipulations
- fixed the linker problem (see above), needed to change instances of GXX into
  CPP in the Makefile

Wednesday, February 15, 2006
- Need to give each CI an id: if the log file has a unique id, use that,
  otherwise create one, so that I can trace refinements back to the CI that
  originated them and also count how many users made the same change (which
  should lead to the same refinement). (tell Bill)
  Different ways to parse several sentences from the same user:
  1. First, append them all to the first log file (-all), so that all the
     corrections one user made are in the same file, and then store each CI
     separately, or
  2. parse one file at a time (0, 1, 2,...) and store each into a different CI
  Either way, I need a way to know which corrections were made by the same
  user, which corrections are about the same sentence pair, and whether, for a
  given SL-TL sentence pair, they are the same corrections or different ones.
  -> Need to index CIs by SL-TL sentences (just SLS is not enough, the user
     could have picked a different translation to correct). Then, when a log
     file has the same corrections for the same SL-TL sentence pair, increase
     the user_counter in that CI instead of creating a new CI.
  - We could also store the unique user IDs that made that correction in the
    relevant CI instance... vector?
    Note: log files from user studies already have a unique ID which can be
    extracted, but for log files generated by correcting Mapu MT output, a
    unique ID needs to be generated (2005-11-18-16:50:38-4983/log)
  -> Refinements should be labeled by the SL-TL pairs that led to that
     refinement, with the user_counter information as well, so that it's easy
     to see how much support there was for any given refinement.
  -> Store a collection of processed CIs and have a bool for whether each one
     led to a refinement or not (if not, we assume there was a lack of support
     for that CI). If it actually led to a refinement and that decreased the
     eval accuracy, this should be stored elsewhere (bool bad_instance, 0 by
     default, 1 if it decreases accuracy) (for me)
  -> In batch mode, accumulate CIs and counts for the CIs first, and then move
     on to the rule refinement process for those instances with 90% support,
     say
  (send this info in an email bill/email-3-CI)
- Modified TrRule and tried to compile it by itself, getting errors similar to
  yesterday's!!! arghhhhhhh
  no idea why TestRM compiles and links fine when it directly depends on
  TrRule!
  ok, I need to say make TrRule.o and not make TrRule!!! There is no target
  called TrRule, so make will probably use some default rules to create a
  target.
- don't know why, but K's original TrRule file (TrRule-K.cpp) is not even
  compiling!! getting tons of errors :-/ Anyway...
- met with Alon (Jaime wasn't there)
- added CI processing issues to Thesis-updated.tex (**** in tex, red font in pdf)
- Rule string format needs to be as follows:
    "{NP,8}\nNP::NP : [DET ADJ N] -> [DET N ADJ]\n(\n(X1::Y1)\n(X2::Y3)\n(X3::Y2)\n((x0 det) = x1)\n((x0 mod) = x2)\n(x0 = x3)\n(y0 = x0)\n(y1 == (y0 det))\n(y3 == (y0 mod))\n(y2 = y0))\n";
  namely:
    {NP,8}
    NP::NP : [DET ADJ N] -> [DET N ADJ]
    (
    (X1::Y1)
    (X2::Y3)
    (X3::Y2)
    ((x0 det) = x1)
    ((x0 mod) = x2)
    (x0 = x3)
    (y0 = x0)
    (y1 == (y0 det))
    (y3 == (y0 mod))
    (y2 = y0))
  otherwise, TrRule::SetTrRuleFromString is not able to create a rule
- commented out Print() in TrRule::SetConstraint(string ConstraintPassIn) so
  that the rule building process is not printed out at each step.
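A rough sketch of the batch-mode bookkeeping described in this entry: corrections are indexed by SL-TL pair plus the correction itself, each entry keeps the set of user IDs that made it, and only entries with enough support move on to refinement. All names here (CIKey, CIIndex, AddCorrection, SupportedCorrections) are hypothetical, and reading "90% support" as "fraction of the users who corrected that sentence" is an assumption, not something settled yet.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
typedef std::pair<std::string, std::string> SentencePair;       // (SL, TL)
typedef std::pair<SentencePair, std::string> CIKey;             // pair + correction
typedef std::map<CIKey, std::set<std::string> > CIIndex;        // key -> user IDs

// Records that UserId made Correction on the given SL-TL pair.
// A std::set is used so that the same user is only counted once.
void AddCorrection(CIIndex& Index, const std::string& SL, const std::string& TL,
                   const std::string& Correction, const std::string& UserId) {
    Index[CIKey(SentencePair(SL, TL), Correction)].insert(UserId);
}

// Returns the corrections made by at least MinSupport (e.g. 0.9) of TotalUsers.
std::vector<CIKey> SupportedCorrections(const CIIndex& Index,
                                        std::size_t TotalUsers,
                                        double MinSupport) {
    std::vector<CIKey> Result;
    if (TotalUsers == 0) return Result;
    for (CIIndex::const_iterator It = Index.begin(); It != Index.end(); ++It) {
        double Support = static_cast<double>(It->second.size()) /
                         static_cast<double>(TotalUsers);
        if (Support >= MinSupport) Result.push_back(It->first);
    }
    return Result;
}
```

Keeping the user IDs rather than a bare user_counter makes the count recoverable (the size of the set) while still allowing a refinement to be traced back to the users behind it.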
- // need to also keep comments, so if it finds comments preceeding a rule // those are also considered part of the rule as it were // refined rules will have SL-TL info + user_counter info // however, original rules, will most likely not have this information // so, need to include it when it's there and not crash when it's not there - added ";; SL: I see the red car\n;; TL: veo el auto rojo\n;; Users = 2\n" to the beginning of the Rule string testing... weird, it complains when getting the constraints! [aria@avenue V0.03]$ ./TestRuleManipulation.exe ----------- Now printing Rule 1 (R1) ------------------ NP::NP [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ERROR in TrRule::GetConstraintsInPrintOrder. Can't find agreed index for constraint (Y0 = ) - modified code to find comments, now I just need to store them in the right way so that everything else doesn't get screwed up! [aria@avenue V0.03]$ ./TestRuleManipulation.exe The following comment was found: ;; SL: I see the red car The following comment was found: ;; TL: veo el auto rojo The following comment was found: ;; Users = 2 ----------- Now printing Rule 1 (R1) ------------------ NP::NP [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ERROR in TrRule::GetConstraintsInPrintOrder. Can't find agreed index for constraint (Y0 = ) Thursday, February 16, 2006 - the reason it was crashing was that the way TrRule::SetTrRuleFromString detects constaints is by loking for an = sign in the line... and one of my comments had an equal sign... if ((Lines[i].length() != 0)&&(Lines[i].find('=') != -1)){ //then it's gotta a be a constraint or else some sort of noise if (GlobalParams::DebuggingLevel == 1){ // cout << "Now setting constraint from string:" << Lines[i] << endl; } SetConstraint(Lines[i]); } -> changed it: ;; Users: 2\n and it's working well now - RankInCategory? From Kathrin: About the RankInCategory -- I don't think I ended up using it much, because it wasn't relevant to my code. 
But it had something to do with what order the constraints need to be in. You know how in Unificiation it matters what order they're in? I dont remember the details but at some point I wanted to be able to set that somehow. I guess -- let's see. If you have two constraints in the same category, like(hypothetically) ((X2 case) = acc) (X0 = X2) ((X0 case) = nom) then (X0 = X2) and ((X0 case) = nom) are in the same category (...right...? according to Erik's definition of categories...?) but you need to make sure that (X0 = X2) is set before ((X0 case) = nom), because the other way around would result in something different. My constraints never got complex enough for this to matter, but it might matter in your case ... -> since I will only add value constraints and agreement constraints with a specific feature, I don't think this will matter, but need to make sure at a later stage - looked at how to get the constraints I want, still figuring out exactly how the methods work, categories seem to be shifted... Friday, February 17, 2006 - met with Bill to talk about the CI class and to clarify what it needs to do -> history of alignments doesn't need to be stored explicitely in CI, but the indices of current words, do change if a word is added, say. - send him an email with a summary and new info which I forgot to tell him: email-3-CI Monday, February 20, 2006 - categories as specified by K's comments don't match her methods. From TestRuleManipulation.cpp: //Constraint.hpp://1=parsing,2=transfer,3=generation,4=featurefilling/constrchecking // see concrete examples from R1 modified below map > TrConstraint; cout << "---------- 3) Getting All Constraints of category 4\n"; //vector GetConstraints(int CategoryPassIn); //for a certain type (category) only vector VS = pTR->GetConstraints(4); for (int i = 0 ; i < VS.size() ; i++) { cout << VS[i] << endl; } // K's categories are not quite right... 
// GetConstraints(1); ouput: ok /* ((X0 DET) = X1) ((X0 MOD) = X2) (X0 = X3) ((X1 NUM) = (X3 NUM)) */ // GetConstraints(2); no ouput -> this should output (Y0 = X0) // GetConstraints(3); output:(Y0 = X0) (transfer constraint, should be of category 2...) // this should output: /* (Y1 == (Y0 DET) (Y2 == (Y0 DET) (Y2 = Y0) (Y3 == (Y0 MOD) */ // GetConstraints(4); output: /* ((Y1 NUM) = SG) (Y1 = (Y0 DET) (Y2 = (Y0 DET) (Y2 = Y0) ((Y3 AGRGEN) = (Y2 AGRGEN)) (Y3 = (Y0 MOD) */ // but should really be just: ((Y3 AGRGEN) = (Y2 AGRGEN)) // not sure it's worth fixing, does this matter to the RR module? // actually, it is good to have all the y-side constraints to be grouped under one category, ie. 4 /* so if I assume that: 1 is parsing (as intended) 3 is transfer 4 is generation (y-side) and 2 I ignore, I should be ok... */ cout << "---------- End of Getting All Constraints of category 4\n"; - finally I figured out how to pass it the rhs index, to create an agreement constraint: // 3rd SetConstraint method: void SetConstraint(int ConstrainedIndexPassIn, string FeaturePassIn, string FeatValuePassIn, string FeatAgrPassIn, int RankInCategoryPassIn, int CategoryPassIn); // string NewConstraint = "((y3 agr gen) = (y2 agr gen))"; pTR->SetConstraint(3, "agr gen", "empty", "Y2", -1, 4); output: NP::NP [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ((X0 DET) = X1) ((X0 MOD) = X2) ((Y1 NUM) = SG) (X0 = X3) ((X1 NUM) = (X3 NUM)) (X1 = Y1) (Y0 = X0) (Y1 = (Y0 DET) (Y2 = (Y0 DET) (Y2 = Y0) ((Y3 agr gen) = Y2 agr gen)) (Y3 = (Y0 MOD) ) - testing all the different SetConstraint methods on different constraint types, it's a mess, very inconsistent! 
in Constraint.cpp: string Constraint::ReturnAsString() const{ string ReturnString; //convert the index to a string char buf[10]; sprintf(buf,"%d",ConstrainedIndex); string IndexStr = buf; if (ValueConstraint == true){ if (Category == 1){ ReturnString = "((X"; } else{ ReturnString = "((Y"; } //then add the Feature, Index and the FeatValue ReturnString = ReturnString + IndexStr + " " + Feature + ") = " + FeatValue + ")"; } else{ //check if it's a simple agreement constraint if (Feature == "all"){ if (Category == 1){ ReturnString = "(X"; } else{ ReturnString = "(Y"; } ReturnString = ReturnString + IndexStr + " = " + FeatValue + ")"; } else{ if (Category == 1){ ReturnString = "((X"; } else{ ReturnString = "((Y"; } ReturnString = ReturnString + IndexStr + " " + Feature + ") = " + FeatValue + " " + Feature + "))"; } } //cout << "Returning from constraint:" << ReturnString << endl; return ReturnString; } so, that's where the (( is coming from and the missing paren! and should give a try to "all" instead of "empty" and "" else{ //check if it's a simple agreement constraint if (Feature == "all"){ Tuesday, February 21, 2006 - testing GetConstraints methods Let's say, we wanted to eliminate all the constraints pertaining to X1 and Y1 (because we were getting rid of X1 and Y1) [note: that then the remaining constraints' indices would need to be changed accordingly!] 
target:
  ((Y1 num) = sg)
  ((X1 num) = (X3 num))
  (Y1 = X1)
  (Y1 = (Y0 DET)

CategorySet.insert(1);
CategorySet.insert(4);
TrConstraint = pTR->GetConstraintsForInd(1,CategorySet);
output:
  string (1arg in map): all
  first of pair: empty; second of pair: (Y0 DET
  string (1arg in map): num
  first of pair: empty; second of pair: (X3

CategorySet.insert(3);
CategorySet.insert(4);
TrConstraint = pTR->GetConstraintsForInd(1,CategorySet);
output:
  string (1arg in map): all
  first of pair: empty; second of pair: X1
  string (1arg in map): num
  first of pair: sg; second of pair: empty

CategorySet.insert(1);
CategorySet.insert(3);
CategorySet.insert(4);
  string (1arg in map): all
  first of pair: empty; second of pair: X1
  string (1arg in map): num
  first of pair: empty; second of pair: (X3
this only retrieves:
  (Y1 = X1)
  ((X1 num) = (X3 num))
!!! not sure why this map is an incomplete set of all the constraints of
index 1 and of Category 1, 3 and 4!! Adding Category 2 doesn't make any
difference...
In any case, I can't trust these methods to retrieve all the constraints for
an index
- testing EraseConstraint...
  // EraseConstraint manipulates: map but it looks like it only has been implemented for
  // simple value constraints
- finished testing K's methods
- writing ConstraintClassSpecs.txt to figure out what I really need and what
  would be overkill
- fixing rule format in TrRule::SetTrRuleFromString
  Need to extract RuleID, however, only RuleIndex can be stored...
  Added string POS as a data member and added code both to SetTrRuleFromString
  and Print() methods

Thursday, February 23, 2006
- finished adding code to store and print out RuleID
- Need to fix bugs for when a rule is set from a String
  (pTR->SetTrRuleFromString(R1)), it looks like the SetConstraint method needs
  to be implemented more robustly so that the constraints have the right
  number of parens -> RuleChecker!
  // missing parens!
-> stderr << illformed constraint or something like that // looks like this might be coming from the SetConstraint class (Y1 = (Y0 DET) (Y2 = Y0) ((Y3 AGRGEN) = (Y2 AGRGEN)) (Y3 = (Y0 MOD) - look at my old prototype code to see if I'm forgetting any actions for the Constraint class Friday, February 24, 2006 - working on Constraint class and methods that modify rules - Bill: he implemented the CI class to read simple log files, and is taking the alignment info from the header. But since log files will not always have headers, he needs to extract it from the parse tree. I asked him to also write a test class, as a way to debug his class and have example usage of hoe to call his methods. He's going to send me his code with a test class before 5pm. !!! Realized there isn't a real nice way to extract clue words from full version of TCTool...: [aria@avenue IOFiles]$ grep -r "Reason" * 2004-2-21-10:58:29-4283/4:* Reason: none-of-the-above(arriba) 2004-2-21-10:58:29-4283/4:* Reason: none-of-the-above(almenys en castellà 2004-2-21-10:58:29-4283/5:* Reason: none-of-the-above(same argument structure, b ut 2004-2-21-10:58:29-4283/6:* Reason: none-of-the-above(acc. pronoun) 2004-2-21-10:58:29-4283/7:* Reason: wrong-gender() 2004-2-21-10:58:29-4283/7:* Reason: none-of-the-above(gen.-pronoun) 2004-2-21-10:58:29-4283/8:* Reason: none-of-the-above(acc.- pronoun) 2004-2-21-10:58:29-4283/9:* Reason: wrong-gender() 2004-2-21-10:58:29-4283/10:* Reason: wrong-person(subject) 2004-2-21-10:58:29-4283/10:* Reason: none-of-the-above(plural commitative) 2004-2-21-10:58:29-4283/13:* Reason: different-sense 2004-2-21-10:58:29-4283/15:* Reason: none-of-the-above(completiva d'infinitiu) 2004-2-21-10:58:29-4283/16:* Reason: none-of-the-above(jugar needs prep.) 
2004-2-21-10:58:29-4283/17:* Reason: different-sense 2004-2-21-10:58:29-4283/18:* Reason: wrong-tense 2004-2-21-10:58:29-4283/18:* Reason: wrong-number() 2004-2-21-10:58:29-4283/21:* Reason: wrong-tense 2004-2-21-10:58:29-4283/21:* Reason: wrong-person(subject (él)) 2004-2-21-10:58:29-4283/23:* Reason: wrong-gender() 2004-2-21-10:58:29-4283/23:* Reason: different-sense 2004-2-21-10:58:29-4283/24:* Reason: didnt-translate-word() 2004-2-21-10:58:29-4283/25:* Reason: wrong-tense 2004-2-21-10:58:29-4283/25:* Reason: wrong-form 2004-2-21-10:58:29-4283/26:* Reason: wrong-person() 2004-2-21-10:58:29-4283/27:* Reason: wrong-person(subject (cat)) 2004-2-21-10:58:29-4283/28:* Reason: different-sense 2004-2-21-10:58:29-4283/28:* Reason: didnt-translate-word() 2004-2-21-10:58:29-4283/28:* Reason: didnt-translate-word() 2004-2-21-10:58:29-4283/29:* Reason: wrong-gender() 2004-2-21-10:58:29-4283/29:* Reason: none-of-the-above(don't translate) 2004-2-21-10:58:29-4283/29:* Reason: wrong-gender() 2004-2-21-10:58:29-4283/30:* Reason: wrong-tense 2004-2-21-10:58:29-4283/31:* Reason: none-of-the-above(don't translate) 2004-2-21-10:58:29-4283/31:* Reason: different-sense 2004-2-23-08:02:57-7966/4:* Reason: different-sense 2004-2-23-08:02:57-7966/4:* Reason: wrong-gender() 2004-2-23-08:02:57-7966/4:* Reason: wrong-number() 2004-2-23-08:02:57-7966/5:* Reason: wrong-person() 2004-2-23-08:02:57-7966/5:* Reason: didnt-translate-word() 2004-2-23-08:02:57-7966/6:* Reason: wrong-form 2004-2-23-08:02:57-7966/7:* Reason: wrong-form 2004-2-23-08:02:57-7966/8:* Reason: wrong-form 2004-2-23-08:02:57-7966/9:* Reason: wrong-gender() 2004-2-23-08:02:57-7966/10:* Reason: wrong-form 2004-2-23-08:02:57-7966/10:* Reason: wrong-number(yo) 2004-2-23-08:02:57-7966/17:* Reason: didnt-translate-word() 2004-2-23-08:02:57-7966/18:* Reason: wrong-tense 2004-2-23-08:02:57-7966/18:* Reason: wrong-number(puentes) 2004-2-23-08:02:57-7966/21:* Reason: wrong-number(el) 2004-2-23-08:02:57-7966/21:* Reason: 
wrong-number(el) 2004-2-23-08:02:57-7966/21:* Reason: wrong-number(é) 2004-2-23-08:02:57-7966/21:* Reason: wrong-number(él) 2004-2-23-08:02:57-7966/23:* Reason: wrong-gender(chica) 2004-2-23-08:02:57-7966/23:* Reason: didnt-translate-word() 2004-2-23-08:02:57-7966/23:* Reason: different-sense 2004-2-23-08:02:57-7966/23:* Reason: wrong-form 2004-2-23-08:02:57-7966/24:* Reason: wrong-form 2004-2-23-08:02:57-7966/24:* Reason: wrong-form 2004-2-23-08:02:57-7966/25:* Reason: wrong-tense 2004-2-23-08:02:57-7966/26:* Reason: wrong-person(madre) 2004-2-23-08:02:57-7966/26:* Reason: wrong-form 2004-2-23-08:02:57-7966/27:* Reason: wrong-number(gato) 2004-2-23-08:02:57-7966/27:* Reason: incorrect-word() 2004-2-23-08:02:57-7966/28:* Reason: incorrect-word() 2004-2-23-08:02:57-7966/29:* Reason: wrong-gender(pluma) 2004-2-23-08:02:57-7966/29:* Reason: wrong-gender(pluma) 2004-2-23-08:02:57-7966/30:* Reason: incorrect-word() 2004-2-23-08:02:57-7966/31:* Reason: wrong-tense 2004-2-23-08:02:57-7966/31:* Reason: different-sense 2004-2-24-14:20:58-19501/4:* Reason: wrong-gender("chairs" (sillas, fem)) 2004-2-24-14:20:58-19501/4:* Reason: wrong-number("chairs" (sillas, pl.)) 2004-2-24-14:20:58-19501/5:* Reason: wrong-form 2004-2-24-14:20:58-19501/5:* Reason: wrong-person("you" (tu, 2nd)) 2004-2-24-14:20:58-19501/6:* Reason: wrong-form 2004-2-24-14:20:58-19501/7:* Reason: wrong-gender(context given) 2004-2-24-14:20:58-19501/7:* Reason: wrong-gender(context given) 2004-2-24-14:20:58-19501/7:* Reason: wrong-form 2004-2-24-14:20:58-19501/8:* Reason: wrong-form 2004-2-24-14:20:58-19501/9:* Reason: wrong-gender(context given) 2004-2-24-14:20:58-19501/10:* Reason: wrong-person("I" (yo, 1st)) 2004-2-24-14:20:58-19501/10:* Reason: wrong-number("I" (yo, sing.)) 2004-2-24-14:20:58-19501/10:* Reason: wrong-form 2004-2-24-14:20:58-19501/15:* Reason: incorrect-word() 2004-2-24-14:20:58-19501/17:* Reason: didnt-translate-word() 2004-2-24-14:20:58-19501/18:* Reason: incorrect-word() 
2004-2-24-14:20:58-19501/18:* Reason: incorrect-word() 2004-2-24-14:20:58-19501/18:* Reason: wrong-number(bridges ("puentes")) 2004-2-24-14:20:58-19501/19:* Reason: wrong-form 2004-2-24-14:20:58-19501/19:* Reason: wrong-form 2004-2-24-14:20:58-19501/21:* Reason: wrong-number("he" (él)) 2004-2-24-14:20:58-19501/23:* Reason: didnt-translate-word() 2004-2-24-14:20:58-19501/23:* Reason: wrong-gender(girl ("chica", fem)) 2004-2-24-14:20:58-19501/23:* Reason: incorrect-word() 2004-2-24-14:20:58-19501/23:* Reason: wrong-form 2004-2-24-14:20:58-19501/24:* Reason: wrong-form 2004-2-24-14:20:58-19501/24:* Reason: wrong-form 2004-2-24-14:20:58-19501/25:* Reason: wrong-form 2004-2-24-14:20:58-19501/25:* Reason: wrong-tense 2004-2-24-14:20:58-19501/25:* Reason: wrong-form 2004-2-24-14:20:58-19501/26:* Reason: wrong-person(us ("nos")) 2004-2-24-14:20:58-19501/27:* Reason: incorrect-word() 2004-2-24-14:20:58-19501/27:* Reason: wrong-number(cat ("gato", sing)) 2004-2-24-14:20:58-19501/29:* Reason: wrong-gender(feather ("pluma", fem)) 2004-2-24-14:20:58-19501/29:* Reason: wrong-gender(feather ("pluma")) 2004-2-24-14:20:58-19501/30:* Reason: wrong-tense 2004-2-24-14:20:58-19501/30:* Reason: incorrect-word() 2004-2-24-14:20:58-19501/31:* Reason: incorrect-word() Once the CI class is working, need to expand it to extract clue word from complex LogFiles. Algorithm: Look into () after *Reason: - if it contains one word and this word corresponds to a word in the TLS, we have a candidate - if it contains multiple words and one of them corresponds to a word in the TLS, we have a likely candidate (indicate degree of uncertainty somehow) Note words can be in quotes or not! - either way, store the complete reason string so that I can always refer back to it (Reason="wrong-number(cat ("gato", sing))"). **************************************************************** If alignment info is not right, suggest that each word contains its alignment information as well. 
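The clue-word algorithm above (look inside the parentheses after "* Reason:", match against the TLS, keep the whole reason string) could be sketched roughly as follows. This is only a sketch under assumptions: ClueWord, ExtractClueWord, Normalize and the 0/1/2 confidence values are hypothetical names, not part of the CI class; the TLS is assumed to arrive as a vector of words; and a C++11 compiler is assumed.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type; these names are not taken from the CI class.
struct ClueWord {
    std::string reason;   // complete reason string, always kept verbatim
    std::string word;     // TLS word that matched, empty if none did
    int confidence;       // 2 = lone word matched, 1 = one of several matched, 0 = no match
};

// Lower-case ASCII letters and drop ASCII punctuation such as quotes,
// commas and parentheses; bytes >= 0x80 (accented letters) pass through.
static std::string Normalize(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (c >= 0x80) out += static_cast<char>(c);
        else if (std::isalnum(c)) out += static_cast<char>(std::tolower(c));
    }
    return out;
}

ClueWord ExtractClueWord(const std::string& reason,
                         const std::vector<std::string>& tls) {
    ClueWord result{reason, "", 0};
    const std::string::size_type open = reason.find('(');
    const std::string::size_type close = reason.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return result;  // e.g. "wrong-tense", or a reason truncated before ')'
    assert(open < close);

    // Split the text between the outermost parentheses into normalized words.
    std::vector<std::string> words;
    std::istringstream in(reason.substr(open + 1, close - open - 1));
    std::string token;
    while (in >> token) {
        const std::string w = Normalize(token);
        if (!w.empty()) words.push_back(w);
    }
    // The first word that also occurs in the TLS becomes the candidate.
    for (const std::string& w : words) {
        for (const std::string& t : tls) {
            if (w == Normalize(t)) {
                result.word = t;
                result.confidence = (words.size() == 1) ? 2 : 1;
                return result;
            }
        }
    }
    return result;
}
```

Quoted and unquoted words are treated alike because Normalize strips the quotes before comparing. Reasons whose parenthesis was cut off in the log simply come back with confidence 0 and the reason string intact.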
With per-word alignment info, an SLWord might contain two alignments (one to TL and one to CTL), whereas TL and CTL words would only contain alignments to SLWords.
- Updating file ManipulatingRules.txt with more detailed info for the params required by each method
- copied TrRule into Rule class; need to implement my own Rule class with only the methods I want.

Sunday, February 26, 2006
!!! No need of perl script to pre-process the LogFiles anymore !!!
- Bill sent his test code (TestCI.cpp); his main only calls the Load method, but this time he included a print method which exercises the different CI methods.

#############################################################################
1. Bug: the new word is added in an incorrect position.
   Cause: when a word is added, the position count starts at 0 instead of 1, and all the other positions start at 1.
   Log files affected: 4, 5 and 7
2. Bug: when reading in Log File 7, right after "Loading: 7", there seems to be a loop for the SL sentence vector that prints it as it traverses it.

1. Improvement: After editing a word, if there is a spurious move action (the words do not change), detect and discard the action. Add a comment to the code that does this, so that it can easily be commented out later if the need arises.
   Example: last action of Log file 9
   You seem to be doing this right already for add and delete actions which are followed by a spurious "word has been moved" statement.

Task: Do a big-O analysis for the loadCI method.

Issue: when a word is edited and then the word order affecting that word changes, my framework currently assumes that this is actually part of the same correction (the word needs to be moved because it has a different form). Unfortunately, the TCTool does not register which word was moved where, and it would not really matter, unless one of the two words involved in the order change was edited immediately after or before the order change.
This becomes more of a problem when the other word involved in the order change is actually also edited for a different reason further down in the log file, which is precisely the case for log file 8. (In log file 9 this is done correctly, but I'm guessing that is just by chance.) However, if we used a greedy algorithm to detect such cases and pick the word recently edited (or about to be edited) as the word which has also been moved, when the move is local (namely, happens between contiguous words), then my framework would actually work much more nicely. Otherwise, instead of just having one error word Wi for two related errors, which are actually affecting the same word, we would end up with two error words and would not be able to capture that the move is related to the edit.
Since this involves looking at two different actions, this probably would need to be done in the code that deals with all the actions, namely my code. However, your CI interface does not allow me to reset the position of the error word in the C/TLS. Could you add this in? Or can you think of a better way to deal with this special case?

Question: what does sz stand for in your code?

For Next iteration:
- make sure it also parses Log Files generated by the full-fledged version of the TCTool:
  - Add clue word info
  - Add confidence level info (*Desirable vs *Necessary -> need to check JavaScript implementation to see what other words are stored to indicate confidence level) -> need to debug first, when I try to edit the word, an error occurs!!
- make sure the CI class is robust. For example, make sure the user can add multiple-word entries and that the CI class will store that correctly as the new word added.
##############################################################################
- sent Bill an email (bill/info/email-7-CIFeedback)
- copied Bill's files into V0.03 dir

Tuesday, February 28, 2006
- 9am: met with Jaime

4 Bill: Once basic CI class is working for all Log Files...
Here is the more researchy aspect of manipulating CIs:
Need to detect and cancel (not store) empty/spurious loops.
Example: For the following sequence of correction actions:
  sl: The great artist
  El artista gran
  El artista grande (edit: gran->grande)
  El grande artista (cwo)
  El gran artista (edit: grande->gran)
Only the following should be stored in CI's vActions:
  El artista gran
  El gran artista (cwo)
Note that these are not necessarily "state repeats" (AI); in this case, for example, it doesn't go back to the initial state, since the order of artista and gran is changed in between the spurious corrections.

Another example, which seems too terrible to occur naturally, but which I have actually seen users do, goes as follows:
For the following sequence of correction actions:
  sl: I saw the girl
  Vio la muchacha
  Vio a la muchacha (add a word: 'a')
    add alignment 'a' with 'saw'
    delete alignment from 'a' to 'saw'
    add alignment from 'a' to 'woman'
    delete alignment from 'a' to 'woman'
  Vio la muchacha (delete a word: 'a')
  Vi la muchacha (edit: vio->vi)
  Vi a la muchacha (add a word: 'a')
    add alignment 'a' with 'saw'
    delete alignment from 'a' to 'saw'
    add alignment from 'a' to 'the'
    delete alignment from 'a' to 'the'
Only the following should be stored in CI's vActions:
  sl: I saw the girl
  Vio la muchacha
  Vi la muchacha (edit: vio->vi)
  Vi a la muchacha (add a word: 'a')

Once this is in place, we will be in a situation where we can implement a comparison method that, given two different CIs for the same SL-TL-CTL triplet, detects if the non-spurious actions are the same, namely if two CIs are equivalent even though their LogFiles are not identical. If the order in which the correction actions took place is different, this should also be considered somewhat equivalent... (even though this might not actually be true for dependent errors... since taking into consideration one correction first might result in different refinements).
Maybe instead of having the compare method return a boolean, it could return an int, where 0 means no equivalence, 1 means exact equivalence (without counting spurious loops) and 2 means equivalence in terms of correction actions used, but not in their sequence.
- Once the comparison method is in place, implement a collection of CIs, indexed by SL-TL-CTL. Each CI should contain a vector of unique IDs, initially only containing one ID (extracted from the Log File or the directory name that generated it). Should be able to access all CIs that have the same SL-TL sequence (need at least two access methods, one given SL-TL-CTL and one given SL-TL).
- For each SL-TL, group all CIs affecting a particular SL-TL-CTL triplet together. When storing CIs affecting the same SL-TL-CTL, if it turns out that two correction instances are equivalent (see above), then we only want to store it once: say we decide to store CI1, and then add the unique ID from CI2 into the vectorID, so that it now contains the ID for CI1 and the ID for CI2. So if we do this recursively, at the end, only unique CIs should be stored under each different SL-TL-CTL triplet, and each CI will store a vectorID with one or more unique IDs.
- Once all this is in place, add an evidence method, which will return the size of the vector, given a CI. This tells us how many actual LogFiles support the evidence for that CI.
- Another method that should be implemented: given a collection of grouped CIs, return the CI with the most evidence (namely, the largest vectorID). We could call this GetBestCI or something like that.
- The next step is to store a new collection containing the CIs with the highest evidence. After processing all the CIs, this collection will contain only one CI per SL-TL pair, namely the one with the highest evidence (BestCI).
- Finally, we will need a ranking method which, given this new collection, ranks BestCIs by error complexity, from easier to refine to harder to refine.
A first approximation of error complexity could be the number of corrections, which sort of correlates to the number of errors. This means that the method needs to compare each BestCI's Actions vector, and the one with the smaller vector should be ranked higher. There is one caveat here: the error complexity comparison method should not take into account Alignment corrections. This is not to say that alignment corrections are not relevant for further processing, but we don't think they should contribute the same as other actions to error complexity, and so the easiest way to do this is by not counting them at all in this method.
- But there is something else that can be done in terms of approximating a true error complexity when there are multiple errors, namely to try to detect if the errors are independent or not. The reason for this is that when errors are dependent, it becomes trickier to decide which refinement operation to apply first, and thus such cases should be ranked lower by the ranking method, and so should have a higher complexity score. Now, this is definitely a research topic per se and a very tricky thing to check, since to be really sure, one would want to track the error all the way down to the rules, and see if different corrections affect the same rules. However, on a superficial level, we can implement a reasonable approximation in the right direction. For example, if a CI contains multiple errors, all affecting different words, then we make the assumption that they are independent errors. If, on the other hand, two or more different corrections affect the same word, then we can make the assumption that they are dependent.
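A minimal sketch of these two approximations (count the corrections while ignoring alignment actions, and treat two or more corrections on the same word as dependent). The Kind enum, the Action struct and both function names are hypothetical stand-ins, not the real CI action classes, and wordPos is assumed to be the 1-based position of the affected word in the original TL sentence.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical action model; the real CI class has its own action types.
enum class Kind { Edit, Add, Delete, Move, AddAlignment, DeleteAlignment };

struct Action {
    Kind kind;
    int wordPos;  // assumed: 1-based position of the affected word in the TL
};

static bool IsAlignment(const Action& a) {
    return a.kind == Kind::AddAlignment || a.kind == Kind::DeleteAlignment;
}

// First approximation of error complexity: number of corrections,
// not counting alignment actions.
std::size_t CountCorrections(const std::vector<Action>& actions) {
    std::size_t n = 0;
    for (const Action& a : actions)
        if (!IsAlignment(a)) ++n;
    return n;
}

// Superficial dependency check: true if two or more non-alignment
// corrections affect the same word.
bool HasDependentCorrections(const std::vector<Action>& actions) {
    std::map<int, int> hitsPerWord;
    for (const Action& a : actions) {
        if (IsAlignment(a)) continue;
        assert(a.wordPos >= 1);  // positions are assumed to start at 1
        if (++hitsPerWord[a.wordPos] > 1) return true;
    }
    return false;
}
```

Keeping the count and the dependency flag as two separate values leaves the choice of how to combine them into a single score open.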
For the following example:
  sl: Gaudi was a great artist
  tl: Gaudi es un artista grande
  Correction 1: edit: grande->gran
  temp_ctl: Gaudi es un artista gran
  Correction 2: cwo: gran artista
  ctl: Gaudi es un gran artista
the new error complexity comparison method would detect that there were 2 actions involved and that they were not independent.
It is not immediately clear how to quantify dependencies, but let's say that for now, when dependency is detected between two or more actions, the error complexity gets incremented by 1. In this case, CIs with two dependent actions would get a score of 3 and would be considered as complex as CIs with three independent actions. This is not ideal, so if you can think of a better scoring mechanism, that would be great.
The ideal final ranking of CIs would be as follows:
  1st: CIs with one correction action
  2nd: CIs with two independent correction actions
  3rd: CIs with two dependent correction actions
  4th: CIs with three independent correction actions
  5th: CIs with three dependent correction actions
  etc...
So maybe storing the information about error dependency separately from the actual number of corrections is what is required here.
- And as we were talking about two weeks ago, CIs should also have a couple of book-keeping variables to store:
  1. whether they lead to an actual change in the system, and
  2. whether that refinement(s) increased or decreased the accuracy over a regression test set.
- Another thing we are going to want to store later on in the refinement process is what rules and lexical entries would be affected by the refinement operations triggered by a specific CI action. So even though the initial CIs will have no information about that, I need to have a way to store the Rule ID for each CI's action (storeRuleID). Note that the grammar rule and lexical entry IDs are of the same format, so just one data structure is needed.
This will allow us to calculate real dependencies later in the process, namely I will be able to tell if two different CIs will end up affecting the same rules and lexical entries or not. So having a method to tell us whether two different RuleID vectors contain the same elements (regardless of order) is what we need here. I'm envisioning something like this: vector DetectSameRuleID (CI1, CI2), where it loops over each Actions vector and, if it detects the same RuleID in one of the actions in the other CI, it stores it in the out argument, which is then returned at the end, so that we can tell which Rules are affected by both CIs.

4 me: In batch mode, and when more than one user has corrected the same sentences (user study), group all CIs affecting one particular SL-TL pair together
- Test Collection of CIs, access them by SL-TL and then by SL-TL-CTL
- check how many log files supported any given CI.
- exercise ranking methods and other methods I asked Bill to implement
  -> need to generate logfiles with multiple independent errors and multiple dependent errors, etc.
     2 indep
     2 depen
     3 indep
     3 depen

Now as for the time sequence processing problem, Jaime suggested that each CI also stores what rules and lexical entries are affected by the refinement(s) triggered by that CI. Ask Bill to implement those methods for each CI *ACTION* (different actions affect different rules!) and test: StoreRuleID(CI), vector DetectSameRuleID (CI1, CI2).
So that in the future, we can calculate rule dependencies in the following way:
  Epsilon? (change_1 ->(followed by) change_2) =? Epsilon? (change_2 -> change_1)
  given rule r,
    if we apply change_1 first then we get r1' and then change_2, we get r1''
    if we apply change_2 first then we get r2' and then change_1, we get r2''
  the question is if r1'' and r2'' are equivalent.
This actually is more complex, since we need to take into account lexical entries and possibly more than one rule as well.
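One possible shape for the DetectSameRuleID idea, as a sketch only. CISketch and Action are hypothetical stand-ins for the CI class and its actions, rule and lexical entry IDs are assumed to be plain strings (they share one format), and an empty ID is assumed to mean "not assigned yet". The sketch returns the shared IDs by value rather than through an out argument; SameRuleIDs covers the "same elements regardless of order" check.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the CI class and its actions.
struct Action {
    std::string ruleID;  // rule or lexical entry ID; empty until one is stored
};
struct CISketch {
    std::vector<Action> actions;
};

// Collect the distinct, non-empty IDs touched by a CI's actions.
static std::set<std::string> RuleIDs(const CISketch& ci) {
    std::set<std::string> ids;
    for (const Action& a : ci.actions)
        if (!a.ruleID.empty()) ids.insert(a.ruleID);
    return ids;
}

// IDs touched by actions in both CIs, each listed once, in sorted order.
std::vector<std::string> DetectSameRuleID(const CISketch& ci1,
                                          const CISketch& ci2) {
    const std::set<std::string> a = RuleIDs(ci1);
    const std::set<std::string> b = RuleIDs(ci2);
    std::vector<std::string> shared;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(shared));
    assert(std::is_sorted(shared.begin(), shared.end()));
    return shared;
}

// True if both CIs affect exactly the same set of IDs, regardless of order.
bool SameRuleIDs(const CISketch& ci1, const CISketch& ci2) {
    return RuleIDs(ci1) == RuleIDs(ci2);
}
```

Going through std::set makes the comparison order-independent and drops duplicate IDs, at the cost of losing which action touched which rule; if that mapping matters later, the intersection could be computed over (action index, ID) pairs instead.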
-> interesting research problem: are two grammars with their respective lexical entries equivalent? -> Alon If r1'' and r2'' are not equivalent, then we can do: - regression testing (batch mode) - active learning (interactive system) - backed up RuleRefinement dir to Avenue afs and temuco (/usr4) Friday, March 3, 2006 - merging to-do's for Bill -> CorrectionInstanceClassExpansion.doc - met with Bill and went over the ideas behind CorrectionInstanceClassExpansion (he was sick of dealing with LogFiles, reasons (clue words) are nasty! We'll meet Tuesdays or Thursdays from now on - worked on short paper for hlt-naacl PhD consortium Tuesday, March 7, 2006 - Moved on to next version: mkdir V0.04 [aria@avenue RuleRefinement]$ cp -r V0.03 V0.04 - incorporating Bill's CI methods to my code (RuleRefinement.cpp) - copied Bill's version of ParseTree which he expanded. - met with Bill - instead of having to have CI as a data member, Bill suggested that I could also implement a mehtod in the Refiner class that returns a constant pointer to the right CI: const CI* Refiner::UseCI(); --- pRefiner->UseCI(); Wednesday, March 8, 2006 - met with Jaime (see research-diary.txt) - emailed Bill to include correct CI information so that RR module can access that info and use to decide REFINE vs BIFURCATE as well as for regression testing Thursday, March 9, 2006 - looking at rules and lexical entries - constraints!!! Friday, March 10, 2006 (Mariona = 30) - finished writting up rule and lexical entries document (GrammarRulesLexicalEntries.txt) and sent Bill an email in case he gets it before going to Acapulco for Spring break (email-11-RulesLexicalEntries). 
- storing further comments to send to Bill in /usr0/aria/RuleRefinement/info/4Bill, so that I can send them all in one single email at the end of next week Sunday, March 19, 2006 - continue integrating bill's CI code: debugging InstantiateLogInfo [aria@avenue V0.04]$ make Refiner /usr/local/bin/g++ -g -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o Refiner.o -c Refiner.cpp -I/temuco/usr5/shared/code/antlr/antlr-2.7.1/lib/cpp -I/afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2 -I/shared/Genkit/UKernel -I/shared/Genkit/Toolbox Refiner.cpp: In member function `void Refiner::InstantiateLogInfo(CorrectionInstance*)': Refiner.cpp:132: duplicate case value Refiner.cpp:124: previously used here make: *** [Refiner.o] Error 1 l. 124: case CHANGE_WORD_ORDER: { ActionChangeWordOrder *pChangeOrder = (ActionChangeWordOrder*)pAct; cout << "Action : Change word order, move : " << pChangeOrder->GetOldPos() << " to : " << pChangeOrder->GetNewPos() << endl; break; } l. 132: case ADD_ALIGNMENT: { ActionAddAlignment *pAddAlignment = (ActionAddAlignment*)pAct; cout << "Action : Add alignment : " << pAddAlignment->GetNewAlign() << " to word at : " << pAddAlignment->GetSLWordPos() << endl; break; } But the TestCI code seems to have compiled fine... - testing Bill's CICollection code: expanded testcode to print from Collection accessing methods first 2 create object file /usr/local/bin/g++ -o TestCICollection.o -c TestCICollection.cpp 2 create executable, need to give all the source (object) files!!! 
/usr/local/bin/g++ -o TestCI TestCICollection.o CICollection.o CorrectionInstance.o ParseTree.o RefCountedObject.o 2 run the executable: ./TestCI - emailed him my feedback and comments Re: Example/Test code March 20, 2006 - debugging ErrorComplexity scores (see LogFiles and complexity criteria I gave bill, and make sure it's right) running /usr0/aria/RuleRefinement/V0.04/CICollection/TestCI found some bugs, emailed Bill - copy the new CorrectionInstance implementation from CICollection to V0.04 (plus other related files: RefCountedObject.cpp, etc.) [aria@avenue V0.04]$ mv CorrectionInstance.hpp CorrectionInstance.hpp.bak [aria@avenue V0.04]$ mv CorrectionInstance.cpp CorrectionInstance.cpp.bak [aria@avenue V0.04]$ cp CICollection/CorrectionInstance.hpp . [aria@avenue V0.04]$ cp CICollection/CorrectionInstance.cpp . [aria@avenue V0.04]$ cp CICollection/*.cpp . cp: overwrite `./CorrectionInstance.cpp'? n cp: overwrite `./ParseTree.cpp'? y [aria@avenue V0.04]$ cp CICollection/*.hpp . cp: overwrite `./CorrectionInstance.hpp'? n cp: overwrite `./Lexicon.hpp'? y cp: overwrite `./ParseTree.hpp'? y cp: overwrite `./Tokenizer.hpp'? y - can now get ParseTree before calling LoadFromFile (GetSLandTLfromTCToolLogFile) Since Refiner is not compiling correctly since I added the code to get the different action types (from TestCICollection), I commented it out. 
Compiler errors are: Refiner::InstantiateLogInfo(CorrectionInstance*)': Refiner.cpp:132: duplicate case value Refiner.cpp:124: previously used here since I was getting compiler errors for the Refiner (getActions() bit of it, from TestCI), I commented it out, but now I am getting linker errors :( [aria@avenue V0.04]$ make RR /usr/local/bin/g++ -g -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement.o -c RuleRefinement.cpp -I/temuco/usr5/shared/code/antlr/antlr-2.7.1/lib/cpp -I/afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2 -I/shared/Genkit/UKernel -I/shared/Genkit/Toolbox In file included from CorrectionInstance.hpp:10, from RuleRefinement.cpp:16: RefCountedObject.hpp:17:7: warning: no newline at end of file /usr/local/bin/g++ -g -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement.exe RuleRefinement.o Param.o ParamDef.o CorrectionInstance.o Refiner.o Lexicon.o Grammar.o ParseTree.o Constraint.o Utils.o TrRule.o StringUtils.o GlobalParams.o LineUp.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/TransferGrammarLexer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/TransferGrammarParser.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/FStructLexer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/FStructParser.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/transfer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/transfer-support.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/chinese.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/english.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/UnicodeTools.o /shared/Genkit/Toolbox/*.o /shared/Genkit/UKernel/*.o -L/temuco/shared/code/antlr-2.7.5/lib/cpp/src -lantlr /usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14. /usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14. ... 
/usr/bin/ld: Dwarf Error: Could not find abbrev number 343. RuleRefinement.o: In function `main': RuleRefinement.o(.text+0x4a9): undefined reference to `Refiner::Refiner[in-charge]()' ... /usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14. /usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14. /usr/bin/ld: Dwarf Error: Could not find abbrev number 651. CorrectionInstance.o: In function `CorrectionInstance::CorrectionInstance[not-in-charge]()': CorrectionInstance.o(.text+0x1c78): undefined reference to `RefCountedObject::RefCountedObject[not-in-charge]()' CorrectionInstance.o: In function `CorrectionInstance::CorrectionInstance[in-charge]()': CorrectionInstance.o(.text+0x1f06): undefined reference to `RefCountedObject::RefCountedObject[not-in-charge]()' collect2: ld returned 1 exit status -> emailed Bill (again...) - I added RefCountedObject to the Makefile and it improved, but I am still getting an error related to the Refiner... usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14. /usr/bin/ld: Dwarf Error: Could not find abbrev number 343. RuleRefinement.o: In function `main': RuleRefinement.o(.text+0x4a9): undefined reference to `Refiner::Refiner[in-charge]()' collect2: ld returned 1 exit status make: *** [RR] Error 1 March 21, 2006 - generated Log Files with spurious loops (2006-3-21-14-05-04-3917) and sent description to Bill (email-14-LogFilesWithSuriousLoops) - added XferEngine class, which is the interface between Erikc's TransferEngine class and my code (moved CTLSinXferLattice code from Refiner to XferEngine). it compiles but I get a linker error, even though the Makefile looks fine... ... /usr/bin/ld: Dwarf Error: Could not find abbrev number 343. RuleRefinement.o: In function `main': RuleRefinement.o(.text+0x4a9): undefined reference to `XferEngine::XferEngine[in-charge]()' collect2: ld returned 1 exit status - I had a constructor in hpp but it wasn't implemented in cpp. 
So I added it, but then i got: XferEngine.cpp: In constructor `XferEngine::XferEngine()': XferEngine.cpp:16: uninitialized reference member `XferEngine::m_SLSentence' make: *** [XferEngine.o] Error 1 So eventually removed the m_SLSentence from private: in hpp, and it linked! ./RuleRefinement.exe aria@avenue V0.04]$ ./RuleRefinement.exe Compiled on Mar 22 2006 12:42:29 with g++ version: 3.2.3 in debug mode Parameters are: Debug Level = 1 Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar.trf TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed The SL and TL for the LogFile given as a param are: VEO -- EL initfile is /usr0/aria/eng2spa/auto-init.txt Turning on Latin-1 mode Setting normalizecase to UPPER-CASE Setting find all translations to ON Setting output source text to ON Setting showtrace to full trace with src indices Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar.trf with 15 rules added Loading lexicon file /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf with 33 lexical entries added No parse found. LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0 No full parse was found! Partial parse is: VEO tree: <(V,0:1 'VEO')> Deleting all loaded rules. Printing Tree extracted from Log File ( )Instantiating correction instance from TCTool Log File... ************************************************** 1. Checking against the existing lattice... ************************************************** Segmentation fault -> this is what happens when I don't give it a LogFile to process.. 
The default LogFile is /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed; changed it to "9", which is a real LogFile.

March 22, 2006
- To give it the LogFile param explicitly, use the following flag: -a
  ./RuleRefinement.exe -a 9
  Parameters are:
  Debug Level = 1
  Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
  Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
  TCTool Log File = 9
  but still getting the seg fault
- Got rid of code that depended on example numbers.
- pCI->GetSLandTLFromTCToolLogFile(pLogFile, SL, TL) is working correctly, but pXfer->ExtractParseTreeFromXferLattice(SL, TL, pGramFileName, pLexFileName); isn't, need to debug!

March 24, 2006
- TL was in lower case, while the Xfer engine Tr alternatives were in all caps; added a call to the String2AllCaps method from StringUtils in the XferEngine method.
- LTI OH: potential research collaborators: Vasco and Paul (pto)

March 27, 2006
- reimplementation of XferEngine (wrapper to Erik's class): splitting methods for XferEngine into basic, logical ones
    StartXfer(); // loads the default init file, grammar and lexicon
    StartXfer(char* pInitFileName, char* pGramFileName, char* pLexFileName);
    StartXfer(char* pGramFileName, char* pLexFileName);
    EndXfer();
    LoadGra(char* pGramFileName);
    LoadLex(char* pLexFileName);
    Translate(string SL);
  And only one data member:
    TransferEngine* xfer;
  Merged CTLSinXferLattice(char *pRefinedGramFileName, char *pRefinedLexFileName) and CTLSinRefinedLattice(char *pRefinedGramFileName, char *pRefinedLexFileName) into just one method:
    TLInLattice(string SL, string TL);
- debugged it and tested it, seems to be working ok :-)
- backed it up

March 28, 2006
- looking into different Rule implementations to prepare for my meeting with Bill
- Make sure all the data members in Rule.hpp are also in TrRule:
    TimeStamp
    RuleHistory
    TranslationPairs (to do regression testing)
  In the Grammar and Lexicon classes, need to add POS_Counters!
  NPcount; // stores num of rules in G for NP rules
  VPcount; // stores num of rules in G for VP rules
  PPcount; // stores num of rules in G for PP rules
  Scount; // stores num of rules in G for S rules
  ...
  Vcount; // stores num of rules in L for V entries
  Ncount; // stores num of rules in L for N entries
  ...
  and the method that returns the next available ID for a specific POS:
  int GetNextAvailableRuleID(POS) {
    POScount++; // effectively increase the counter
    return POScount;
  }
- met with Bill: still working on CICollection (detecting spurious loops) and started looking at how to implement the Rule class. We'll take K's class, expand it and reimplement some of the methods, especially the ones dealing with constraint manipulation. Methods returning vectors are bad!!! He's thinking about how to modify K's code.
- Asked Bill whether, instead of the method he proposed to make the CI class accessible from the Refiner (const CI* Refiner::UseCI()), I could just have a pointer to CI as a private data member, and then have a method which copies a specific pCI to the data member pointer (= XferEngine) -> yep
- modifying Refiner class... keep debugging and testing it

March 29, 2006
- CI doesn't seem to have AddRef() Release() methods, as Bill implied, so I found some in CICollection, but there was a bug in the implementation.
  Bill: These methods are for a vector of CIs, which AddRef and Release all CIs in the vector (not release all references in a single CI). This type is used by some methods in CICollection. Check out RefCountedObject.cpp for the implementation of AddRef and Release.
  -> CI inherits these methods from RefCountedObject, working now
- /usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
  /usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
  /usr/bin/ld: Dwarf Error: Could not find abbrev number 343.
RuleRefinement.o: In function `main': RuleRefinement.o(.text+0xbb9): undefined reference to `Refiner::Refiner[in-charge]()' collect2: ld returned 1 exit status -> missing Refiner() constructor implementation in Refiner.cpp - Working fine now, Refiner is testing the integration of Bill's CI code :) April 11, 2006 - testing Bill's code, made changes to Test code, to compile and get a new executable: 2 create object file /usr/local/bin/g++ -o TestCICollection.o -c TestCICollection.cpp /usr/local/bin/g++ -o TestCICollection TestCICollection.o CICollection.o CorrectionInstance.o ParseTree.o RefCountedObject.o DirectoryTraverser.o ./TestCICollection > TestCICollection.out April 13, 2006 - adding code to TestCICollection, to make sure I can access all the information i need for the RRefiner (Wi, Wi', Wc and tempCTLS). Wc are only present for edit actions, and so should be moved there (email Bill) moved ApplyToWords to after the different cases of action, so that I can get the right info... ./TestCICollection > TestCICollection.out1 April 23, 2006 - finished second pass on RR Algorithm and sent to Bill April 24, 2006 - Bill stopped by: he's not going to implement all the test code for now, but rather he'll write debug code tomorrow when we meet, since there is not enough time right now Comments: he'll only worry about time stamp and History information, and hopefully TranslationPair info, and not about the general comments. Went over rule history info that needs to be stored again to make sure that he's allowing for an original rule to be active or inactive (encode as a comment or in a different file) - Implemented a spurious corrections detector /usr0/aria/RuleRefinement/bin/PreProcessLogFiles.pl perl script to detect false corrections ((fix sentence)+ + (next sentence)* in the same file), it should be called from the CI class. 
1, 6 problematic log files: if multiple submit values, only store the last one's 2006-2-13-17-13-55-8336/1 counter = 1 tl2-al = ((1,1),(2,2)) submit = FIX TRANSLATION senum = 9 time = Mon Feb 13 17:14:21 2006 sl = she read ID = 2006-2-13-17-13-55-8336 tl1 = ella leyÓ tl1-al = ((1,1),(2,2)) con = counter = 1 tl2-al = ((1,1),(2,2)) submit = NEXT SENTENCE senum = 9 time = Mon Feb 13 17:14:21 2006 sl = she read ID = 2006-2-13-17-13-55-8336 tl1 = ella leyÓ tl1-al = ((1,1),(2,2)) con = 2006-2-13-16-51-29-8333/6 counter = 6 tl2-al = ((1,1),(2,2),(3,3)) submit = NEXT SENTENCE tl3-al = ((1,1),(2,2),(3,3)) senum = 9 time = Mon Feb 13 16:56:04 2006 sl = they see water ID = 2006-2-13-16-51-29-8333 tl1-al = ((1,1),(2,2),(3,3)) tl3 = ellos ven agua con = counter = 6 tl2-al = ((1,1),(2,2),(3,3)) submit = NEXT SENTENCE tl3-al = ((1,1),(2,2),(3,3)) senum = 9 time = Mon Feb 13 16:56:04 2006 sl = they see water ID = 2006-2-13-16-51-29-8333 tl1-al = ((1,1),(2,2),(3,3)) tl3 = ellos ven agua con = counter = 6 tl2-al = ((1,1),(2,2),(3,3)) submit = NEXT SENTENCE tl3-al = ((1,1),(2,2),(3,3)) senum = 9 time = Mon Feb 13 16:56:04 2006 sl = they see water ID = 2006-2-13-16-51-29-8333 tl1-al = ((1,1),(2,2),(3,3)) tl3 = ellos ven agua con = ok: 2006-2-13-16-51-29-8333/1 2006-2-13-17-13-55-8336/6 - looking at how to call a perl script from C++ code: Stephan's /usr0/aria/RuleRefinement/bin/TextFilter - added a system call to the perl script from my C++ code: 1. copied /usr0/aria/RuleRefinement/bin/TextFilter/pstream.h to my working dir (V0.04) 2. Included it at the top of RuleRefinement.cpp: #include "pstream.h" // for doing system calls (perl script, etc.) using redi::pstream; 3. Added this line of code in main: redi::ipstream System( "/usr0/aria/RuleRefinement/bin/PreProcessLogFiles.pl /usr0/aria/RuleRefinement/IOFiles/2006-2-13-16-51-29-8333/6" ); compiled and run it and it works!!! 
Unfortunately I don't know how to pass the FileName as a parameter to the system call other than literally specifying the path and file name, so the following doesn't work (FileName sits inside the string literal, so the script receives the word "FileName" rather than the path; concatenating the script path, a space and the FileName string into one command string before constructing the stream should fix it):
/*
string FileName = "/usr0/aria/RuleRefinement/IOFiles/2006-2-13-17-13-55-8336/1";
redi::ipstream System1( "/usr0/aria/RuleRefinement/bin/PreProcessLogFiles.pl FileName" );
*/

Tuesday, April 25, 2006
- Test RR module with
  1: I saw the red car /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/0
  4: Mary and John fell /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/4
  To test it in V0.04, run:
  ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/0
  ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/4

*******************
** moving to V0.05
*******************
- + Bill: substituting old files with new files from Bill
- not using log files with clue words for now
  ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-17-13-55-8336/0
  ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/4
- found a bug in ParseTree.Find(Word)
- found a bug in extracting alignments from tree
- Bill is writing GetAffectedRules; so far only AddAffectedRules, which is called for each Action

Wednesday, April 26, 2006
- 10am meeting with Bill (10:30)
- the following files are parsed correctly
  2006-2-13-17-13-55-8336/ 0,1,2,3,4,5,6,9 (but crashes after printing out action type...)
  crash: 7,8, (multiple errors!!!)
  ####################
  edit: 0,2,3,8,9
  add: 4, 5,
  delete: 7
  cwo: 8,9
  ####################
  bugs:
  fixed 1. Action methods were not defined and implemented as const, but test code was using them as const
  fixed 2. words extracted from parse tree (affected rules) had quotes -> got rid of them
  fixed 3. overloaded == operator implementation incorrect (affected rules)
  fixed 4. Affected rules for edit retrieved the lexical rule and not the grammar rule (need to discard immediate parents for this)
  fixed 5.
  implementing a function to retrieve alignments for both TL and CTL words
  Pending:
  6. Multiple word lex entries are copied in the grammar file instead of the lexicon file
  7. missing attribute features for {VP,46}
- backed up all the updated files both in temuco (/usr4) and afs
- Currently assigning IDs to all rules and lex entries, printing out file and loading the simulation-G/L-ID.trf file to Xfer engine instead. Ideally, this will be done offline and the final G and L will be loaded in RR.
- a char pointer into a string object must not be used to modify what it points to: if I changed the size of the text through the pointer, the string object wouldn't know and it would crash, since the memory required would differ. So the pointer has to be declared as a pointer to const (the pointer itself can still be re-pointed, only the characters are read-only):
  const char * pToString = StringItPointsAt;
  changed the StartXfer parameters to const both in XferEngine.hpp and cpp -> need to do the same for the other methods
- testing (and debugging) lexical entry query/extraction

Thursday, April 27, 2006
- working on AMTA paper (official deadline May 1, can upload final version until May 4 8am)
- 5-? Bill
  Bill working on improving lexicon class to support required accessor methods
  lexicon is fixed
  rules seemed to be working
  1st example is working!!!!!!!!
  need to test rules more extensively
  never refine before bifurcating, it will screw up the whole collection indexing

priority to do list 4 Bill:
Here are the high-level methods that are still required for the two examples to work:
- pNewLexEntry *Collection::CreateNewLex(POS, vsSLside, vsTLside)
  creates a new LexEntry from scratch, given a POS, vsTLside and vsSLside. Later we'll need to add a default set of value constraints, depending on what POS is given. It needs to get the next available ID from the POScounter.
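Sketch of the POS counter that CreateNewLex would draw its IDs from. This is only an illustration: the class name RuleIDManagerSketch and the SetCount method are made up here, and a single map stands in for the separate NPcount, VPcount, Ncount... members from the March 28 notes.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not the actual RR code): one counter per POS,
// kept in a map instead of one data member per POS.
class RuleIDManagerSketch {
public:
    // Seeds a counter, e.g. with the number of entries already loaded for a POS.
    void SetCount(const std::string& pos, int count) { counters_[pos] = count; }

    // Increments the counter for this POS and returns the new value.
    // A POS that has never been seen starts at 0, so its first ID is 1.
    int GetNextAvailableRuleID(const std::string& pos) {
        return ++counters_[pos];
    }

private:
    std::map<std::string, int> counters_;
};
```

With a counter seeded at 8 for "N", the next call for "N" returns 9, and a first call for an unseen POS returns 1.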
- NewGrRule->AddAgrConstraint(iVar4Wi, iVar4Wi', sTriggeringFeature, EquationType)
  Given a NewGrammarRule (result from a bifurcation), adds an agreement constraint between positions Var4Wi, Var4Wi':
  ((Y_Var4Wi TriggeringFeature) EquationType (Y_Var4Wi' TriggeringFeature))
  (ex: ((y3 gender) = (y2 gender)))
  so it needs to create two strings for the two index variables (ex: 3-> y3 and 2->y2).
- For this to work, there needs to be some sort of method, which I call RuleVariableInstantiation, that given two positions in TLWords extracts their POS (the ParseTree is already stored in the CI, correct?) and finds their position (from 1 to n) in the rule.
  GrammarRule->GetIndexVariables(iWiPos, iWCluePos, &iInRuleWiPos, &iInRuleWCluePos)
  So for rule:
  {NP,8}
  NP::NP : [DET ADJ N] -> [DET N ADJ]
  ;; 1 2 3
  (
  (X1::Y1) (X2::Y3) (X3::Y2)
  ((x0 det) = x1)
  ((x0 mod) = x2)
  (x0 = x3)
  (y0 = x0)
  (y1 == (y0 det))
  (y3 == (y0 mod))
  (y2 = y0)
  )
  if TL[veo el auto rojo] (SLWords is [I saw the red car])
  GrammarRule->GetIndexVariables(4, 3, &iInRuleWiPos, &iInRuleWCluePos)
  iInRuleWiPos would now be 3
  iInRuleWCluePos would now be 2
  since 4(rojo) has POS ADJ and 3(auto) has POS N, and thus the method would return 3 and 2 (which will allow us to add an agreement constraint between y3 and y2)
- void Delta(pLexEntry1, pLexEntry2, &vsTriggeringFeatures)
  Given two lexical entries, it returns a vector of strings with one or more triggering features. The way it does this is by comparing the two lexical entries at the feature level. Namely, it first checks if their POS are the same (if not, push the string "POS" to the TriggeringFeature vector and return); if they are, for each value constraint with the same feature attribute name in both lexical entries, if the value is different, push the feature name into vsTriggeringFeatures.
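A rough sketch of the Delta comparison as described, not Bill's implementation. Assumptions: a lexical entry is reduced to a hypothetical LexEntrySketch struct holding the POS and a map from feature path (variable prefix dropped, e.g. "agr gen") to value; the real lexicon class stores full constraints.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified lexical entry: POS plus value constraints,
// keyed by feature path without the x0/y0 variable (an assumption).
struct LexEntrySketch {
    std::string pos;
    std::map<std::string, std::string> features;
};

// Pushes "POS" and stops if the POS differ; otherwise pushes the name of
// every feature that both entries define with different values.
void Delta(const LexEntrySketch& e1, const LexEntrySketch& e2,
           std::vector<std::string>& vsTriggeringFeatures) {
    if (e1.pos != e2.pos) {
        vsTriggeringFeatures.push_back("POS");
        return;
    }
    std::map<std::string, std::string>::const_iterator it;
    for (it = e1.features.begin(); it != e1.features.end(); ++it) {
        std::map<std::string, std::string>::const_iterator other =
            e2.features.find(it->first);
        // Features present in only one entry are skipped, as in the description.
        if (other != e2.features.end() && other->second != it->second) {
            vsTriggeringFeatures.push_back(it->first);
        }
    }
}
```

Features that appear in only one of the two entries are ignored here; whether they should count as triggering features is left open in the notes.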
  So for example, given two pointers to the following LexEntries:
  ADJ::ADJ |: [red] -> ["roja"]
  (
  (X1::Y1)
  ((x0 form) = red)
  ((y0 agr num) = sg)
  ((y0 agr gen) = fem)
  )
  {ADJ,2}
  ADJ::ADJ |: [red] -> ["rojo"]
  (
  (X1::Y1)
  ((x0 form) = red)
  ((y0 agr num) = sg)
  ((y0 agr gen) = masc)
  )
  Delta would return vsTriggeringFeatures with one element: "agr gen".
- PostulateNewFeature(sNewFeatureName)
  As I told you before, the grammar and the lexicon (RRRuleCollection) need to keep track of the last feature name ID (int) used by the grammar (possibly as a general comment at the beginning of the grammar and lexicon files). This will start at 1, and every time PostulateNewFeature() is called it will increase it (both in G and L) and then return a string containing that integer. For example, for FeatCounter = 11, this method will return "feat_11".

And I think this is it, to get the two examples working end-to-end!

For the other examples that I have in mind, there are a couple more methods that will be required, and I describe them to you in case you have time:
- Add a Value constraint (both for GrRules and LexEntries)
  NewRule->AddValConstraint(iVar4Wi, sTriggeringFeature, EquationType, Value)
  I anticipate the value always being either + or -, so it could be an int instead of a string.
- NewGrammarRule->AddConstituentToRHS(sPOS_or_TLWord, iPositionToBeAdded)
  adds the POS or Word given as a parameter into the RHS (TLside) at iPositionToBeAdded.
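  A minimal sketch of what AddConstituentToRHS could do, under assumptions of my own: the rule is reduced to a hypothetical RhsRuleSketch holding only the TL-side constituents and the (X,Y) alignment pairs, positions are 1-based as in (X2::Y3), and the y-variables inside constraints are not touched.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, stripped-down rule: TL-side constituents plus
// (X index, Y index) alignment pairs, both 1-based.
struct RhsRuleSketch {
    std::vector<std::string> rhs;
    std::vector<std::pair<int, int> > alignments;
};

// Inserts a POS or TL word at a 1-based RHS position and shifts the Y index
// of every alignment that pointed at that position or further right.
// Returns false and leaves the rule untouched if the position is out of range.
bool AddConstituentToRHS(RhsRuleSketch& rule, const std::string& sPOS_or_TLWord,
                         int iPositionToBeAdded) {
    const int size = static_cast<int>(rule.rhs.size());
    if (iPositionToBeAdded < 1 || iPositionToBeAdded > size + 1) return false;
    rule.rhs.insert(rule.rhs.begin() + (iPositionToBeAdded - 1), sPOS_or_TLWord);
    for (std::size_t i = 0; i < rule.alignments.size(); ++i) {
        if (rule.alignments[i].second >= iPositionToBeAdded) {
            ++rule.alignments[i].second;
        }
    }
    return true;
}
```

  For [NP VP] with alignments (1,1),(2,2), adding PREP at position 2 gives [NP PREP VP] and alignments (1,1),(2,3).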
  Note, it also needs to update alignments and index values
- NewGrammarRule->MoveConstituentRHS(sPOS1, iFinalPositionInRHS)
  given the POS to be moved, and the FinalPosition in the RHS (starting at 1) where it needs to be moved to, it changes the RHS (TLside) accordingly and updates alignment and index values
- Implement deactivating a rule by printing it out as a comment
- search the G to see if any rule has the POS for Wi' on the RHS (anywhere)
  Actually, I need to query the grammar to see if POS_Wi is somewhere in a Rule RHS, but I also need to take into account the left and right context of the Wi' word, namely the POS of the words to the right and to the left of it (AffectedRules?). But I am not sure how to do this... any ideas?
- Finish spurious loop detection
- Incorporate system call to pre-process CIs into his code (filter out spurious corrections)
-############################################################

Monday May 1, 2006 (Workers' Day all over the world except the USA :-()
- Adding new lexical entry to L and reloading to Xfer engine, see final translation
  [aria@avenue V0.05]$ ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/4 >! 4RR.out
- added ReLoadLex and Gra methods to XferEngine. Testing reloadgra:
  Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID-REFINED.trf with 30 rules added
  it seems that it adds all the rules again... check with Erik later.
  I can always do as I was doing before: create a file with the changes that need to be reloaded, and only after it has been confirmed to increase translation quality add it to the Lexicon and save it all together in the same file...
- Bill noon-4pm
  bill: enabling and disabling a rule in G and L is working -> need to test
  implemented everything I asked him to except for search in G (for a particular POS sequence) and spurious loop... -> need to test!
  ari: added CheckRefinedLattice method to XferEngine and example 4 is working end to end!!
  :-))))))))))))))))
  started working on other example (0): agreement constraint
  ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/0 > ! 0RR.out
- delta function is working for 0!!!

Tuesday, May 2, 2006
- AMTA paper
- Bill (5-10pm)
  implemented RuleVariableInstantiation (works for base case, doesn't work for nodes that are embedded in different rules (missing recursion -> bill is working on it))
  Create new lexical entry (done)
- testing delta function on other examples. It works for the following examples: 0, 3; for 2 to work, I need to be able to create a new LexEntry from scratch
  When trying to run example sentences 8 and 9 I get a seg fault:
  MAIN::Instantiating correction instance from TCTool Log File
  Segmentation fault
  -> need to look into this after deadline
  Created a constraint and set all the different parts + added to rule (it's working!!!)
  -> problems loading G and L (see below)

Wednesday, May 3, 2006
- AMTA paper
- Jaime: keep track of refinement status: proposed, confirmed1 (by exact match), confirmed2 (by increasing automatic MT metrics over a regression test)
  automatic eval metrics: *modified* BLEU (BLEU cannot be calculated just for a single sentence), NIST, METEOR, and Jaime also suggested TER (error rate) and HER (GALE) -> which I might end up having to use depending on automatic results.
- backed up V0.05 on Avenue and temuco:/usr4/
- submitted AMTA paper (3am)

Friday, May 5, 2006
  ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/0 > ! 0RR.out
  [aria@avenue lexicons]$ grep "|:" simulation-lexicon-ID.trf | wc -l
  33
  [aria@avenue lexicons]$ grep "|:" simulation-lexicon-ID-REFINED.trf | wc -l
  34
  RefL should have 34 entries but the xfer engine only loads 25...
create an init file and try it manually to see if it's chocking somewhere from 0RR.out: Loading lexicon file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID.trf with 33 lexical entries added Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID-REFINED.trf with 25 rules added And with the grammar: [aria@avenue grammars]$ grep ">" simulation-grammar-ID.trf | wc -l 15 [aria@avenue grammars]$ grep ">" simulation-grammar-ID-REFINED.trf | wc -l 16 from 0RR.out: Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID.trf with 15 rules added Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID-REFINED.trf with 1 rules added However whenever I load them directly to the Xfer engine, it parses them w/o a problem: [aria@avenue eng2spa]$ transfer -if init-simulation.txt Turning on Latin-1 mode Setting normalizecase to UPPER-CASE Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID-REFINED.trf with 15 rules added Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID-REFINED.trf with 34 rules added Setting find all translations to ON Setting output source text to ON Setting showtrace to full trace with src indices Translating file: /usr0/aria/eng2spa/corpus/error-typology-simulation 1: Translating "I see the red car". 2: Translating "she read". 3: Translating "I see the red unicorn". 4: Translating "Mary plays the guitar". 5: Translating "John and Mary fell". 6: Translating "you saw the woman". 7: Translating "they see water". 8: Translating "I would like to go". 9: Translating "I saw you". 10: Translating "Gaudi was a great artist". TOT 0 LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0 COV nan UNKNOWNS LIKE 2 Only difference is how they are called, different methods in XferEngine... pXfer->StartXfer(pGramFileNameID, pLexFileNameID); pXfer->ReLoadLex(pRefinedLexFileName); I think I know what is going on! 
  the reload command only adds the entries that differ; for the grammar this is 1, which is ok, and for the lexicon it's 25 even though it should be 1...
  [aria@avenue lexicons]$ less simulation-lexicon-ID-REFINED.trf .trf
  [aria@avenue lexicons]$ diff simulation-lexicon-ID.trf simulation-lexicon-ID-REFINED.trf
  79a80,84
  > {N,9}
  > N::N |: ["unicorn"] -> ["unicornio"]
  > (
  > (X1::Y1)
  > )
  maybe the Xfer engine reload method is sensitive to order or something else I am not taking into account... ask Erik!

Monday, May 8, 2006
  Erik's answer: You need to use loadgra to load the lexicon also, and not the lexicon specific commands like loadlex.
  **************************************
  weird things going on with the refined G and L,
  - RefL should have 34 entries but the xfer engine only loads 25...
    create an init file and try it manually to see if it's choking somewhere
    Loading lexicon file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID.trf with 33 lexical entries added
    Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID-REFINED.trf with 25 rules added
  - the RefG file seems to contain the lex, even though I have already debugged it extensively... double check and keep debugging...
  **************************************
- ok, now it's just reloading 1 rule; however, it doesn't seem to be restricting the gender of the adjective... -> still need to debug

Tuesday, May 9, 2006
- Bill: 1:30-3:50pm
  Bill says that they crash because they have a spurious correction... but that's not true...
  crash: 7,8, (multiple errors!!!)
  MAIN::Instantiating correction instance from TCTool Log File
  Segmentation fault
  I asked him to take a look at the two log files that are not correctly parsed.
bug is in the way he "mirrors" the alignments assumption: i apply action to words, and then I extract the alignment info I expect to see after the previous action has taken effect copied 5-9-06/CorrectionInstance.cpp over testing it: ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/0 > ! 0RR.out 7, 8 and 9 still seg fault Bill says it's a bug in the C++ iostream library, he asks the code to be in one position, and the code places him in a different position... hard to fix, can't do it on the spot. He'll do it as HW and we'll meet again Thursday 1pm - implement adding a lexical entry from scratch (I had already added the code in edit to do that!!): -> Bill implemented tree.GetPOSofWord(Wi) POS_TYPE POS = tree.GetPOSofWord(TLWord); pNewLexEntry->RRlexiconRule(POS, RC.USeRuleIDManager(), svSLside, vsTLside); But log file 2 deals with edit word, so need to debug edit case code: line 700: there must be a bug in the get alignment code, since log file 2 is not outputing this... since I hadn't applied any action Calignments does not contain anything, I needed to look into alignments! now it's working logfile 2 is working correctly as well!! Thursday, May 11, 2006 - worked on perl script to create inflected lexicon from MM vocabulary /usr0/aria/bin/WordList2Lexicon.pl shall put -> pondrán V |: ["put" ] -> ["pondrán"] ((x0 form) = shall) ) - POS for TL side [done] - get whole SL side [done] - accented characters? 
- add features -> need to debug getFS -> emailed Erik RuleRefiner: - implementing add a new constituent to the RHS of the rule to test bill's code figured out there are a couple of methods missing, emailed list to Bill Bill: 1:40-6pm - fix iostream library bug, so that 7, 8 and 9 parse it was actually a problem converting his code from windows to unix, he was reading files in binary mode and so was returning positions in a binary file instead of text but when I tried running RR on 7 and seg faulted again :( fixed this bug, now 8 and 9 are parsing file and there is a problem with 7. Namely, parse tree is empty, there is no alternative translation that matches with TL sentence!!! Reason: I modified the lexicon!!! -> need to create a new CI for 7 with current lexicon (TLS=me gustaria que ir) TL sentence is [WOULD LIKE QUE IR] these are the alterative translations and their parses for I would like to go: tl-0: ME GUSTARÍA QUE IR tree-0: ((S,0 (VP,15 (V,7:2 'ME GUSTARÍA') (PP,2 (PREP,1:4 'QUE') (VP,1 (V,8:5 'IR') ) ) ) ) ) No alternative matches the TL sentence: WOULD LIKE QUE IR tl-1: YO ME GUSTARÍA QUE IR tree-1: ((S,1 (NP,1 (PRON,1:1 'YO') ) (VP,15 (V,7:2 'ME GUSTARÍA') (PP,2 (PREP,1:4 'QUE') (VP,1 (V,8:5 'IR') ) ) ) ) ) No alternative matches the TL sentence: WOULD LIKE QUE IR - bill incorporated the system call to the CI.cpp. Now both correct CIs with and without spurious corrections are working: 2006-2-13-17-13-55-8336/1 and 6 2006-3-31-17-25-46-23762/1 and 6 2006-2-13-16-51-29-8333/6 is fine but 1 crashes further down problem with the logic of the RR: ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-16-51-29-8333/1 >! 8333-1RR.out both lexical entries exist... delta function should extract agr person... 
seg faults when calculating the delta function [read]->[leÍ] is not in the lexicon {V,9} V::V |: ["read"] -> ["leí"] ( (X1::Y1) ((x0 form) = read) ((x0 actform) = read) ((x0 tense) = past) ((y0 agr pers) = 1) ((y0 agr num) = sg) ) {V,10} V::V |: ["read"] -> ["leyó"] ( (X1::Y1) ((x0 form) = read) ((x0 actform) = read) ((x0 tense) = past) ((y0 agr pers) = 1) ((y0 agr num) = sg) ) - corrected bug in simulation-lexicon.trf (leyo, pers = 3!) - made sure that lexical entries are all in small caps, including accented characters!! Added: TLWord = StringUtils::StringToLower(Wi.value); to edit case, working now However, the AffectedRule is returning VP,1 and what we need is S,1... MAIN::Printing Tree extracted from Log File: ((S,1 (NP,1 (PRON,3:1 'ELLA')) (VP,1 (V,9:2 'LEÍ'))) Affected Rule are: 1 {VP,1} [leÍ] problem: without a clue word having been detected by user (ella), there is no way AffectedRules can return S,1 HACK: in this case, since there is only one other word, pick that as clue word. - bug in the parse tree, it has an extra parent and so it returns (S,1 and an id! -> asked Bill to add a check to ID.LoadFromString, so that if there is a parent, it deleted it. it's working now :) - bill implemented VariableInstantion for one variable, for MoveConstit testing it... ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-16-51-29-8333/9 after some debugging, it's working now :) it inputs 5 and returns 3 :-) ******************************************************************** Recap ****** #################### correct: 1, 6 edit: 0,2,3,8,9 add: 4, 5, delete: 7 cwo: 8,9 #################### ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-16-51-29-8333/ 0 -> didn't manage to get rid of ambiguity (still producing auto roja!) try loading the refined grammar outside the RRefiner and see what the problem is =c? 1 (purposefully picked wrong TL for testing) -> didn't manage to get rid of ambiguity (still producing ella lei!) 
try loading the refined grammar outside the RRefiner and see what the problem is =c? 2 OOW (LEXICAL ENTRY ADDED FROM SCRATCH) [done] 3 Added sense of the word (play->toca) [done] -> still need to get rid of *juega guitarra 4 Modified lexical entry cayeron -> se cayeron ambiguity necessarily increased. Still, lattice precision should be better 5 working on implementing methods that will allow me to add a new constit 6 OK 7 need to generate new CI with current lexicon, since right now it's not finding TL in lattice :( 8 ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-17-13-55-8336/8 >! 8RR.out Complex: 2 actions: edit + cwo te is created as a copy of tu -> feat_0 is postulated (should really be case, but RR cannot know that) -> but feat_0 doesn't get added to the lexical entries 9 implementing it... ******************************************************************** Friday, May 12, 2006 - debugged WordList2Lexicon.pl with Erik -> need to load it to memory, otherwise it will take too long. -> send email to Bill with final complete to do list 8 ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-17-13-55-8336/8 >! 8RR.out Complex: 2 actions: edit + cwo te is created as a copy of tu -> feat_0 is postulated (should really be case, but RR cannot know that) -> but feat_0 doesn't get added to the lexical entries -> need to create a new CI for 7 with current lexicon (TLS=me gustaria que ir) also for 1: pick ella lei and then correct to leyo and clue word = ella -> test RR! Monday, May 15, 2006 - TCTool: cp input-tct-4RRExamples input-tct added alternative translations to 7 (I would like to go) to input-tct 2006-5-15-13-08-25-11387 need to copy input-tct-US2 back into input-tct after I'm done with the RR examples Testing 1 and 7: ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/1 >! 11387-1RR.out New agreement constraint created is (y1 agr pers) = (y2 agr pers) Added to the rule... 
{S,91} S::S : [NP VP] -> [NP VP] ( ;(P:{S,1}) (X1::Y1) (X2::Y2) (x0 = x2) ((y1 case) = nom) ((y1 agr) = (x1 agr)) ((y2 tense) = (x2 tense)) ((y1 agr pers) = (y2 agr pers)) ) **************************************************************************** ***The refined grammar and lexicon produced the user corrected translation*** The correct translation is: ELLA LEYÓ **************************************************************************** **************************************************************************** ***However it is still producing the incorrect translation, previously corrected*** by the user: ELLA LEÍ **************************************************************************** it seg faults!!! -> debug -> need to figure out why the constraint does not prevent "ella lei" from generating ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/7 >! 11387-7RR.out tree-0: ((S,0 (VP,15 (V,7:2 'ME GUSTARÍA') (PP,2 (PREP,1:4 'QUE') (VP,1 (V,8:5 'IR') ) ) ) ) ) Before action: SLWords: I would like to go TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR Alignments: ((1,1),(2,1),(4,2)) /* case DELETE: { cout << "Action type = delete\n"; ActionDeleteWord *pDeleteWord = (ActionDeleteWord*)pAct; cout << "Wi: [" << pDeleteWord->GetDeletedWord() << "]\n" << "i is " << pDeleteWord->GetPosDelete() << endl; */ Action type = delete Wi: [IR] i is 3 Affected Rule are: 0 When storing TLWords, one item per position, the fact that two or more words are part of the same lexical entry is not reflected... and so when tr the accessors count positions indicated by alignments, but the CI class stores positions per word item, instead of entries... 1."me gustaria" 2.que 3.ir 1. me 2. gustaria 3.que 4.ir Did all the alignment changes first and then: ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-42-35-31799/0 >! 
31799-7RR.out Before action: SLWords: I would like to go TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR Alignments: ((1,1),(2,2),(3,2),(4,2),(5,3),(5,4)) Action type = delete Wi: [IR] i is 3 Not touching alignments: ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-47-21-13849/0 >! 13849-7RR.out still same :( -> asked bill to check if it could be that the TCTOOLPOS_TO_VECTOR, and indeed this is what it was, he's fixing it :) Before action: SLWords: I would like to go TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR Alignments: ((2,1),(5,4)) Action type = delete Wi: [QUE] i is 3 Ok, now I just need to implement the rest of the delete case algortihm ------------------ Artificially adding (( =c +) constraint to edit case (example 1) , just to make sure it's working [aria@avenue V0.05]$ ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/1 > ! 1-temp-RR.out New value constraint created is (y1 agr pers) =c + Added to the rule... {S,91} S::S : [NP VP] -> [NP VP] ( ;(P:{S,1}) (X1::Y1) (X2::Y2) (x0 = x2) ((y1 case) = nom) ((y1 agr) = (x1 agr)) ((y2 tense) = (x2 tense)) ((y1 agr pers) = (y2 agr pers)) ((y1 agr pers) =c +) ) commenting out that part of the code for now Bill's implemented the following methods, tesing: - GetOriginalRule(pRefinedRule*) Given a previously derived rule, it returns a pointer to the original rule that got bifurcated into it. If it’s not a derived rule, it should return NULL. 
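  One way GetOriginalRule could work, as a guess only: this assumes the parent is recorded in the derived rule as the ;(P:{S,1}) comment seen in the printed {S,91} rule, and that rules can be looked up by their ID string. HistoryRuleSketch, ParentID and the map are hypothetical, not Bill's classes.

```cpp
#include <map>
#include <string>

// Hypothetical rule record: ID string such as "{S,91}" plus the rule text.
struct HistoryRuleSketch {
    std::string id;
    std::string text;
};

// Extracts the parent ID from a ";(P:{S,1})" comment; empty if there is none.
std::string ParentID(const std::string& ruleText) {
    const std::string marker = ";(P:";
    std::string::size_type start = ruleText.find(marker);
    if (start == std::string::npos) return "";
    start += marker.size();
    std::string::size_type end = ruleText.find(')', start);
    if (end == std::string::npos) return "";
    return ruleText.substr(start, end - start);
}

// Returns the rule the given one was bifurcated from, or a null pointer if it
// is not a derived rule (or the parent is not in the collection).
const HistoryRuleSketch* GetOriginalRule(
        const std::map<std::string, HistoryRuleSketch>& grammar,
        const HistoryRuleSketch& refinedRule) {
    const std::string parent = ParentID(refinedRule.text);
    if (parent.empty()) return nullptr;
    std::map<std::string, HistoryRuleSketch>::const_iterator it = grammar.find(parent);
    return it == grammar.end() ? nullptr : &it->second;
}
```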
- vpRules GetDerivedRules(pOriginalRule*) - Rule comparison method: bool SameRuleExcepptFeatName(pDerivedRule1, pDerivedRule2, &sFeatNameDRule1, &sFeatNameDRule2); * Tested case when there is only one derived rule, and it's working * Tested case when there are multiple derived rules, but they are not identical to the newly added rule, it's working * Tested case when there are multiple derived rules, and the newly added has an identical rule in the history, after some debugging, it's working Checking if they are really the same or not Identical Rule found in the grammar, with different feature name: Feat1 is feat_0 and Feat2 is feat_1 {S,93} ------------------------------------------------------------ Bill will be working on Spurious Loop and Error complexity and finish at least one of the two tasks a couple of weeks from now. -------------------------------------------------------------- still need to test: ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/0 >! 0RR.out - bool LexicalEntry::ReplaceFeatName(sFeatName1, sFeatName2); - RuleCollection::DecreaseFeatNameCounter. - VariableInstantiation for one position when adding a word to a GraRule, the POS of the following word should be skipped (fLookatLeafPOS=false), since the method needs to retrieve the parent node of the next word (Leaf). %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - Error complexity. Not sure what is currently implemented, but for now, having a simplified version would probably be fine, since I am planning to deal with sentences containing 2 or 3 examples. Sentence containing independent errors first, sentences containing dependent errors last in the ranking. So I guess error dependency should have a weight of 3 for now, and each error a weight of 1. 
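  The simplified scoring could be as small as the sketch below. Assumptions: each correction is flagged by hand as dependent on a previous error or not (the CorrectionSketch struct is hypothetical), an independent correction costs 1 and a dependent one costs 3, and sentences are ranked by ascending score.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flag per correction: does it depend on an earlier error
// in the same sentence?
struct CorrectionSketch {
    bool dependsOnPreviousError;
};

// Sums the weights: 1 for an independent correction, 3 for a dependent one.
// Lower scores are meant to rank first.
int ErrorComplexity(const std::vector<CorrectionSketch>& corrections) {
    int score = 0;
    for (std::size_t i = 0; i < corrections.size(); ++i) {
        score += corrections[i].dependsOnPreviousError ? 3 : 1;
    }
    return score;
}
```

  Two independent corrections score 2; one independent plus one dependent correction scores 4, so that sentence is handled later.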
  -> add 1 per correction to the score if it's independent of previous errors, and 3 per correction if it's dependent
  I'll look into it, then figure out what exactly needs to be implemented, and will email Bill
- Reverse Refinement(s)
  I haven’t had time to look into this, but it would be great if we could look at the time stamp management before you leave, so that if a rule does not result in an improvement on a test set (T2), there is a good way to revert to the previous version of the grammar (T1). As I told you before (and maybe it’s already implemented), it would be useful to have a variable that expresses whether a rule led to an improvement or not (bool Rule.ImprovedAccuracy()). Maybe this is too complex to do before you leave, but maybe you have already implemented most of what would be needed, and it would be fairly simple. In any case, I’d like to know.
- Translation Pairs annotation to Rules
  Even though I still need to create a TrPair file with the right annotation for this, it would be great if the setup to check whether any rule has TrPairs associated with it were already in place. It could probably be something as simple as a vector of ints, each int indicating the line in a text file that contains the appropriate TrPair. If the vector is empty, there is no support for that rule; if it’s not, there is support, and thus the rule should not be deactivated after being refined.

Finally, something I asked you to do at the very beginning, which you might very well have already implemented, but I just want to double check, is to have an easy way to group CIs according to the user that generated them (namely directory name). As far as I know, CIs are currently indexed by SL-TL and SL-TL-CTL (in order to pick BestCI), but in addition to that, it would be good to have a way to trace any refinement (rule or lex entry) back to the user(s) who made the correction that led to it.
Since my understanding is that CICollection is merging similar CIs and stores the different user IDs into a vector (is this right?), then I believe all that would be needed is a way to store that info as a comment to the rule derived from that CI. What do you think? **************************************************************************** **************************************************************************** Bill left for the Summer ***************************************************************************** Tuesday, May 16 2006 - debugging WordList2Lexicon.pl, it's working it just takes a loooooooong time. Added progress tracker and formatted features so that they are in the lexicon format + outputting multiple entries per inflected word (ex: paso) ./WordList2Lexicon.pl < ../eng2spa/lexicons/MM-WordList/MM-TranslationsSortNoDup.txt >! ../eng2spa/lexicons/LexiconMM.trf currently, lexical entries printed out by POS in MM lexicon if I want to print them out organized by their real pos from maco-girat, need to store them again into a hash and after having read all the MM lexicon, print it -> actually, since I don't have POS list for MM, I should just organize it by pos from maco-girat! - moved to V0.06 and cleaned it up (deleted old files) - Created ExampleSentenceOutput to keep track of what is already working - backed it up in Avenue (afs) and temuco:/usr4/ Wednesday, May 17, 2006 - killed WordList2Lexicon.pl, since it was still running and taking all the space!!! [aria@avenue lexicons]$ wc -l LexiconMM.trf 596737820 LexiconMM.trf Finished reading Maco-girat file Inflected forms: 920994 mapping PAROLE Tags now... 
;;************V************ ;;************N************ ;;************ADCONJ************ ;;************PRON************ ;;************INTERJ************ ;;************J************ value from hash: [] getFS::CitationFeatures is [] ;;************V************ ;;************N************ value from hash: [] getFS::CitationFeatures is [] ;;************ADCONJ************ ;;************PRON************ ;;************INTERJ************ ;;************J************ value from hash: [] getFS::CitationFeatures is [] ;;************V************ ;;************N************ value from hash: [] getFS::CitationFeatures is [] value from hash: [] getFS::CitationFeatures is [] ;;************ADCONJ************ ;;************PRON************ ;;************INTERJ************ ;;************J************ value from hash: [] getFS::CitationFeatures is [] ;;************V************ ;;************N************ .... deleted it Thursday, May 18, 2006 - moved WordList2Lexicon.pl to temuco:/usr4/aria/bin MM-TranslationsSortNoDup.txt WordList2Lexicon.pl debugging... WordList2Lexicon.pl < MM-TranslationsSortNoDup.txt - continue working on RuleRefiner: ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/0 >! 0RR.out * RuleCollection::DecreaseFeatNameCounter works :) - created high level methods for setting constraints (moved code from main to the RRConstraint class): pConstr->SetAgrConstraint(EType,TriggerFeat,TriggerFeat,CluePOSPos, POSPos); pValConstr->SetValueConstraint(EType, TriggerFeat, CluePOSPos, sValue); debugged and tested, it's working :) - Adding discriminating feature to both original and refined lexical entries (l. 823) [example sentence 3] Postulating New Feature: feat_0 New value constraint created is (y0 feat_0) = + Added to refined lexical entry... 
{V,11} V::V |: ["plays"] -> ["toca"] ( ;(P:{V,5}) (X1::Y1) ((x0 form) = play) ((x0 actform) = play) ((x0 tense) = pres) ((y0 agr pers) = 3) ((y0 agr num) = sg) ((y0 feat_0) = +) ) Blocking constraint created is (y0 feat_0) = - Added to original lexical entry... {V,11} V::V |: ["plays"] -> ["toca"] ( ;(P:{V,5}) (X1::Y1) ((x0 form) = play) ((x0 actform) = play) ((x0 tense) = pres) ((y0 agr pers) = 3) ((y0 agr num) = sg) ((y0 feat_0) = +) Calculating Delta Function for juega and toca ... The triggering feature is [feat_0] -> finish to implement example 3 add feat0 = + to clue word!!! and then check xfer output, juega guitarra should not be produced now # debugging adding value constraint to clue word (ex: 3) Friday, May 19, 2006 - ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/3 >! 3RR.out added Val constr to clue word, and agreement constraint to the Affected Rule: WiPOSPos is: [1] and CluePOSPos is: [2] Bifurcated {VP,2} Need to create agreement constraint with feat_0 and add constraint to rule New value constraint created is (y2 feat_0) = (y1 feat_0) Added to the rule... {VP,47} VP::VP : [VP NP] -> [VP NP] ( ;(P:{VP,2}) (X1::Y1) (X2::Y2) ((x2 case) = acc) ((x0 obj) = x2) ((x0 agr) = (x1 agr)) (x0 = x1) ((y0 tense) = (x0 tense)) ((y0 agr) = (y1 agr)) ((y2 feat_0) = (y1 feat_0)) ) However, I need to percolate the feature up from the lexicon, all the way to the VP,2 rule, namely, it also needs to be added to NP,3 for position 2 (N) ((S,1 (NP,2 (N,3:1 'MARÍA')) (VP,2 (VP,1 (V,5:2 'JUEGA')) (NP,3 (DET,2:3 'LA') (N,5:4 'GUITARRA')))) May 23, 2006 - looked at why refined grammar wasn't successfully reducing ambiguity and realized that reloadgra is not doing the right thing, since when loading the grammar with the load command, it does the right thing. 
-> email Erik

May 24, 2006
it turns out reloadgra is doing the right thing; however, the original grammar, namely the original rule which the refined grammar has deactivated by commenting it out, is not effectively deactivated when using reloadgra, so I need to actually clear all the rules and then load the grammar from scratch.
-> emailed Erik to find out if there is a method to clear rules in the G or the L, as opposed to all the rules.

May 26, 2006
- ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/3 >! 3RR.out
- I can now extract the clue word entry from the lexicon
- working on add constit case: fixed the logic of lexical refinements vs grammar refinements

May 30, 2006
trying with a simpler log file:
./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-35-31-23763/5 >! 5RR.out
same problem. However, when I run it on ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/4, it doesn't freeze there, it proceeds until the end of the program.. so it works for 4 but does NOT work for 5, and I have no idea why!!!
-> debugging 4, since there seems to be something weird with UseRuleIDManager() when creating the new lexical entry, and then when loading the refined lexicon into the Xfer engine, it appears empty! (even though all this was working before...)

[aria@avenue V0.06]$ ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/4 > ! 4RR.out
line 1:1: unexpected end of file
Segmentation fault

since I had added to the logic of the ADD case, it needed some adjusting and bug fixing; now it's working
- now 5 is also working, or at least it gets to the grammar refinements :-)
- two bugs in Bill's code: the literal (a) does not have "" and constraint indices are not updated when calling pNewGraRule->AddConstituentToRHS(pConstit, iPosAdded)
  // probably the best place to add "" is in AddConstituentToRHS, but there you might not know if it's a literal...
  a -> "a"

Wednesday, May 31, 2006
- finished add constit logic, need to debug when constit and add constit bugs are fixed
- working on cwo case with sentence 9 (8 is too complex, leave for later)
  LoadCI is still choking...
- sent email to Bill with bugs and fixes (Rule Refiner stuff also stored in bill/info/email-24-FollowUp1)
- backed up V0.06 to Avenue afs and temuco /usr4/

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Working on writing and presenting papers, Barcelona + DiagnosticTestSet %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Thursday, July 20, 2006
- Right now, even though I could process multiple corrections with the CICollection (once it's been debugged anyway), every time I see a correction I take the original grammar, refine it and save it.
  -> Need to make sure that the next iteration over the CICollection builds on the already refined grammar, so that I can incrementally refine the grammar, even though the code is still not in place to revert grammar changes and so on.
  Original code is in CICollection/TestCICollection, need to integrate

Tuesday, July 25, 2006
- looked at how to integrate the CICollection code into my main
- dirs.txt is the file that contains all the directories that need to be loaded into a CI Collection. Edited it to contain /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387 and then ran TestCICollection on it:
  ./TestCICollection
  works up to the point where it outputs error complexity
- integrating code into main: first read in all the log files and store them into the CICollection; then traverse the collection and process each CI (for now in the order they have been stored)
  -> later test BestCI code
  make RR
  ./RuleRefinement
- debugging...
./RuleRefinement > RR.out

Thursday, July 27, 2006
- realized I was running an old version, since the current Makefile was generating RuleRefinement.exe instead of RuleRefinement

Friday, July 28, 2006
- looking into AddCI, there is both CICollection and CICollectionIndex, since Bill keeps the two in separate data structures. Need to be careful with this.
- currently the program chokes when reading in /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/8
  TL -- SL seem to be empty, even though it can extract the right tree somehow:

File : /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/8
MAIN::The SL and TL for the LogFile given as a param are:
XferEngine::ExtractParseTreeFromLattice: TL sentence is [VISTE TÚ
these are the alterative translations and their parses for I saw you:
tree-0: ((S,0 (VP,46 (V,4:2 'VISTE') (NP,1 (PRON,2:3 'TÚ) ) ) ) )
MAIN::Printing Tree extracted from Xfer engine for SL-TL pair in the logfile:
((S,0 (VP,46 (V,4:2 'VISTE') (NP,1 Segmentation fault

!!!! Noticed that the log files' format is not consistent !!!
/usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/
The TL sentence in 1 and 8 ends with an accented character but no closing "!! Not sure why this is, and the RR did not choke when processing 1, but still!
for now editing 8 and adding the closing " to see if that makes a difference
ah, it seems it's a problem with the new version of UNIX less!! when I opened it with emacs, it looked fine, weird..
!!!!!!! NEED TO LOOK AT THE OUTPUT FILE IN EMACS AND NOT LESS !!!!

it has nothing to do with the formatting of the log files; RR chokes when printing the tree for
tree-0: ((S,0 (VP,46 (V,4:2 'VISTE') (NP,1 (PRON,2:3 'TÚ') ) ) ) )
TL is one of the alternatives: VISTE TÚ
MAIN::Printing Tree extracted from Xfer engine for SL-TL pair in the logfile:
((S,0 (VP,46 (V,4:2 'VISTE') (NP,1

For now: mv 8 99
then the first file it processes is 99!, so mv 99 ../8
Now it's working!
(need to figure out why the TreePrinting function chokes on 8)

0: Adding the (agr gen) constraint happens between positions 2 and 1, instead of 3 and 2...
   looked at when the RR extracts positions and saw that it's due to the fact that the tree stored in order to extract the POS positions is not the right one, but rather the last one that got instantiated, for the gaudi example.
   The tree needs to be stored in each CI at run time, since taking the original trees output by the system would not reflect recent grammar/lexical refinements, so we don't really save any processing time by doing that.
   -> Modified LoadTCToolLogFile to instantiate the tree data member and added a GetTree function to CI that returns the tree. (I tried returning a pointer to it instead, but I couldn't debug it, so left it like this.)
   Before doing anything, for each new CI that is extracted from the collection to do the refinements, I first call GetTree on it and instantiate tree with the result, since lots of code relies on it afterwards... Working fine now.
0: refinement successful
1: In any case, for the refinements in 1 to be effective, they need to percolate to the NP and VP, and this method is still not implemented...
   Everything ok until the point when it loads the refined grammar...

Deleting all loaded rules.
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID-REFINED.trf with 34 rules added
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID-REFINED.trf with

;F0 is the feature counter that tells the grammar and the lexicon what feature name to use next

[aria@avenue V0.07]$ ./RuleRefinement >! RR.out
Unable to open /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-1.trf.
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
Unable to open /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-2.trf.
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
Unable to open /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf.
Segmentation fault

for some reason lexicon-R-1 is empty, need to debug. EDIT CASE
lexicon is not printed to screen but it should be! using the reload command
there was a logic problem: the lexicon got printed to a file in only one of the else clauses, and since all the rules are cleared before re-running the xfer engine (to get rid of commented-out rules), it needs to be reprinted even when no refinement is done

- 2: OOWV -> xfer did not find a full parse!!! oops
  TL sentence is [VEO EL UNICORN ROJO]
  CTL sentence is [VEO EL UNICORNIO ROJO]
  No full parse was found! Partial parse is: VEO EL ROJA UNICORN
  tree: <((S,0 (VP,1 (V,1:2 'VEO') ) ) )> <(DET,1:3 'EL')> <(ADJ,1:4 'ROJA')> <(UNK,0:5 'UNICORN')>
3: the grammar does not load (this is output by the Xfer engine itself when the clearall command is called)
  Deleting all loaded rules.
  Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf with 35 rules added
  Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf with
  -> look into it!
  /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf has all the refinements so far integrated in it, yeahh!!!

Monday, July 31, 2006
- ! the first parse tree retrieved by RR might not be the best one... (the example in the AMTA 2006 paper included VP,46, which is an automatically learned rule that had an incorrect generalization...)
- debugging RR: why does it seg fault when trying to load the refined grammar?
  I looked at both simulation-grammar-REFINED-4.trf and simulation-grammar-REFINED-2.trf and I didn't see anything that would make the last one load ok and the first one not load ok...
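Several of the failures noted above start from a refined file that is missing ("Unable to open ...") or empty (lexicon-R-1). A minimal sketch of a check that could run before handing a file to the Xfer engine; RuleFileLooksLoadable is a hypothetical helper, not part of RR or of the engine, and it only tests that the file opens and holds at least one non-whitespace character (a file containing nothing but comments would still pass).

```cpp
#include <fstream>
#include <string>

// Hypothetical guard: returns true only if the file can be opened and
// contains at least one non-whitespace character.
bool RuleFileLooksLoadable(const std::string& path) {
    std::ifstream in(path.c_str());
    if (!in) {
        return false;  // missing or unreadable file
    }
    char c;
    // operator>> skips leading whitespace, so this fails on an empty or
    // whitespace-only file.
    return !(in >> c).fail();
}
```

The idea would be to call it on the refined lexicon and grammar paths right before the load calls and to print a clear message instead of continuing, so that an empty refined lexicon shows up at the point where it was written and not later as "No parse found" or a seg fault.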
- running the Xfer engine directly with simulation-grammar-REFINED-4.trf was fine cd /usr0/aria/eng2spa transfer -if init-simulation.txt Turning on Latin-1 mode Setting normalizecase to UPPER-CASE Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf with 15 rules added Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf with 35 rules added Setting find all translations to ON Setting output source text to ON Setting showtrace to full trace with src indices Translating file: /usr0/aria/eng2spa/corpus/error-typology-simulation 1: Translating "I see the red car". 2: Translating "she read". 3: Translating "I see the red unicorn". 4: Translating "Mary plays the guitar". 5: Translating "John and Mary fell". 6: Translating "you saw the woman". 7: Translating "they see water". 8: Translating "I would like to go". 9: Translating "I saw you". 10: Translating "Gaudi was a great artist". TOT 0 LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0 COV nan UNKNOWNS LIKE 2 [aria@avenue eng2spa]$ emacs corpus/error-typology-simulation.out.debug And it produced the right result: 3: sl: Mary plays the guitar tl: MARÍA JUEGA LA GUITARRA tree: <((S,91 (NP,2 (N,3:1 'MARÍA') ) (VP,47 (VP,1 (V,5:2 'JUEGA') ) (NP,3 (DET,2:3 'LA') (N,5:4 'GUITARRA') ) ) ) )> tl: MARÍA TOCA LA GUITARRA tree: <((S,91 (NP,2 (N,3:1 'MARÍA') ) (VP,47 (VP,1 (V,11:2 'TOCA') ) (NP,3 (DET,2:3 'LA') (N,5:4 'GUITARRA') ) ) ) )> Even when I replicate what the RR does, first removing all the rules and then loading the lexicon and the grammar, there seems to be no problem: [aria@avenue eng2spa]$ transfer Welcome to the AVENUE Transfer Engine. 
> loadrules /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-2.trf
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-2.trf with 33 rules added
> loadrules /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-2.trf
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-2.trf with 15 rules added
> reloadgra /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-3.trf
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-3.trf with 1 rules added
> clearall
Deleting all loaded rules.
> loadrules /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf with 35 rules added
> loadrules /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf with 15 rules added
> exit

the Xfer engine wrapper functions for LoadGra and LoadLex were using loadgra and loadlex instead of loadrules -> changed.
reloadrules is not a valid command, so leaving reloadgra for both reloadLex and Gra.

looking at the code that actually loads G and L:
RR.cpp: l.
1570
pXfer->RemoveAllLoadRules();
pXfer->LoadLex(pRefinedLexFileName);
pXfer->LoadGra(pRefinedGraFileName);

XferEngine.cpp:
void XferEngine::LoadGra(const char* pGramFileName)
{
  string Command1 = string ("loadrules ") + pGramFileName;
  xfer->processCommand(Command1);
}

void XferEngine::LoadLex(const char* pLexFileName)
{
  string Command2 = string ("loadrules ") + pLexFileName;
  xfer->processCommand(Command2);
}

void XferEngine::RemoveAllLoadRules()
{
  xfer->processCommand("clearall");
}

// for now they are both calling reloadgra, since reloadlex is not implemented yet
void XferEngine::ReLoadLex(const char* pRefinedLexFileName)
{
  string Command3 = string ("reloadgra ") + pRefinedLexFileName;
  xfer->processCommand(Command3);
}

And it doesn't look like a formatting problem either:

[aria@avenue grammars]$ diff simulation-grammar-REFINED-2.trf simulation-grammar-REFINED-4.trf
1c1
< ;F:0
---
> ;F:1
121,131c121,132
< {VP,2}
< VP::VP : [VP NP] -> [VP NP]
< (
< (X1::Y1) (X2::Y2)
< ((x2 case) = acc)
< ((x0 obj) = x2)
< ((x0 agr) = (x1 agr))
< (x0 = x1)
< ((y0 tense) = (x0 tense))
< ((y0 agr) = (y1 agr))
< )
---
> ;D:
> ;{VP,2}
> ;VP::VP : [VP NP] -> [VP NP]
> ;(
> ; (X1::Y1) (X2::Y2)
> ; ((x2 case) = acc)
> ; ((x0 obj) = x2)
> ; ((x0 agr) = (x1 agr))
> ; (x0 = x1)
> ; ((y0 tense) = (x0 tense))
> ; ((y0 agr) = (y1 agr))
> ;)
158a160,172
> )
> {VP,47}
> VP::VP : [VP NP] -> [VP NP]
> (
> ;(P:{VP,2})
> (X1::Y1) (X2::Y2)
> ((x2 case) = acc)
> ((x0 obj) = x2)
> ((x0 agr) = (x1 agr))
> (x0 = x1)
> ((y0 tense) = (x0 tense))
> ((y0 agr) = (y1 agr))
> ((y1 feat_0) = (y2 feat_0))

double checked parents, and they seem to be fine...
-> emailed Erik, in case he knows why this is happening

Tuesday, August 1, 2006
- renamed 3 to 1 and 1 to 3 and reran RR, to see what happens...
[aria@avenue IOFiles]$ cd 2006-5-15-13-08-25-11387/ [aria@avenue 2006-5-15-13-08-25-11387]$ mv 1 temp [aria@avenue 2006-5-15-13-08-25-11387]$ mv 3 1 [aria@avenue 2006-5-15-13-08-25-11387]$ mv 1 3 (should have been mv temp 3) [aria@avenue 2006-5-15-13-08-25-11387]$ it worked!!! Debugging here Deleting all loaded rules. Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf with 35 rules added Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf with 15 rules added Checking refined lattice for the presence of CTL and TL sentences XferEngine::CheckRefinedLattice: Checking if CTL and TL sentences are in the Refined Lattice... TL sentence is [MARÍA JUEGA LA GUITARRA] CTL sentence is [MARÍA TOCA LA GUITARRA] these are the alterative translations for Mary plays the guitar : tl-0: MARÍA JUEGA LA GUITARRA tree-0: ((S,91 (NP,2 (N,3:1 'MARÍA') ) (VP,47 (VP,1 (V,5:2 'JUEGA') ) (NP,3 (DET,2:3 'LA') (N,5:4 'GUITARRA') ) ) ) ) tl-1: MARÍA TOCA LA GUITARRA tree-1: ((S,91 (NP,2 (N,3:1 'MARÍA') ) (VP,47 (VP,1 (V,11:2 'TOCA') ) (NP,3 (DET,2:3 'LA') (N,5:4 'GUITARRA') ) ) ) ) **************************************************************************** ***The refined grammar and lexicon produced the user corrected translation*** The correct translation is: MARÍA TOCA LA GUITARRA **************************************************************************** **************************************************************************** ***However it is still producing the incorrect translation, previously corrected*** by the user: MARÍA JUEGA LA GUITARRA **************************************************************************** ********************************************************* Refinement was successfull, but lexical ambiguity increased ************************************************************* And it's working for: 0. I see the red car 2. I see the red unicorn 3. Mary plays the guitar 4. 
John and Mary fell oops, deleted 1 by mistake (she read - ella leyi -> leyo) -> need to implement percolate first anyway but not for: 5. you saw the woman "fell" still seems to be instantiated to the SLWord, and so it created a lex entry fell->a!!! ************************************************** After all actions in CI: SLWords: John and Mary fell TempCTLWords (1st time = TLWords): juan y marÍa se cayeron Alignments: ((1,1),(2,2),(3,3),(4,5),(4,4)) ************************************************** ((S,0 (VP,46 (V,4:2 'VISTE') (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER')))) MAIN::CI's CTL sentence is instantiated with [viste a la mujer ] XferEngine::TLInLattice: Checking if TL sentence is in the Lattice... TL sentence is [VISTE A LA MUJER] MAIN::pXfer->TLInLattice: no, the CTL sentence is NOT in the lattice Affected Rule are: 2 {VP,46} ['VISTE'] {NP,3} ['LA'] Before action: SLWords: you saw the woman TempCTLWords (1st time = TLWords): viste la mujer Alignments: ((2,1),(3,2),(4,3)) Action type = add Wi': [a] i'...: 2 Looking ahead to find relevant alignments... Other action type: 5 Discarding for now. End of look ahead Applying correction action to TLWord SLWordPos is 4 If pLexEntry != NULL {V,13} V::V |: ["fell"] -> ["a"] ( ;(P:{V,6}) (X1::Y1) ((x0 form) = fall) ((x0 actform) = fell) ((x0 tense) = past) ((y0 agr pers) = 3) ((y0 agr num) = pl) ((y0 type) = refl) ) ********************************* ... Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-6.trf with 1 rules added Segmentation fault (the run before it seg faulted after having loaded both grammar and lexicon and it seg faulted after printing the tree for "me gustaria que ir" -> there is a bug in the code that adds a word at the grammar level following another add case at the lexical level... + se: fell -> se cayeron + a -> no alignments -> no SLword!!! debugging alignment code (look ahead, etc.): Added to l. 572: SLWordPos = -1; Changed l. 
626 from: if ( SLWordPos > 0 ) to: if ( SLWordPos >= 0 )
something worked, but need to further debug

...
Looking ahead to find relevant alignments...
pAct2->GetType() is 5 (CLEAR_ALIGNMENT)
Other action type: 5
Discarding for now.
End of look ahead

%%%%%%%%%%%%%%%%%%
From CI.hpp:
enum ACTION_TYPE
{
  ADD = 0,
  EDIT,              //1
  DELETE,            //2
  CHANGE_WORD_ORDER, //3
  ADD_ALIGNMENT,     //4
  CLEAR_ALIGNMENT    //5
};
%%%%%%%%%%%%%%%%%%%

Applying correction action to TLWord
SLWordPos is -1
There are no alignments from Wi' to SLWords -> Grammar Refinement
There are 2 Affected Rules
{VP,46} ['VISTE']
{NP,3} ['LA']
Original Rule is:
{NP,3}
NP::NP : [DET N] -> [DET N]
(
 (X1::Y1) (X2::Y2)
 (x0 = x2)
 ((y1 def) = (x1 def))
 ((y2 agr) = (x2 agr))
 ((y1 agr) = (y2 agr))
)
Bifurcated {NP,3}
the position where "a" needs to be added to is -1
{NP,10}
NP::NP : [DET N] -> [a DET N]
(
 ;(P:{NP,3})
 (X1::Y2) (X2::Y3)
 (x0 = x2)
 ((y1 def) = (x1 def))
 ((y2 agr) = (x2 agr))
 ((y1 agr) = (y2 agr))
)
Added new rule to grammar...
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-6.trf with 1 rules added

but now it is seg faulting after having printed the tree for "viste la mujer" :(
((S,0 (VP,46 (V,4:2 'VISTE') (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER'))))

Tried processing log file 5 before 4:
[aria@avenue 2006-5-15-13-08-25-11387]$ mv 4 temp
[aria@avenue 2006-5-15-13-08-25-11387]$ mv 5 4
[aria@avenue 2006-5-15-13-08-25-11387]$ mv temp 5
[aria@avenue 2006-5-15-13-08-25-11387]$

the RR does not process the files in alphanumerical order!!! 0,4,2,3,5,6,7,9
not sure why -> Bill?

There seems to be a bug in RuleInstantiation...

int iPosAdded;
// last argument indicates that added word is not already in tree
pNewGraRule->RuleInstantiation(TLWordPos, &tree, iPosAdded, false);
cout << "the position where \"" << TLWord << "\" needs to be added to is " << iPosAdded << endl;

...
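In the {NP,10} rule printed above, the alignments were shifted to (X1::Y2) (X2::Y3) but the constraints still refer to y1 and y2, which matches the bug noted on May 30 (constraint indices not updated after AddConstituentToRHS). A minimal sketch of the index shift, assuming target positions are 1-based and that every yN / YN with N at or after the inserted position should move up by one. ShiftTargetIndices is a hypothetical free function working on the printed form of a rule body, not a method of the GraRule class; it also rejects positions below 1, on the assumption that a value like the -1 printed above means the position was never found.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: rewrites target-side indices (y1, Y2, ...) in the text
// of a rule body after a new constituent has been inserted at the 1-based
// target position insertedPos. Indices >= insertedPos are incremented;
// x-side indices and y0 are left alone.
std::string ShiftTargetIndices(const std::string& text, int insertedPos) {
    if (insertedPos < 1) {
        throw std::invalid_argument("insertedPos must be >= 1");
    }
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        const char c = text[i];
        // Only treat y/Y as an index marker at the start of a token, so the
        // 'y' in a word such as "play" is never touched.
        const bool atTokenStart =
            i == 0 ||
            !(std::isalnum(static_cast<unsigned char>(text[i - 1])) ||
              text[i - 1] == '_');
        const bool digitFollows =
            i + 1 < text.size() &&
            std::isdigit(static_cast<unsigned char>(text[i + 1]));
        if ((c == 'y' || c == 'Y') && atTokenStart && digitFollows) {
            std::size_t j = i + 1;
            int n = 0;
            while (j < text.size() &&
                   std::isdigit(static_cast<unsigned char>(text[j]))) {
                n = n * 10 + (text[j] - '0');
                ++j;
            }
            if (n >= insertedPos) {
                ++n;
            }
            std::ostringstream number;
            number << n;
            out += c;
            out += number.str();
            i = j;
        } else {
            out += c;
            ++i;
        }
    }
    return out;
}
```

Applied to the {NP,3} body with insertedPos = 1, this would give (X1::Y2) (X2::Y3) together with ((y2 def) = (x1 def)) ((y3 agr) = (x2 agr)) ((y2 agr) = (y3 agr)), which is what I would expect {NP,10} to contain once the indices are kept in sync.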
Applying correction action to TLWord If there is at least one alignment to Wi (SLWordPos >= 0) SLWordPos is -1 There are no alignments from Wi' to SLWords -> Grammar Refinement There are 2 Affected Rules {VP,46} ['VISTE'] {NP,3} ['LA'] Original Rule is: {NP,3} NP::NP : [DET N] -> [DET N] ( (X1::Y1) (X2::Y2) (x0 = x2) ((y1 def) = (x1 def)) ((y2 agr) = (x2 agr)) ((y1 agr) = (y2 agr)) ) Bifurcated {NP,3} the position where "a" needs to be added to is -1 {NP,10} NP::NP : [DET N] -> [a DET N] ( ;(P:{NP,3}) (X1::Y2) (X2::Y3) (x0 = x2) ((y1 def) = (x1 def)) ((y2 agr) = (x2 agr)) ((y1 agr) = (y2 agr)) ) Added new rule to grammar... Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-2.trf with 1 rules added Checking refined lattice for the presence of CTL and TL sentences XferEngine::CheckRefinedLattice: Checking if CTL and TL sentences are in the Refined Lattice... TL sentence is [VISTE LA MUJER] CTL sentence is [VISTE A LA MUJER] these are the alterative translations for you saw the woman : tl-0: VISTE LA MUJER tree-0: ((S,0 (VP,46 (V,4:2 'VISTE') (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) ) tl-1: TÚ VISTE LA MUJER tree-1: ((S,1 (NP,1 (PRON,2:1 'TÚ') ) (VP,46 (V,4:2 'VISTE') (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) ) tl-2: VISTE LA MUJER tree-2: ((S,0 (VP,2 (VP,1 (V,4:2 'VISTE') ) (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) ) tl-3: TÚ VISTE LA MUJER tree-3: ((S,1 (NP,1 (PRON,2:1 'TÚ') ) (VP,2 (VP,1 (V,4:2 'VISTE') ) (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) ) tl-4: TÚ VISTE LA MUJER tree-4: ((S,90 (NP,1 (PRON,2:1 'TÚ') ) (VP,46 (V,4:2 'VISTE') (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) ) tl-5: TÚ VISTE LA MUJER tree-5: ((S,90 (NP,1 (PRON,2:1 'TÚ') ) (VP,2 (VP,1 (V,4:2 'VISTE') ) (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) ) *********************************************************** Refinement did not work, need to revise manually ************************************************************* But now it seg faults on a different log file (when 
loading G and L for Mary plays the guitar). It crashes after having processed the 4th refinement

Wednesday, August 2, 2006
- there seems to be a problem with the max number of log files that the RR can process before it seg faults... could this be a memory problem?
  -> try running on barrow
  Needed to change permissions on avenue to make partition /usr0 writable as well as readable (Ralf).
  [aria@barrow ~]$ /avenue/usr0/aria/RuleRefinement/V0.07/RuleRefinement > ! /avenue/usr0/aria/RuleRefinement/V0.07/RR.out
  However, it doesn't seem to finish by itself and it's only outputting the beginning:

  Compiled on Aug 1 2006 15:28:57 with g++ version: 3.2.2 20030222 (Red Hat Linux 3.2.2-5) in debug mode
  Parameters are:
  Debug Level = 2
  Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
  Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar.trf

  weird, since it's supposed to be faster, not slower... Ralf thinks that barrow having a 64-bit processor instead of a 32-bit one might be causing some problems; it is hard to predict why it is not running well on barrow, and it could be a number of things.
- tried adding getchar() in my code so that the program stops before crashing and doesn't continue until I type in some character.

*******************************************************************
- Debugging session with Ralf:

Makefile:
********
edited Makefile and added -ggdb3 to the DEBUG line for full debugging
DEBUG = -g -ggdb3 # outputs just serious errors

GDB:
****
started gdb (run, break function_name | file_name: line-num, up, down, print, ...).
Open emacs, M-x gdb [enter] executable_name (RuleRefinement)[enter]
run
up
print CTLS

VALGRIND:
********
downloaded, compiled and installed valgrind, which is a program to check for memory leaks (/usr0/aria/RuleRefinement/bin)
added it to the V0.07 path
[aria@avenue V0.07]$ ../bin/valgrind-3.2.0/coregrind/valgrind --leak-check=full --show-reachable=yes ./RuleRefinement > & !
OutputValgrind TAGS **** M-x visit-tags-table M-x find-tag: FileName|FuntionName ******************************************************************* - Ralf thinks that GetCI is not properly instantiated and it tries to access a NULL pointer, not sure why... need to look into it. CorrectionInstance *CIVector::GetCI(int i) const { if (i >= 0 && i < size()) return (*this)[i]; else return NULL; } gdb: ... break 'CorrectionInstance::GetCTLSentence()' Function "CorrectionInstance::GetCTLSentence()" not defined. (gdb) up #1 0x400acf4c in std::string::string(std::string const&) () from /usr/lib/libstdc++.so.5 (gdb) up #2 0x080622dc in CorrectionInstance::GetCTLSentence() const (this=0x826c130) at CorrectionInstance.cpp:1664 (gdb) #3 0x0804eccb in main (argc=1, argv=0xbffff974) at RuleRefinement.cpp:338 warning: Source file is more recent than executable. (gdb) down #2 0x080622dc in CorrectionInstance::GetCTLSentence() const (this=0x826c130) at CorrectionInstance.cpp:1664 (gdb) print *this $1 = { = {_vptr.RefCountedObject = 0x8270b18, m_cRef = 0}, Parse = {static npos = 4294967295, _M_dataplus = {> = {}, _M_p = 0x826c130 "\030\v'\b"}, static _S_empty_rep_storage = {0, 0, 0, 0}}, m_tree = {Tokenize = {}, Root = 0x8280e64, m_CStructure = {static npos = 4294967295, _M_dataplus = {> = {}, _M_p = 0x826c8c0 "(\204$\b"}, static _S_empty_rep_storage = {0, 0, 0, 0}}, m_vLeaves = {<_Vector_base >> = {<_Vector_alloc_base,true>> = {_M_start = 0x0, _M_finish = 0x52, _M_end_of_storage = 0x12}, }, }}, SLS = {static npos = 4294967295, _M_dataplus = {> = {}, _M_p = 0x12

}, static _S_empty_rep_storage = {0, 0, 0, 0}}, SLWords = {<_Vector_base >> = {<_Vector_alloc_base,true>> = {_M_start = 0x2, _M_finish = 0x20756f79, _M_end_of_storage = 0x20776173}, }, }, TLS = {static npos = 4294967295, _M_dataplus = {> = {}, _M_p = 0x20656874
}, static _S_empty_rep_storage = {0, 0, 0, 0}}, TLWords = {<_Vector_base >> = {<_Vector_alloc_base,true>> = {_M_start = 0x616d6f77, _M_finish = 0x206e, _M_end_of_storage = 0x826fd10}, }, }, CTLS = {static npos = 4294967295, _M_dataplus = {> = {}, _M_p = 0x0}, static _S_empty_rep_storage = {0, 0, 0, 0}}, CTLWords = {<_Vector_base >> = {<_Vector_alloc_base,true>> = {_M_start = 0x826c170, _M_finish = 0x826c170, _M_end_of_storage = 0x827d7fc}, }, }, Actions = {<_Vector_base >> = {<_Vector_alloc_base,true>> = {_M_start = 0x827c400, _M_finish = 0x0, _M_end_of_storage = 0x0}, }, }, m_IDs = {<_Vector_base, std::allocator >,std::allocator, std::allocator > > >> = {<_Vector_alloc_base, std::allocator >,std::allocator, std::allocator > >,true>> = { _M_start = 0x826b670, _M_finish = 0x0, _M_end_of_storage = 0x826c190}, }, }, m_fLeadToRefinement = 144, m_fIncreasedMTAccuracy = 193, m_fContainsDependentErrors = 38, m_fCDEDirty = 8, m_cNumNonAlignActions = 136775572} (gdb) _M_dataplus = {> = {}, _M_p = 0x0}, Undefined command: "". Try "help". Thursday, August 3, 2006 - created a file with this info/Debugging.txt - gdb is crashing after returning CTLS (CorrectionInstance::GetCTLSentence()) string CorrectionInstance::GetCTLSentence() const { return CTLS; } the interesting thing, is that before that it seems to have entered a loop for TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR: (gdb): run ... Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-6.trf with 1 rules added Checking refined lattice for the presence of CTL and TL sentences XferEngine::CheckRefinedLattice: Checking if CTL and TL sentences are in the Refined Lattice... 
TL sentence is [JUAN Y MARÍA CAYERON] CTL sentence is [JUAN Y MARÍA SE CAYERON] these are the alterative translations for John and Mary fell : tl-0: JUAN Y MARÍA CAYERON tree-0: ((S,1 (NP,6 (NP,2 (N,2:1 'JUAN') ) (CONJ,1:2 'Y') (NP,2 (N,3:3 'MARÍA') ) ) (VP,1 (V,6:4 'CAYERON') ) ) ) tl-1: JUAN Y MARÍA SE CAYERON tree-1: ((S,1 (NP,6 (NP,2 (N,2:1 'JUAN') ) (CONJ,1:2 'Y') (NP,2 (N,3:3 'MARÍA') ) ) (VP,1 (V,12:4 'SE CAYERON') ) ) ) tl-2: JUAN Y MARÍA CAYERON tree-2: ((S,90 (NP,6 (NP,2 (N,2:1 'JUAN') ) (CONJ,1:2 'Y') (NP,2 (N,3:3 'MARÍA') ) ) (VP,1 (V,6:4 'CAYERON') ) ) ) tl-3: JUAN Y MARÍA SE CAYERON tree-3: ((S,90 (NP,6 (NP,2 (N,2:1 'JUAN') ) (CONJ,1:2 'Y') (NP,2 (N,3:3 'MARÍA') ) ) (VP,1 (V,12:4 'SE CAYERON') ) ) ) **************************************************************************** ***The refined grammar and lexicon produced the user corrected translation*** The correct translation is: JUAN Y MARÍA SE CAYERON **************************************************************************** **************************************************************************** ***However it is still producing the incorrect translation, previously corrected*** by the user: JUAN Y MARÍA CAYERON **************************************************************************** ********************************************************* Refinement was successfull, but lexical ambiguity increased ************************************************************* Affected Rule are: 0 Before action: SLWords: John and Mary fell TempCTLWords (1st time = TLWords): juan y marÍa se cayeron Alignments: ((1,1),(2,2),(3,3),(4,5)) ************************************************** After all actions in CI: SLWords: John and Mary fell TempCTLWords (1st time = TLWords): juan y marÍa se cayeron Alignments: ((1,1),(2,2),(3,3),(4,5),(4,4)) ************************************************** MAIN::CI's CTL sentence is instantiated with [ellos ven agua ] XferEngine::TLInLattice: Checking if TL sentence is in the 
Lattice... TL sentence is [ELLOS VEN AGUA] **************************************************************************** ***This translation: ELLOS VEN AGUA is being generated by the current system. ************************************************************************** MAIN::pXfer->TLInLattice: yes the CTL sentence is in the lattice MAIN::However, let's see if the RR module can make the grammar tighter, by not generating the incorrect translation (TL) moving on to refining it... **************************************************************************** ************************************************** After all actions in CI: SLWords: they see water TempCTLWords (1st time = TLWords): ellos ven agua Alignments: ((1,1),(2,2),(3,3)) ************************************************** MAIN::CI's CTL sentence is instantiated with [ME GUSTARÍA IR ] XferEngine::TLInLattice: Checking if TL sentence is in the Lattice... TL sentence is [ME GUSTARÍA IR] MAIN::pXfer->TLInLattice: no, the CTL sentence is NOT in the lattice Affected Rule are: 0 Before action: SLWords: I would like to go TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR Alignments: ((2,1),(4,2),(5,3)) Affected Rule are: 0 Before action: SLWords: I would like to go TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR Alignments: ((1,1),(2,1),(4,2),(5,3)) Affected Rule are: 1 {PP,2} ['QUE'] Before action: SLWords: I would like to go TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR Alignments: ((1,1),(2,1),(4,2),(5,3)) Action type = delete Wi: [QUE] i is 3 - double checked that there is no repeated log file for "Mary and John fell" - made all Affected Rule comments more specific, so that I know which one is being printed when Order of storing log files: 0,4,2,3,5,6,7,9 (not sure why the dir traverser does it in this order, check again with a new dir) 0:[I see the red car] -- [veo el auto roja] 4:[you saw the woman] -- [viste la mujer] 2:[I see the red unicorn] -- [veo el unicorn rojo] 3:[Mary plays 
the guitar] -- [marÍa juega la guitarra] 5:[John and Mary fell] -- [juan y marÍa cayeron] 6:[they see water] -- [ellos ven agua] 7:[I would like to go] -- [ME GUSTARÍA QUE IR] 9:[Gaudi was a great artist] -- [gaudÍ era un artista gran] Maybe problem is caused by the fact that TLS is in CAPS... [I would like to go] -- [ME GUSTARÍA QUE IR] parenthesis: why some of the log files seem to be processed twice? ------------------------------------------------ Final Sentences: * Source Language Sentence: "you saw the woman" * Target Language Sentence: "viste a la mujer" * Alignments: * "you" to "" * "saw" to "viste" * "the" to "la" * "woman" to "mujer" ------------------------------------------------ Final Sentences: * Source Language Sentence: "you saw the woman" * Target Language Sentence: "viste a la mujer" * Alignments: * "you" to "" * "saw" to "viste" * "the" to "la" * "woman" to "mujer" ------------------------------------------------ -> maybe it treats them like new actions... need to double check: MAIN::CI's CTL sentence is instantiated with [viste a la mujer ] XferEngine::TLInLattice: Checking if TL sentence is in the Lattice... TL sentence is [VISTE A LA MUJER] MAIN::pXfer->TLInLattice: no, the CTL sentence is NOT in the lattice Starting new loop through the Actions in CI::Affected Rule are: 2 {VP,46} ['VISTE'] {NP,3} ['LA'] ... 
tl-5: TÚ VISTE LA MUJER
tree-5: ((S,90 (NP,1 (PRON,2:1 'TÚ') ) (VP,2 (VP,1 (V,4:2 'VISTE') ) (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) )
***********************************************************
Refinement did not work, need to revise manually
*************************************************************
DEBUG: Add case done
Starting new loop through the Actions in CI::Affected Rule are: 0
Before action:
SLWords: you saw the woman
TempCTLWords (1st time = TLWords): viste a la mujer
Alignments: ((2,1),(3,3),(4,4))
**************************************************
After all actions in CI:
SLWords: you saw the woman
TempCTLWords (1st time = TLWords): viste a la mujer
Alignments: ((2,1),(3,3),(4,4))
**************************************************
But log file 5 only appears once, and it also gets output again at the end:
by the user: JUAN Y MARÍA CAYERON
****************************************************************************
*********************************************************
Refinement was successfull, but lexical ambiguity increased
*************************************************************
Starting new loop through the Actions in CI::Affected Rule are: 0
Before action:
SLWords: John and Mary fell
TempCTLWords (1st time = TLWords): juan y marÍa se cayeron
Alignments: ((1,1),(2,2),(3,3),(4,5))
**************************************************
After all actions in CI:
SLWords: John and Mary fell
TempCTLWords (1st time = TLWords): juan y marÍa se cayeron
Alignments: ((1,1),(2,2),(3,3),(4,5),(4,4))
**************************************************
Looking at log file 7 ("I would like to go" -- "ME GUSTARÍA QUE IR"), the repeated printing of the before and after info is due to alignment changes
-> add print statement saying that an alignment has been added, so that I know it's not a bug!
- program is running until it seg faults for the last log file (9 Gaudi), since cwo is still being implemented, so it's not that bad!!!
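The GetCI accessor shown in the gdb notes above returns NULL for an out-of-range index, so any caller that dereferences the result without checking will seg fault. A minimal sketch of a guarded loop follows. The CorrectionInstance and CIVector classes here are simplified stand-ins (assumptions, not the real RR classes), and ProcessAll is a hypothetical helper that does not exist in the RR code.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Simplified stand-in for the real CorrectionInstance class (assumption).
class CorrectionInstance {
public:
    explicit CorrectionInstance(const std::string &ctls) : CTLS(ctls) {}
    std::string GetCTLSentence() const { return CTLS; }
private:
    std::string CTLS;
};

// Stand-in for CIVector with the same bounds-checked accessor as in the notes.
class CIVector : public std::vector<CorrectionInstance *> {
public:
    CorrectionInstance *GetCI(int i) const {
        if (i >= 0 && i < static_cast<int>(size())) return (*this)[i];
        return NULL;
    }
};

// Hypothetical helper: visits every CI and skips NULL entries instead of
// dereferencing them. Returns the number of CIs actually processed.
int ProcessAll(const CIVector &cis) {
    int processed = 0;
    for (int i = 0; i < static_cast<int>(cis.size()); ++i) {
        CorrectionInstance *pCI = cis.GetCI(i);
        if (pCI == NULL) {
            std::cerr << "Skipping CI " << i << ": NULL pointer" << std::endl;
            continue;
        }
        std::cout << "CTL sentence is [" << pCI->GetCTLSentence() << "]" << std::endl;
        ++processed;
    }
    return processed;
}
```

The loop is bounded by the vector size rather than by an external counter, and a NULL entry produces a message instead of a crash, which should make it easier to tell an unfinished case apart from a memory problem.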
August 14, 2006 - backed up to temuco and Avenue afs from temuco (temuco:/usr4/aria/RuleRefinement) and /afs/cs.cmu.edu/project/avenue-1/Avenue/RuleRefinement: cp -uR /usr0/aria/RuleRefinement/* . - started expanding the G and the L August 21, 2006 - done expanding G and L, now refining expanded G and L... In /avenue/usr0/aria/RuleRefinement/V0.07 RuleRefinement > ! RR.out.InitialResults RR.out.InitialResults Compiled on Aug 3 2006 17:56:50 with g++ version: 3.2.2 20030222 (Red Hat Linux 3.2.2-5) in debug mode Parameters are: Debug Level = 2 Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar.trf *************************************************************** StartXfer::initfile is /usr0/aria/eng2spa/auto-init.txt Turning on Latin-1 mode Setting normalizecase to UPPER-CASE Setting find all translations to ON Setting output source text to ON Setting showtrace to full trace with src indices Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID.trf with 19 rules added Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID.trf with 227 rules added MAIN::Adding from directory : /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387 ... 
August 30, 2006
- Get results for running RR one log file at a time
RR + following log files
Order of storing log files: 0,4,2,3,5,6,7,9 (not sure why the dir traverser does it in this order, check again with a new dir)
0:[I see the red car] -- [veo el auto roja]
  simulation-lexicon-REFINED-1.trf (same as simulation-lexicon.trf)
  simulation-grammar-REFINED-1.trf (NP,8, N-ADJ agreement constraint added)
4:[you saw the woman] -- [viste la mujer]
  simulation-lexicon-REFINED-1.trf
  simulation-grammar-REFINED-2.trf (NP,10: "a" added; for now added "" manually)
2:[I see the red unicorn] -- [veo el unicorn rojo]
  simulation-lexicon-REFINED-4.trf [+unicornio; added default features manually]
  simulation-grammar-REFINED-2.trf
3:[Mary plays the guitar] -- [marÍa juega la guitarra]
  simulation-lexicon-REFINED-5.trf [add feature value constraints to lex entries juega + toca + guitarra]
  simulation-grammar-REFINED-5.trf [VP,2 -> VP,47: added VP-NP feat constraint]
5:[John and Mary fell] -- [juan y marÍa cayeron]
  simulation-lexicon-REFINED-6.trf [+se cayeron]
  simulation-grammar-REFINED-5.trf
7:[I would like to go] -- [ME GUSTARÍA QUE IR]
  not implemented yet
9:[Gaudi was a great artist] -- [gaudÍ era un artista gran]
  crashing (need to finish implementing)

September 7, 2006
- met with Bill. He's fixed some bugs and will be working on the remaining tasks this weekend.
- moved to version V0.08 and copied RRRule.cpp and CorrectionInstance.cpp into new dir.
- since temuco was upgraded, the cross-mounting wasn't preserved and so I got a permission denied error message when trying to compile RR (accessing libraries in temuco). I also needed to update the path for antlr.
- I am getting a compiler error that I was NOT getting before, so it's unrelated to Bill's upgrade...
checked on V0.07:
[aria@avenue V0.07]$ make RR
/usr/bin/g++ -g -ggdb3 -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement.o -c RuleRefinement.cpp -I/temuco/usr5/shared/code/antlr/antlr-2.7.1/lib/cpp -I/afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2 -I/shared/Genkit/UKernel -I/shared/Genkit/Toolbox
RuleRefinement.cpp: In function `int main(int, char**)':
RuleRefinement.cpp:327: parse error before `*' token
RuleRefinement.cpp:330: `pCI' undeclared (first use this function)
RuleRefinement.cpp:330: (Each undeclared identifier is reported only once for each function it appears in.)
make: *** [RuleRefinement.o] Error 1
fixed a punctuation error, and got some real compiler errors from Bill's code:
CorrectionInstance.cpp
#include "Lexicon.hpp" -> #include "CICollection/Lexicon.hpp"
emailed Bill to make sure Lexicon.hpp has not changed...
Debugged Bill's code:
- pt --> m_tree;
- added GetTree() back, not sure why you took it out, and
- PlaceWordsInVector: changed the call to the right function with the right parameter: StringUtils::StringToLower(word.value);
but now the compiler is complaining about the new implementation of RRRule.cpp:
[aria@avenue V0.08]$ make RR
...
RuleRefinement.o(.text+0x320b): In function `main':
/usr0/aria/RuleRefinement/V0.08/RuleRefinement.cpp:867: undefined reference to `RRRule::RuleInstantiation(int, ParseTree*, int&, bool)'
RuleRefinement.o(.text+0x576c):/usr0/aria/RuleRefinement/V0.08/RuleRefinement.cpp:1708: undefined reference to `RRRule::RuleInstantiation(int, ParseTree*, int&, bool)'
RRRule.o(.gnu.linkonce.d._ZTV18LiteralConstituent+0x10): undefined reference to `LiteralConstituent::GetHashIndex()'
RRRule.o(.gnu.linkonce.d._ZTV14POSConstituent+0x10): undefined reference to `POSConstituent::GetHashIndex()'
RRRuleCollection.o(.text+0xd54): In function `RRRuleCollection::AddRule(RRRule*)':
: undefined reference to `RRRule::GetRHSConstituentHash()'
RRRuleCollection.o(.text+0x1836): In function `RRRuleCollection::GetAllGrammarRulesWithRHSConstituents(std::vector >&, std::vector >&)':
: undefined reference to `RRRule::AreRHSConstituentsSame(std::vector >&)'
collect2: ld returned 1 exit status
make: *** [RR] Error 1
- emailed Bill again with list of default values for each POS and this.

Friday, September 8, 2006
- met with Bill: he had modified the wrong RRRule.cpp file, so he made his changes again. But now the ToLower change in CI.cpp (PlaceWordsInVector, SetTLWords, SetSLWords, SetCTLWords) has the effect of not matching the TL sentence with the output of the Xfer engine... so I reverted the change.
I finally figured out what was wrong: the lexicon now contains two sets of quotes:
(lexicons/simulation-lexicon-ID.trf)
;F:0
{N,1}
N::N |: [""car""] -> [""auto""]
(
 (X1::Y1)
 ((x0 form) = car)
 ((x0 agr pers) = 3)
 ((x0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((x0 semtype) = object)
)
- emailed Bill about this.

Thursday, September 14, 2006
- met with Bill: testing his code (lexicon bug fixed), it's working now!! "will" preserves quotes in the grammar rules, and lexical rules have just one set of quotes
- method that adds the default values for a new lexical entry given a POS is implemented.
Since this is language dependent, Bill doesn't want to add it to the general method that creates a new lexical entry.
- move constit method is implemented, need to test
- bill managed to track down a bug in the GetTree method I wrote. Since the = operator is not overloaded, when I return the tree, the copy is made bit by bit, so the pointers are copied rather than the nodes they point to (this is really bad!!! since I can delete a node in instance 1 and instance 2, which looks like the exact same tree, won't know about it...). Bill is going to implement this method and fix the ParseTree class, since it has memory leaks and he thinks there are many bad things about it...

Monday, September 18, 2006
- ParseTree is not compiling since bill included Lexicon.hpp, but there is no separate class for the lexicon...
- he didn't submit the CorrectionInstance update, so GetTree is probably still my unsafe implementation
- emailed him with these questions, met with him at 3pm
- 7: I would like to go (me gustaria que ir) does not have a full parse, so the RR crashes. Need to think of a delete example that can be parsed in the first place
[aria@avenue 2006-5-15-13-08-25-11387]$ mv 7 ../7-of-2006-5-15-13-08-25-11387
- we weren't setting m_tree to anything in the LoadTctool method, in
void CorrectionInstance::LoadTCToolLogFile(const char *szLogFileName, ParseTree *pTree)
- testing previous bug fixes:
 - GetTree is working now...
 - Add a word (constit) to a Grule:
  - quotes added: yes (see simulation-grammar-REFINED-5.trf)
  -> change affected rules heuristic to first trying out the rule which contains more context
  ex: "a" -> VP "a" NP instead of "a" NP -> subject!!!
  - constraint indices updated? yes!
{NP,3}
NP::NP : [DET N] -> [DET N]
(
 (X1::Y1)
 (X2::Y2)
 (x0 = x2)
 ((y1 def) = (x1 def))
 ((y2 agr) = (x2 agr))
 ((y1 agr gen) = (y2 agr gen))
 ((y1 agr num) = (y2 agr num))
)
{NP,10}
NP::NP : [DET N] -> ["a" DET N]
(
 ;(P:{NP,3})
 (X1::Y2)
 (X2::Y3)
 (x0 = x2)
 ((y2 def) = (x1 def))
 ((y3 agr) = (x2 agr))
 ((y2 agr gen) = (y3 agr gen))
 ((y2 agr num) = (y3 agr num))
)
 - Add new lexical entry (unicorn):
  - default values added? yes (in RRRule: void RRLexiconRule::SetConstraintsFromPOS(POS_TYPE POS))
  added code to my main:
  pNewLexEntry = new RRLexiconRule(POS, RC.USeRuleIDManager(), vsSLside, vsCTLside);
  // adds default constraints for given POS, implemented separately
  // from creating a new LexEntry, since it is language dependent
  pNewLexEntry->SetConstraintsFromPOS(POS);
  -> the unicorn example works very well now, it no longer adds the ambiguity that was due to the lack of constraints.
 - Mary plays the guitar: even though the new feature val is added to both clue word (guitarra) and correction word (toca), and to the grammar rule that subsumes both words, the feature did not get percolated to intermediate levels:
//**** need to percolate feat up to phrase ****//
WiPOSPos is: [1] and CluePOSPos is: [2]
Bifurcated {VP,2}
Need to create agreement constraint with feat_0 and add constraint to rule
New value constraint created is (y1 feat_0) = (y2 feat_0)
Added to the rule...
{VP,5}
VP::VP : [VP NP] -> [VP NP]
(
 ;(P:{VP,2})
 (X1::Y1)
 (X2::Y2)
 ((x2 case) = acc)
 ((x0 obj) = x2)
 ((x0 agr) = (x1 agr))
 (x0 = x1)
 ((y0 tense) = (x0 tense))
 ((y0 agr) = (y1 agr))
 ((y1 feat_0) = (y2 feat_0))
)
also need to refine VP and NP (look for the right instances of NP and VP in the tree)
This is the example that is illustrated in detail in the instructions I sent Bill

Tuesday, September 19, 2006
- gaudi was a great artist: is extracting a commented rule!!! -> need to restrict that

Wednesday, September 20, 2006
this is a problem that arises when processing more than one logfile at once.
Rules might have been modified already, but the translation tree is stored at the beginning, when all the logfiles get stored...
Think about changing the logic of the program so that it actually processes one file and makes a correction, then stores the next file (which will get the right trace and translation tree, with the new, refined rules, and not the old ones).
Alternatively, after
pAct->GetAffectedRuleAndLexID(j, ID, lex);
I can double-check whether that ID is active, and if it is not, look in the rule hierarchy to see which one is, and extract that one instead.
Need to ask bill to implement a new method that will pick the grammar rule with the most context (most specific) to avoid over-generalization.
Before sending him another email with this, test move constit and everything else.
- move constit in RHS (oldpos, newpos) is implemented, and tested in RRRule:
void MoveConstituentInRHS(int iOldPos, int iNewPos);
Both constraint indices are correctly updated :-)
- move to next version and change the logic in which I store and process logfiles

***************
**** V0.09 ****
***************
backed up RR dir to temuco and Avenue-afs (needed to remove old directories, since otherwise it exceeded disk quota!)
So from now on, when doing backups, I'll need to copy over only the modified directories; otherwise, all the old dirs will also get backed up on Avenue-afs
****************************************************************************
- Actually, I can't process files as I go, since I want to be able to compare them, eliminate duplicates and rank them so that I process logfiles that are simpler first.
-> need to use rule hierarchy in order to retrieve only active rules
- emailed bill: Code fixes (affected rules need to be active + pick most specific rule)
*** delete case:
...
ME GUSTARA QUE IR
tree: <((S,0 (VP,1 (VB,1 (V,10:2 'ME GUSTARA') ) ) ) )> <(PREP,1:4 'QUE')> <(V,11:5 'IR')>
!!!Attention: Tree is empty!!!!
MAIN::Printing Tree extracted from Xfer engine for SL-TL pair in the logfile:
NULL subroot
-> seg fault
Need to think of an example that will parse (the unicorn example parses since the Xfer engine's robust feature is on and can skip OOV nouns and verbs).

Thursday, September 21, 2006
- left for Colombia

Thursday, September 28, 2006
- met with bill to discuss the percolate method, he understands it now. He implemented a method to access the active rule, as opposed to the old affected rule, which might have changed.
- fixed code so that it doesn't seg fault any more!!! The while loop over the logfiles kept going after there were no more log files to process -> changed the loop condition to use C->NumCIs and it's working now :)
- added Xfer code at the end of CWO case:
Action type = cwo
Wi: [gran]
i: 5
i': 4
Wmoved: [artista]
i-i'= 1
Affected Rule are: 1 {NP,8} [gran]
CWO: Making sure it's an active rule in the R Hierarchy
got active rule
{NP,9}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
 ;(P:{NP,8})
 (X1::Y1)
 (X2::Y3)
 (X3::Y2)
 ((x0 det) = x1)
 ((x0 mod) = x2)
 (x0 = x3)
 (y0 = x0)
 (y1 == (y0 det))
 (y3 == (y0 mod))
 (y2 = y0)
 ((y1 agr num) = (y2 agr num))
 ((y1 agr gen) = (y2 agr gen))
 ((y3 agr gen) = (y2 agr gen))
)
Bifrucated {NP,8}
New rule is :
{NP,11}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
 ;(P:{NP,9})
 (X1::Y1)
 (X2::Y3)
 (X3::Y2)
 ((x0 det) = x1)
 ((x0 mod) = x2)
 (x0 = x3)
 (y0 = x0)
 (y1 == (y0 det))
 (y3 == (y0 mod))
 (y2 = y0)
 ((y1 agr num) = (y2 agr num))
 ((y1 agr gen) = (y2 agr gen))
 ((y3 agr gen) = (y2 agr gen))
)
iPOSPos (MoveFromPos) is: 3
MoveToPos is: 2
Constituent in 3 has been moved to 2 in the refined rule:
{NP,11}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
 ;(P:{NP,9})
 (X1::Y1)
 (X2::Y2)
 (X3::Y3)
 ((x0 det) = x1)
 ((x0 mod) = x2)
 (x0 = x3)
 (y0 = x0)
 (y1 == (y0 det))
 (y2 == (y0 mod))
 (y3 = y0)
 ((y1 agr num) = (y3 agr num))
 ((y1 agr gen) = (y3 agr gen))
 ((y2 agr gen) = (y3 agr gen))
)
Deleting all loaded rules.
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-8.trf with 233 rules added Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-8.trf with 19 rules added Checking refined lattice for the presence of CTL and TL sentences XferEngine::CheckRefinedLattice: Checking if CTL and TL sentences are in the Refined Lattice... TL sentence is [GAUD ERA UN ARTISTA GRAN] CTL sentence is [GAUD ERA UN GRAN ARTISTA] these are the alterative translations for Gaudi was a great artist : tl-0: GAUD ERA UN ARTISTA GRANDE tree-0: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,5 (VP,1 (VB,2 (AUX,1:2 'ERA') ) ) (NP,9 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) ) tl-1: GAUD ERA UNA ARTISTA GRANDE tree-1: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,5 (VP,1 (VB,2 (AUX,1:2 'ERA') ) ) (NP,9 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) ) tl-2: GAUD ERA UN ARTISTA GRAN tree-2: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,5 (VP,1 (VB,2 (AUX,1:2 'ERA') ) ) (NP,9 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,4:4 'GRAN') ) ) ) ) tl-3: GAUD ERA UNA ARTISTA GRAN tree-3: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,5 (VP,1 (VB,2 (AUX,1:2 'ERA') ) ) (NP,9 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,4:4 'GRAN') ) ) ) ) tl-4: GAUD ESTABA UN ARTISTA GRANDE tree-4: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,5 (VP,1 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,9 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) ) tl-5: GAUD ESTABA UNA ARTISTA GRANDE tree-5: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,5 (VP,1 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,9 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) ) tl-6: GAUD ESTABA UN ARTISTA GRAN tree-6: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,5 (VP,1 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,9 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,4:4 'GRAN') ) ) ) ) tl-7: GAUD ESTABA UNA ARTISTA GRAN tree-7: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,5 (VP,1 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,9 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,4:4 'GRAN') ) ) ) ) *********************************************************** Refinement 
did not work, need to revise manually
*************************************************************
for some reason NP,9 is the only one firing, NP,11 never does... -> need to look into it
I think I need to add the new rule to the grammar!!

Friday, September 29, 2006
- added new rule to RuleCollection (RC), ran the last example again (Gaudi era un gran artista) and it is working!!! :-))))
- worked with bill to debug and test percolate. It finally seems to be working, however the wrong translation is still being produced by the xfer engine... :(
looking into it... all the rules in the translation tree seem to be correctly labelled with feat_0, and the lexical entries also, so I have no idea why this is not ruling out
tl-2: MARA JUEGA LA GUITARRA
tree-2: ((S,1 (NP,2 (N,3:1 'MARA') )
-> (VP,5 (VP,6 (VB,5 (V,8:2 'JUEGA') ) ) (NP,11 (DET,2:3 'LA') (N,5:4 'GUITARRA') ) ) ) )
V,8 has: ((y0 feat_0) = -)
VB,5 has: ((y0 feat_0) = (y1 feat_0))
VP,6 has: ((y0 feat_0) = (y1 feat_0))
N,5 has: ((y0 feat_0) = +)
NP,11 has: ((y0 feat_0) = (y2 feat_0))
and VP,5 has ((y1 feat_0) = (y2 feat_0))
weird... look at it when I am less tired
-> isolate rules and run Xfer engine (outside RR) on just that sentence
will do this after METIS paper deadline

Saturday, September 30, 2006
Evaluation on Diagnostic Test: ran both initial and final grammar and saw that, since I am not constraining the pre-nominal NP rule, ambiguity is increased by a lot
-> need to extract SL-TL lexical entry, add value constraint to it and add value constraint to grammar rule
do this for NAACL paper, for now, just get preliminary results

**************
Oct. 5-7: GHC
**************

Monday, October 9, 2006
- reran RR, but now I get a segfault, and I have no idea why:
...
tree-2: ((S,1 (NP,1 (PRON,6:1 'ELLAS') ) (VP,2 (VP,1 (VB,1 (V,3:2 'VE') ) ) (NP,2 (N,6:3 'AGUA') ) ) ) )
No alternative matches the TL sentence: ELLOS VEN AGUA
!!!Attention: Tree is empty!!!!
MAIN::Printing Tree extracted from Xfer engine for SL-TL pair in the logfile: NULL subroot I thought it was because I moved the Gaudi log file back into the dir, but did not change its name to just 9: [aria@avenue 2006-5-15-13-08-25-11387]$ mv ../9-of-2006-5-15-13-08-25-11387 . [aria@avenue 2006-5-15-13-08-25-11387]$ ls 0 2 3 4 5 6 9-of-2006-5-15-13-08-25-11387 [aria@avenue 2006-5-15-13-08-25-11387]$ mv 9-of-2006-5-15-13-08-25-11387 9 but that didn't fix it, only when I moved 6 out of the dir, did it work: [aria@avenue 2006-5-15-13-08-25-11387]$ mv 6 ../6-of-2006-5-15-13-08-25-11387 [aria@avenue 2006-5-15-13-08-25-11387]$ ls 0 2 3 4 5 9 weird... - Added "would like to go" log file (7) back to the dir the RR traverses: [aria@avenue 2006-5-15-13-08-25-11387]$ mv ../7-of-2006-5-15-13-08-25-11387 . [aria@avenue 2006-5-15-13-08-25-11387]$ ls 0 2 3 4 5 7-of-2006-5-15-13-08-25-11387 9 [aria@avenue 2006-5-15-13-08-25-11387]$ mv 7-of-2006-5-15-13-08-25-11387 7 Added rule so that the delete example parses for now, later think of a good example whose correction will generalize Tuesday, October 10, 2006 - debugging delete case: // 1 ///////////////////////////////////////////////////////////////////////////// // extract SLWordPos and Word from alignment info // extract (multiple) alignment(s) from TLword to SLWords for (int p = 0; p < TLWords[TCTOOLPOS_TO_VECTPOS(TLWordPos)].alignments.size(); p++) { SLWordPos = TLWords[TCTOOLPOS_TO_VECTPOS(TLWordPos)].alignments[p]; // need to debug! This gives me a position higher than the one that should be giving me... // 5 - go, instead of 4 - to SLWordPos--; SLWord = SLWords[TCTOOLPOS_TO_VECTPOS(SLWordPos)].value; vsSLside.push_back(SLWord); cout << "SLWordPos is " << SLWordPos << " and SLWord is " << SLWord << endl; RR.out.10-10-06: ... 
Action type = delete
Wi: [que]
i is 3
SLWordPos is 4 and SLWord is to
---
vector<string> vsEmptyTLside;
vsEmptyTLside.push_back("");
pNewLexEntry->SetTLLexicon(vsEmptyTLside);
Got Lexical entry for "to" and "que"
{CONJ,2}
CONJ::CONJ |: ["to"] -> ["que"]
(
 (X1::Y1)
 ((x0 form) = to)
)
Since pLexEntry exists in the Lexicon...
{CONJ,3}
CONJ::CONJ |: ["to"] -> [""]
(
 ;(P:{CONJ,2})
 (X1::Y1)
 ((x0 form) = to)
)
Simplest DELETE case implemented:
****************************************************************************
***The refined grammar and lexicon produced the user corrected translation***
The correct translation is: ME GUSTARA IR
****************************************************************************
****************************************************************************
***However it is still producing the incorrect translation, previously corrected***
by the user: ME GUSTARA QUE IR
****************************************************************************

******************************
******************************
Moving to next version: v0.10
******************************
******************************
so that all the changes to reduce ambiguity don't interfere with the evals being done with V0.09 for NAACL 06
- backed up the relevant files to afs and temuco
temuco: cd /usr4/aria/RuleRefinement
[aria@temuco RuleRefinement]$ cp /avenue/usr0/aria/RuleRefinement/info/ChangeLog.txt info
cp: overwrite `info/ChangeLog.txt'? y
[aria@temuco RuleRefinement]$ cp -Ru /avenue/usr0/aria/RuleRefinement/V0.09/* V0.09
- for some reason the RR is now processing 9 before all the other files... and this is problematic, since the refinement necessary to correct 9 adds a lot of ambiguity... If the problem persists, need to move 9 out and process the rest first, then process 9 on the refined grammars.
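Directory traversal order is generally not guaranteed (POSIX readdir, for example, returns entries in whatever order the file system stores them), which could explain both the 0,4,2,3,5,6,7,9 order seen earlier and 9 now coming first. A minimal sketch of one way to make the order deterministic is to collect the log-file names first and sort them numerically before storing the CIs. NumericLess and SortLogFiles are hypothetical helpers, not part of the RR code, and the sketch assumes the log files are named with plain integers as in the IOFiles directory.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical comparator: orders names by their integer value, so "10"
// comes after "9". Names that do not start with a digit parse as 0 and
// sort first; ties fall back to plain string order.
bool NumericLess(const std::string &a, const std::string &b) {
    long na = std::strtol(a.c_str(), NULL, 10);
    long nb = std::strtol(b.c_str(), NULL, 10);
    if (na != nb) return na < nb;
    return a < b;
}

// Hypothetical helper: returns the log-file names in numeric order.
std::vector<std::string> SortLogFiles(std::vector<std::string> names) {
    std::sort(names.begin(), names.end(), NumericLess);
    return names;
}
```

With the files currently in the directory this would process 9 last, which matches the workaround of handling the rest first. It does not replace ranking the log files by how simple the correction is, which would need its own comparator.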
- Doing step-wise eval: See /usr0/aria/eng2spa/corpus/DiagnosticTests/00-Eval-StepWise4NAACL-10-10-06 Thursday, October 12, 2006 - Adding constraints to the New Rule with the MoveConstit Doing this on V0.10... Action type = cwo Wi: [gran] i: 5 i': 4 Wmoved: [artista] Word has been moved this many postitions (i-i')= 1 SLWordPos is 4 and SLWord is great Got Lexical entry for "great" and "gran" {ADJ,4} ADJ::ADJ |: ["great"] -> ["gran"] ( (X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ) Refining lexical entry extracted Postulating New Feature: feat_1 New value constraint created is (y0 feat_1) = + Added to refined lexical entry... {ADJ,49} ADJ::ADJ |: ["great"] -> ["gran"] ( ;(P:{ADJ,4}) (X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 feat_1) = +) ) Affected Rule are: 1 {NP,8} [gran] CWO: Making sure it's an active rule in the R Hierarchy got active rule {NP,9} NP::NP : [DET ADJ N] -> [DET N ADJ] ( ;(P:{NP,8}) (X1::Y1) (X2::Y3) (X3::Y2) ((x0 det) = x1) ((x0 mod) = x2) (x0 = x3) (y0 = x0) (y1 == (y0 det)) (y3 == (y0 mod)) (y2 = y0) ((y1 agr) = (x1 agr)) ((y1 agr num) = (y2 agr num)) ((y1 agr gen) = (y2 agr gen)) ((y3 agr gen) = (y2 agr gen)) ) Bifrucated {NP,8}iPOSPos (MoveFromPos) is: 3 MoveToPos is: 2 Constituent in 3 has been moved to 2 in the refined rule: {NP,12} NP::NP : [DET ADJ N] -> [DET ADJ N] ( ;(P:{NP,9}) (X1::Y1) (X2::Y2) (X3::Y3) ((x0 det) = x1) ((x0 mod) = x2) (x0 = x3) (y0 = x0) (y1 == (y0 det)) (y2 == (y0 mod)) (y3 = y0) ((y1 agr) = (x1 agr)) ((y1 agr num) = (y3 agr num)) ((y1 agr gen) = (y3 agr gen)) ((y2 agr gen) = (y3 agr gen)) ) Adding value constraint (=c) to the bifurcated rule... Bifurcated {NP,8} Need to create Value Constraint with feat_1 and add constraint to rule New value constraint created is (y2 feat_1) =c + Added to the rule... 
{NP,13} NP::NP : [DET ADJ N] -> [DET ADJ N] ( ;(P:{NP,12}) (X1::Y1) (X2::Y2) (X3::Y3) ((x0 det) = x1) ((x0 mod) = x2) (x0 = x3) (y0 = x0) (y1 == (y0 det)) (y2 == (y0 mod)) (y3 = y0) ((y1 agr) = (x1 agr)) ((y1 agr num) = (y3 agr num)) ((y1 agr gen) = (y3 agr gen)) ((y2 agr gen) = (y3 agr gen)) ((y2 feat_1) =c +) ) Added new rule to grammar... Adding a Blocking constraint (=-) to the original rule... Bifurcated {NP,8} Creating a blocking constraint with feat_1 and adding constraint to rule New value constraint created is (y3 feat_1) = - Added to the rule... {NP,14} NP::NP : [DET ADJ N] -> [DET N ADJ] ( ;(P:{NP,9}) (X1::Y1) (X2::Y3) (X3::Y2) ((x0 det) = x1) ((x0 mod) = x2) (x0 = x3) (y0 = x0) (y1 == (y0 det)) (y3 == (y0 mod)) (y2 = y0) ((y1 agr) = (x1 agr)) ((y1 agr num) = (y2 agr num)) ((y1 agr gen) = (y2 agr gen)) ((y3 agr gen) = (y2 agr gen)) ((y3 feat_1) = -) ) Added new rule to grammar... Disabling original rule Deleting all loaded rules. Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-8.trf with 250 rules added Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-8.trf with 21 rules added Checking refined lattice for the presence of CTL and TL sentences XferEngine::CheckRefinedLattice: Checking if CTL and TL sentences are in the Refined Lattice... 
TL sentence is [GAUD ERA UN ARTISTA GRAN] CTL sentence is [GAUD ERA UN GRAN ARTISTA] these are the alterative translations for Gaudi was a great artist : tl-0: GAUD ERA UN GRAN ARTISTA tree-0: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,6 (VP,7 (VB,2 (AUX,1:2 'ERA') ) ) (NP,13 (DET,3:3 'UN') (ADJ,49:4 'GRAN') (N,8:5 'ARTISTA') ) ) ) ) tl-1: GAUD ERA UNA GRAN ARTISTA tree-1: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,6 (VP,7 (VB,2 (AUX,1:2 'ERA') ) ) (NP,13 (DET,31:3 'UNA') (ADJ,49:4 'GRAN') (N,8:5 'ARTISTA') ) ) ) ) tl-2: GAUD ERA UN ARTISTA GRANDE tree-2: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,6 (VP,7 (VB,2 (AUX,1:2 'ERA') ) ) (NP,14 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) ) tl-3: GAUD ERA UNA ARTISTA GRANDE tree-3: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,6 (VP,7 (VB,2 (AUX,1:2 'ERA') ) ) (NP,14 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) ) tl-4: GAUD ESTABA UN GRAN ARTISTA tree-4: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,6 (VP,7 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,13 (DET,3:3 'UN') (ADJ,49:4 'GRAN') (N,8:5 'ARTISTA') ) ) ) ) tl-5: GAUD ESTABA UNA GRAN ARTISTA tree-5: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,6 (VP,7 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,13 (DET,31:3 'UNA') (ADJ,49:4 'GRAN') (N,8:5 'ARTISTA') ) ) ) ) tl-6: GAUD ESTABA UN ARTISTA GRANDE tree-6: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,6 (VP,7 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,14 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) ) tl-7: GAUD ESTABA UNA ARTISTA GRANDE tree-7: ((S,1 (NP,2 (N,7:1 'GAUD') ) (VP,6 (VP,7 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,14 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) ) **************************************************************************** ***The refined grammar and lexicon produced the user corrected translation*** The correct translation is: GAUD ERA UN GRAN ARTISTA **************************************************************************** **************************************************************************** ***And what is more, the refined MT system did NOT produce 
the incorrect translation***
detected by the user previosuly: GAUD ERA UN ARTISTA GRAN
****************************************************************************
done ;-)
- running Xfer with new G for 9 and 7...

Monday, October 16, 2006
- trying to get the RR to refine a slightly modified version of grammar3.trf and lexicon3.trf (init-test.trf)
  updated links to L and G in RuleRefinement.cpp
[aria@avenue V0.10]$ ./RuleRefinement >! RR.out.10-16-06
- at first, size and memory were growing to more than 550M (and it aborted after 3 min), so changed the lexicon so that all the entries had "" as in the simulation-lexicon. Even changed the L and G file names, to make sure that wasn't the problem. That took care of the size problem, but it's still running after 30 min:
  PID USER PRI NI SIZE  RSS SHARE STAT %CPU %MEM  TIME CPU COMMAND
 9590 aria  16  0 1340 1340  1236 R    99.7  0.0 25:00   0 RuleRefinemen
 9590 aria  17  0 1340 1340  1236 R    99.9  0.0 29:16   0 RuleRefinemen
 9590 aria  17  0 1340 1340  1236 R    99.9  0.0 30:02   1 RuleRefinemen
 9590 aria  19  0 1340 1340  1236 R    99.7  0.0 32:42   1 RuleRefinemen
killed it...
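Sketch of the CWO bifurcation logged on October 12 above ({NP,9} -> {NP,13} + {NP,14}). It is a simplification, not the RR code: SketchRule, MakeConstraint and BifurcateForCWO are made-up names, only the TL constituent order and the newly added constraints are modelled, and renumbering of the y-indices in the existing constraints and alignments is left out. The lexical entry that triggered the correction separately gets ((y0 feat_1) = +), as in {ADJ,49}.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified rule: only the TL constituent order and the
// constraints added by the refinement are modelled.
struct SketchRule {
    std::vector<std::string> tlOrder;      // e.g. {"DET", "N", "ADJ"}
    std::vector<std::string> constraints;  // e.g. "((y3 feat_1) = -)"
};

// Builds a constraint string such as "((y2 feat_1) =c +)".
std::string MakeConstraint(int yIndex, const std::string& feat,
                           const std::string& op, const std::string& value) {
    return "((y" + std::to_string(yIndex) + " " + feat + ") " + op + " " +
           value + ")";
}

// Bifurcates `original` for a change-word-order correction.
// fromPos/toPos are 1-based TL positions of the moved constituent.
// Returns {refined rule with the new order, blocked copy of the original}.
std::pair<SketchRule, SketchRule> BifurcateForCWO(const SketchRule& original,
                                                  int fromPos, int toPos,
                                                  const std::string& feat) {
    SketchRule refined = original;
    const std::string moved = refined.tlOrder[fromPos - 1];
    refined.tlOrder.erase(refined.tlOrder.begin() + (fromPos - 1));
    refined.tlOrder.insert(refined.tlOrder.begin() + (toPos - 1), moved);
    // Value constraint (=c): the reordered rule asks for feat = + on the
    // moved constituent.
    refined.constraints.push_back(MakeConstraint(toPos, feat, "=c", "+"));

    SketchRule blocked = original;
    // Blocking constraint (= -): the original order clashes with entries
    // that carry feat = +.
    blocked.constraints.push_back(MakeConstraint(fromPos, feat, "=", "-"));
    return {refined, blocked};
}
```

With tlOrder = {"DET", "N", "ADJ"}, fromPos = 3, toPos = 2 and feat_1, this gives the [DET ADJ N] rule with ((y2 feat_1) =c +) and the [DET N ADJ] rule with ((y3 feat_1) = -), matching the two constraints in the log.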
- changed G and L file names, just in case:
  PID USER PRI NI  SIZE  RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
***Aborted just after 3 min and more than 500M***
 9879 aria  19  0  436M 436M  1300 R    99.9 21.5 2:47   1 RuleRefinemen
 9879 aria  14  0  327M 327M  1300 R    99.9 16.2 2:07   0 RuleRefinemen
 9879 aria  14  0  301M 301M  1300 R    99.9 14.9 1:57   0 RuleRefinemen
 9879 aria  19  0  225M 225M  1300 R    99.7 11.1 1:27   0 RuleRefinemen
 9879 aria  14  0  173M 173M  1300 R    99.9  8.5 1:06   0 RuleRefinemen
 9879 aria  19  0 83408  81M  1300 R    99.9  4.0 0:31   0 RuleRefinemen

RR.out.10-16-06:
Compiled on Oct 16 2006 16:10:10 with g++ version: 3.2.2 20030222 (Red Hat Linux 3.2.2-5) in debug mode
Parameters are:
Debug Level = 2
Lexicon File = /usr0/aria/eng2spa/lexicons/lexicon-TestEC.trf
Grammar File = /usr0/aria/eng2spa/grammars/grammar-TestEC.trf

Looking at an old RR.out, realized I might also need to update the auto-init file used by the Xfer class...
Compiled on Oct 12 2006 14:19:05 with g++ version: 3.2.2 20030222 (Red Hat Linux 3.2.2-5) in debug mode
Parameters are:
Debug Level = 2
Lexicon File = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
***************************************************************
StartXfer::initfile is /usr0/aria/eng2spa/auto-init.txt
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID.trf with 19 rules added
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID.trf with 247 rules added
MAIN::Adding from directory : /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387
File : /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/0

But auto-init.txt contains no hard link to any G or L file, so there is no need to update that.
These files load just fine into the Xfer engine when doing it on the command line, so it has to be a problem with the RR code...
- deleted as many comments as possible from grammar and lexicon, just to make sure
- Aborted after 3:16 minutes with 511M
- commented out "'s" and "can't", just in case -> same behaviour

Xfer trace indicated that there are 458 lexical entries in /usr0/aria/eng2spa/lexicons/lexicon-TestEC.trf and 41 rules in /usr0/aria/eng2spa/grammars/grammar-TestEC.trf
however, when I grep for rule identifier symbols, I get several more:
grep "|:" | wc -l -> 468
     "\->"        -> 470
     "x0 form"    -> 468
so there must be 10 lex entries commented out. I could only find 9, but I am sure I missed one...

Tuesday, October 17, 2006
- Bill stopped by and figured out the problem: his rule-reading function expects either the end of file or another rule, and since there were some blank lines at the end of the new G and L, it would never exit the loop, so it would run out of memory and abort.
  Now it's complaining about something having an empty tree, good sign!
  -> bug in the lexicon 2
- had to update path for -ID and -REFINED both in RuleRefinement.cpp and XferEngine.cpp

Monday, October 23, 2006
- Debugging G and L for Diagnostic Test set examples: there was a bug in the grammar (VP,5), where instead of tense = inf, I had type = inf, which was conflicting with type = refl for marcharte and convertirme. Erik pointed it out to me. Fixed grammar.
- re-run RR so that refined grammar reflects change, but I have the expanded G and L loaded in order to get results for the EC test set, so...
cp RuleRefinement RuleRefinement-TestEC
cp RuleRefinement.cpp RuleRefinement-TestEC.cpp
editing the RuleRefinement.cpp file so that it has the Diagnostic Test set G and Ls
Since Erik updated the Xfer engine, when I recompile it, it gives me tons of error messages, emailed Erik
Updated Makefile -> Using local copy for now :)

Friday, November 3, 2006
- met with Bill: most specific rule + METIS workshop
  Picking the most specific rule is implemented now (Parse Tree contains range info, so that it knows where it can add a word and where it can't; look for the highest node where the word can still be inserted between daughter constits).
  limitations:
  when two words are added next to each other, the current implementation won't work (since the parse tree doesn't get updated after each correction right now)
  When a word is added at the beginning or at the end of a sentence, it gets added to the mother node.
- send reply to METIS person (Final paper deadline: Dec. 1)
  Trying to see if Bill can also come.

Tuesday, November 13, 2006
*****************************************************************************
V0.10 is working for 7 refinements and VP is now being refined with "a" so that spurious subjects and obliques are not being generated.
Haven't done any formal evaluation with this final implementation in place.
*****************************************************************************
- moved to V0.11, since I want to add new refinements for subj-verb agreement

Wednesday, November 15, 2006
- tried running the RR on a different dir (dirs.txt)
  /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-35-31-23763
[aria@avenue V0.11]$ ./RuleRefinement >! RR.out.11-15-06
Segmentation fault
[aria@avenue 2006-3-31-17-35-31-23763]$ rm user-info all
[aria@avenue 2006-3-31-17-35-31-23763]$ rm 6
7:
TL sentence is [WOULD LIKE QUE IR]
these are the alterative translations and their parses for I would like to go:
tl-0: ME GUSTARA QUE IR
for some reason the Xfer doesn't output "me gustaria que ir"!
Even though lexicon-test.trf has both "ir" and "me gustaria"
[aria@avenue 2006-3-31-17-35-31-23763]$ mv 7 ../7-of-2006-3-31-17-35-31-23763
Now, when processing 8:
...
TL is one of the alternatives: YO VISTE T
MAIN::Printing Tree extracted from Xfer engine for SL-TL pair in the logfile:
((S,1 (NP,1 (PRON,1:1 'YO')) (VP,3 (VB,1 (V,8:2 'VISTE')) (NP,1 (PRON,4:3 'T'))))
MAIN::Instantiating correction instance from TCTool Log File
Segmentation fault

RuleRefinement.cpp:
  /// Instantiating CI from TCTool Log File
  if (DebugLevel >= 1)
    cout << "\n\nMAIN::Instantiating correction instance from TCTool Log File\n";
  // When loading single file
  // pCI->LoadTCToolLogFile(pLogFile, &tree);
  pCI->LoadTCToolLogFile(file.c_str(), &tree);
  cout << "MAIN::CI instantiated\nCTLS is: " << pCI->GetCTLSentence() << endl << endl;

leaving aside for now, since I want to focus on getting new refinements in:
-> look at current sentences, see if any has subj-verb agreement problems and try using that first to test and debug the code.
-> run on more Log Files (add as higher numbers at the end)
   subj-verb agreement (might involve percolate, check)
   Adj-N number agreement
   Det-Adj agreement (gender)
   Subj Compl. agreement (cop verb) - gender and number
   I gave the boy a book (will need to refine V NP "a" NP rule)

transfer -if init.txt
init.txt:
loadrules /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
loadrules /usr0/aria/eng2spa/grammars/simulation-grammar.trf
transfile /usr0/aria/eng2spa/corpus/more-examples-tct

; Subj-v agreement (both gender and number)
I sleep
; a in front of VP -> V NP "a" NP (!= a, != refinement, indirect object)
I gave the boy a book
; adj-n number agreement
I meet some tall girls
; Det-Adj agreement (gender), for cases when the noun is underspecified
I love a secret agent
; Subj-Compl agreement (gender and number) - copulative verbs
the girl is tall
the boys are tall
; a in front of VP -> V NP "a" NP (!= a, != refinement, indirect object)
I gave the boy a book

../bin/postprocess-xfer.out.debug.pl < more-examples-tct.out.debug > input-tct-more
/usr1/depot/apache/httpd/htdocs/aria/spanish:
[aria@avenue spanish]$ mv input-tct input-tct-testing
[aria@avenue spanish]$ mv input-tct-more input-tct
Corrected the sentences with the TCTool (and took snapshots -> saved in HLT 07 folder)
cp -r out-test/2006-11-15-17-32-40-6618 /usr0/aria/RuleRefinement/IOFiles/
edited dirs.txt and ran it...
[aria@avenue V0.11]$ ./RuleRefinement >! RR.out.MoreExamples.11-15-06
No parse found. LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
Segmentation fault
-> need to do it with the basic G and L first, change RR.cpp again
-> look at how to end the loop (of processing Log Files) sooner. Right now there are 4 "before action" printouts after the last log file (7) is processed

November 19, 2006
- realized V0.10 is empty!!! must have moved it to 11 instead of copying it over :(
- cp files in V0.11 back to V0.10 and changed dirs...
[aria@avenue V0.10]$ mv dirs-working.txt dirs.txt
seems to be working, phew!
backed it up to temuco and Avenue!!!

NEED TO TEST (involves TCTool work to generate new log files)
- test dir traverse, is it robust? !!
- clue word info stored properly? ("se" -> cayeron) [need to generate log file]
  also, have the gaudi sentence with two errors, so that I can test sentences with more than just one error :-)

Need to:
-> change affected rules heuristic to first try the rule which contains more context, ex: "a" -> VP "a" NP instead of "a" NP -> subject!!!

TO DO--------------------
-> add print statement saying that an alignment has been added, so that I know it's not a bug!
-> need to make sure that at the end of the lexical refinements for add and edit, I also output a "done" comment
-> need to detect when the user did NOT refine anything, and not store that log file in the correction, right? Look at the instructions I had given to Bill about CICollection

MAIN::CI's CTL sentence is instantiated with [ellos ven agua ]
XferEngine::TLInLattice: Checking if TL sentence is in the Lattice...
TL sentence is [ELLOS VEN AGUA]
****************************************************************************
***This translation: ELLOS VEN AGUA is being generated by the current system.
**************************************************************************
MAIN::pXfer->TLInLattice: yes the CTL sentence is in the lattice
MAIN::However, let's see if the RR module can make the grammar tighter, by not generating the incorrect translation (TL)
moving on to refining it...
****************************************************************************
**************************************************
After all actions in CI:
SLWords: they see water
TempCTLWords (1st time = TLWords): ellos ven agua
Alignments: ((1,1),(2,2),(3,3))
**************************************************

-> try with different TCTool directory 2006-3-31-17-35-31-23763, see if it also crashes after the 4th refinement
-> I will need to test for all possible order combinations to find all the bugs...

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Once it's working, look at ways of modularizing it more.
Should not have to load the grammar and lexicon and check the Xfer lattice for each switch case, but just once after all of them.

l. 1236
////////////////////////////////////////////
// Printing the lexicon to a file (even though it might not have been changed)
// Add a flag so that if the lexicon has not been refined, the old file is used instead
// this would work as a natural bookkeeping, but could also get confusing, knowing which
// log files caused refinements and which didn't... need to implement a better bookkeeping
// strategy
/////////////////////////////////
-> proceed working from here...

- implement precision in lattice score
  testing cwo case (sentences 9, 8, although too complicated for now):
  ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-35-31-23763/9
  but CI instantiation is seg faulting... :(
- finish debugging and implementing: RuleInstantiation method for
  1. add a constit (literal, and then POS) (sentence 5)
     (when adding a word to a GraRule, the POS of the following word should be skipped (fLookatLeafPOS=false), since the method needs to retrieve the parent node of the next word (Leaf).)
- implement delete case
  check if I can replace the SLside (add "to" to "would like" for example, for the delete case)
- percolate

l. 1302
- still need to test: replace featname in a lexical entry
  ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/0 >! 0RR.out
- for now just test it with a made-up example
- bool LexicalEntry::ReplaceFeatName(sFeatName1, sFeatName2);

l. 1250
- test CI Collection
- think about ReverseRefinements
---------------------------------------------------------
need to debug and finish perl script to massage MM lexicon
- Word2Lexicon l.298 weird character problem (emailed Erik in May, remind him about this)
--------------------------------------------------------------
- move pertinent methods to Refiner/Utils and RRRule
  delta function
  constraint addition
- get lines of code (estimate) to have an idea (Bill's classes + my code)
-> add Bill's classes to Makefile, once everything is working, but back up working Makefile first!
- ask Bill about the tr pairs annotation, and rule origin (CI/user) annotation
  I need to look into it, and then figure out what exactly needs to be implemented, and will email Bill
- Reverse Refinement(s)
  I haven’t had time to look into this, but it would be great if we could look at the time stamp management before you leave, so that if a rule does not result in an improvement on a test set (T2), there is a good way to revert to the previous version of the grammar (T1). As I told you before (and maybe it’s already implemented), it would be useful to have a variable that expresses whether a rule led to an improvement or not (bool Rule.ImprovedAccuracy()). Maybe this is too complex to do before you leave, but maybe you have already implemented most of what would be needed, and it would be fairly simple. In any case, I’d like to know.

pending:
8
./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-17-13-55-8336/8 >! 8RR.out
Complex: 2 actions: edit + cwo
te is created as a copy of tu
-> feat_0 is postulated (should really be case, but RR cannot know that)
-> but feat_0 doesn't get added to the lexical entries
- Figure out why the program seg faults after it finishes...
./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/1 >! 11387-1RR.out
New agreement constraint created is (y1 agr pers) = (y2 agr pers)
Added to the rule...
{S,91}
S::S : [NP VP] -> [NP VP]
(
;(P:{S,1})
(X1::Y1)
(X2::Y2)
(x0 = x2)
((y1 case) = nom)
((y1 agr) = (x1 agr))
((y2 tense) = (x2 tense))
((y1 agr pers) = (y2 agr pers))
)
****************************************************************************
***The refined grammar and lexicon produced the user corrected translation***
The correct translation is: ELLA LEYÓ
****************************************************************************
****************************************************************************
***However it is still producing the incorrect translation, previously corrected***
by the user: ELLA LEÍ
****************************************************************************
it seg faults!!! -> debug
-> need to figure out why the constraint does not prevent "ella lei" from being generated

Bill to do:
*****************************************************************************
Bill will be working on Spurious Loop and Error complexity, and will finish at least one of the two tasks a couple of weeks from now.
- finish detect spurious loop
- look at Error Complexity implementation -> paper
- fix any remaining bugs in Add Constituent to RHS, CICollection, etc.
- add "" to literal constits
- update indices in constraints for AddConstitToRHS
- getPOS should actually output a POS and not a position...
- percolate method (new)
- enhance delta function: if there is no other difference (no different value for the same attribute name), but there is a differing attribute, output that.
Ultimately: error complexity score implementation (polynomial sort, reverse lexicographic (descending) order, see paper). For now, since I'll just deal with a couple of examples, have independent errors rank higher, and dependent errors lower.
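Sketch of the interim ranking only (independent errors before dependent ones), not of the error complexity score from the paper. SketchError and RankErrors are made-up names, the labels are free-form, and how an error gets marked as dependent is left open here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a correction instance; only what the ranking needs.
struct SketchError {
    std::string label;  // free-form description, for illustration only
    bool dependent;     // true if the error depends on another error
};

// Interim ranking: independent errors first, dependent errors last.
// stable_sort keeps the original (log-file) order within each group.
void RankErrors(std::vector<SketchError>& errors) {
    std::stable_sort(errors.begin(), errors.end(),
                     [](const SketchError& a, const SketchError& b) {
                         return !a.dependent && b.dependent;
                     });
}
```

Using a stable sort means the order in which corrections were logged is preserved inside each group, so swapping in the real score later should only require changing the comparison.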
*****************************************************************************
4Bill: keep track of refinement status: proposed, confirmed1 (by exact match), confirmed2 (by increasing automatic MT metrics over a regression test)

When trying to run example sentences 8 and 9 I get a seg fault:
MAIN::Instantiating correction instance from TCTool Log File
Segmentation fault

Test: CICollection, Ranking and error complexity

Ari CI testing pending:
- CON of current TCTool implementation: it doesn't reflect which word was dragged when there is a switch between contiguous words. It just says a word has been moved and shows the final order, so there is no way to deduce which word the user actually moved.
  Does it matter? It wouldn't matter, unless the user also edited one of these two words. Currently, my frame assumes that some words need to be edited and moved as part of the same error (Wi is both the word that was edited and the word that was moved), since there is a causal relationship between those two cases: often the word needs to be moved because it has a different form.
- test log file 9 with no header and have my code pass the load method the parse trace, and test to make sure it doesn't break.
  -> carefully test to see if alignments are correctly parsed and extracted!!!
- when there is a clear alignment action followed by a delete word -> just take into consideration the delete word. Make a note somewhere where I'll remember to look at it...
- ignore alignments added from English subjects to Spanish verbs, no action needed
-> find complex examples of alignments that are produced by the system, so that I can test Bill's code more thoroughly
-> test CI on more complicated instances from user studies

PENDING:
-> need to test lexical case 2 with examples 4 and 7
   4: John and Mary fell -> * Juan y María cayeron -> Juan y María se cayeron
   7: I would like to go -> * me gustaría que ir -> me gustaría ir

Jaime: in parallel to coding, start thinking of other examples that are structurally identical to the ones I have implemented, so that I can say my methods are general. For each case, have about 10 examples that are supposed to exercise it. In the case of a rule refinement, have other examples that only have the problem the refinement is addressing, and test it.

!!!!!!!!!!!!!!!!
-> always back up all the code and data that cannot be regenerated to the Avenue afs directory