************************
Current version: V0.03
RuleRefinement.cpp

make RR or make RRclean
./RuleRefinement.exe
************************

updating files in temuco and afs backups:

cp -uR /avenue/usr0/aria/RuleRefinement/* .
*******************************************************


--- Previous notes from research-diary (collected Jan 12, 2005) -----------

Aug 10-12 2004

check black research diary for notes of what I did with Kathrin


April 3, 2003

- To run a specific grammar (Kathrin's learned grammar) on a test corpus, I can use
Kathrin's code(probably will need to modify it):

temuco: /usr0/kathrin/RuleLearning/RunTimeSystem/RunTimeSystem.cpp
(Make RunTimeSystem)

Aug 12, 2003

- created a CVS repository on avenue:/usr0/aria/RuleRefinement/CVS-Repository 

August 13, 2003

- Created a working directory under /usr0/aria/RuleRefinement called work.
Reinitialized the repository so that I could have everything under v0.0, this time adding the RuleLearning files in the right way and then committing the changes.
Now everything is in avenue:/usr0/aria/RuleRefinement/CVS-Repository/v0.0
and my working directory is avenue:/usr0/aria/RuleRefinement/work/v0.0

-> import a file and then commit
 pushd /path
 popd   (no argument: returns to the previous directory)


- Looking at TrRule.hpp...
first need to modify the 2nd SetConstraint function, which is the one used in RR.cpp to have the same format in as out, right now, it takes constraints of the form:
 (x0 num p) and it outputs them as ((x0 num) = p)
Also, need to write a SetConstraint function w/o category, so that what comes in is exactly what gets added to the rule.
in Constraint.hpp:
 int Category; //1=parsing,2=transfer,3=generation,4=featurefilling/constrchecking
Kathrin's code modifies the input constraint: when I enter (X3 NUM = X1 NUM) (I tried all 4 categories), it outputs (X3 = NUM)!
- the key to a constraint is now X0__VAL__ETC, might want to change it to be the real string
- need to set the SetCSet to be a string with the training set name + id, not just the id number
- need to write a mirror EraseConstraint fnc which given a string does the right thing
- write a constructor ParseRule which, given a rule (string?), creates a rule of the class TrRule (i.e. stores everything in the right place).
first make sure everything is stored the way I want it to be stored.
-> created todo file in v0.0


August 11, 2004

- met with Kathrin to start the C++ skeleton for RR
  avenue:/usr0/aria/RuleRefinement
  main: RuleRefinement.cpp
  classes: CTL.hpp
	   ParseTree.hpp (K's previous code)
	   Lexicon.hpp (K's previous code)

August 12, 2004

- created avenue:/usr0/aria/MTEvaluation/bin
check README to see where I got some of the scripts from

Monday, August 30, 2004

- further specifying research plan

- copied some code from K's to be able to run the xfer engine from C++
  -> Makefile + ProduceLattice.cpp (avenue:/usr0/aria/RuleRefinement/K-code)

- created avenue:/usr0/aria/eng2spa/corpus/input-simulation
  with: 
	I see the red car
	I saw the woman
	Gaudi is a great artist


Wednesday, November 21, 2005

- met with Freddie to talk about the overall structure and data flow

Thursday, December 1, 2005

- met with Jaime and Alon:
	will work on CI class and will write down concrete steps that
	need to happen in my program in order to add a constraint to a 
	lexical entry and to a rule (acting as an oracle)


Friday December 2, 2005

- CTL.hpp -> CorrectionInstance.hpp
	this class will just map what's in the TCTool log file,
	just store relevant data in a useful way, no manipulation
	of the data -> use structs (+union)

	replaced all the instances of CTL in CI.hpp and in RR.cpp

- added constructor for CorrectionInstance class:

  // constructor, instantiates all the relevant variables from the log file
  void StoreTCToolLogFile(string LogFileName);  

- testing it in Test.cpp
    debugged code in CI.hpp and Rule.hpp, needed to copy over 
    StringUtils.hpp from K's dir

-> need to debug the Rule.hpp file more thoroughly (look at TrRule.hpp from , much more extensive)

for now, not include in the Test file

make Test.o
make Test.exe
./Test.exe

need to debug now


Sunday, December 4, 2005

...

working on CorrectionInstance.hpp (Freddie helped me to debug my code to read
in a file)


Tuesday, December 13, 2005

- met with Erik about the interface with the Xfer engine from my C++ code.
stored it into CallXferEngine.cpp:

#include "transfer.hpp"

vector<string> translations;
TransferEngine *xfer = new TransferEngine();

xfer->initFromFile(initfile); // same as usual without "quit" at the end!
xfer->processCommand("loadgra ...");
// ...

// Erik will add a trace to this method, so that I can access that as well
if (xfer->translate("Some Language Sentence", translations) > 0) {

}
// need to clearall before loading a new grammar/lexicon
xfer->processCommand("clearall");

Code is in transfer/stable-linux-2/

* doc.txt has all the internal methods that I can call if I need the 
     commands to return something (like for example, the num of rules loaded)

* Makefile: look at the first line to see what object files my program needs
  to call, and ask Erik if it doesn't work. He took a long time to get it
  right and he anticipates me needing help with this.

* transfermain.cpp

* transfersupport.cpp (Erik wrote it for Kathrin)     

- Looking at avenue:/usr0/aria/eng2spa to see what initfile I want to load
-> will need to load grammar and lexicon separately.

- created an init file in /usr0/aria/eng2spa/auto-init which is meant to be 
loaded by the RR module, and thus it doesn't load any specific grammar or 
lexicon and it does not end with "quit".

- working on Makefile to compile CallXferEngine, got stuck, sent email to Erik

- met with Erik and debugged Makefile and CallXferEngine, it's running now 
and it does the right thing :-)
Using most updated version of the files right now (/Transfer), since 
stable-linux2 is out of date, will change once he updates it.

[aria@avenue RuleRefinement]$make CallXfer

[aria@avenue RuleRefinement]$ ./CallXferEngine
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
Loading rule file /usr0/aria/eng2spa/grammars/grammar3.trf with 41 rules added
Loading lexicon file /usr0/aria/eng2spa/lexicons/lexicon3.trf with 451 lexical entries added
these are the alterative translations for I saw the red car:
VI EL AUTO ROJO
YO VI EL AUTO ROJO
Deleting all loaded rules.
clearing all the files loaded to the transfer engine

- updating main so that it reflects the changes to the CorrectionInstance
interface (actions and errors stored differently now).

- emailed Freddie: is there a most up-to-date CI class? Nope, but he wants 
to work on it soon.


Wednesday, December 14, 2005

- debugging and looking at Rule.hpp... maybe I should try to use K's Rule class
instead (TrRule.hpp), it looks like it has many of the methods i need.

cp RuleRefinement.cpp RuleRefinement-old.cpp
replaced Rule with TrRule

- need to test the methods in main, using old CI for now:
cp CorrectionInstance-old.hpp CorrectionInstance.hpp
... never mind, the old version is gone... working on the most updated 
version, which I sent to Freddie -> adapting code in main.

-> realized I should probably simplify all the structs and union of structs
... by just having classes so that it's easier for me to access everything.

And I should probably work with strings as words instead of a struct...


need to:

- working on adding the code to main for a end-to-end simulation


Thursday, December 15, 2005

- briefly met with Freddie to talk about embedded structs and unions
	i think it's way too complex... he finally agreed :-)


Friday, December 16, 2005

- copied struct/union version of CorrectionInstance and RuleRefinement.cpp to
old-attempts, so that I can simplify my code.
	I didn't manage to fully debug it, and for some reason it never got 
	inside the if block in line 111 ( if (MyAction.type == "edit") ), so I
	decided to leave it at that.

- changing Action from struct to class, leaving Word as struct for now, will
probably need to move it to a Utils class that gets included to most files.


Saturday, December 17

- can't use Visual C++ without major changes: code that works under unix
gives me tons of compile errors when I try to compile it from Visual C++ :(((

Sunday, December 18

- copied everything on cygwin and made all the necessary changes to compile it
and have it working locally


Monday, December 19, 2005

- managed to modify a rule from RuleRefinement and load it back to the grammar :-)
now I only need to plug in the Xfer engine and I'll have the end-to-end
super hacky system :-)

- realized I can't run CallXferEngine locally... needs to look at all the files
 for GENKIT, TRANSFER, MORPHOLOGY,... including object and some header files
 look in Makefile


Tuesday, December 20, 2005

- Need to remember it's  
	[aria@avenue RuleRefinement]$ make CallXfer  
	// and not make CallXferEngine!!!
and then:
	./CallXferEngine

- modified the Makefile so that RuleRefinement now also contains all the paths
to call and run the Xfer engine, but now since the Lexicon class is also 
defined by GenKit (UKernel), which Ben wrote, I get tons of compile errors:

[aria@avenue RuleRefinement]$ make RR
/usr/local/bin/g++ -g -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement.o -c RuleRefinement.cpp -I/temuco/usr5/shared/code/antlr/antlr-2.7.1/lib/cpp -I/afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2 -I/shared/Genkit/UKernel -I/shared/Genkit/Toolbox
In file included from ParseTree.hpp:9,
                 from CorrectionInstance.hpp:8,
                 from RuleRefinement.cpp:9:
Lexicon.hpp:14: syntax error before `{' token
Lexicon.hpp:29: `string' was not declared in this scope
Lexicon.hpp:29: syntax error before `(' token
Lexicon.hpp:30: `string' was not declared in this scope
Lexicon.hpp:30: syntax error before `(' token
Lexicon.hpp:31: syntax error before `(' token
Lexicon.hpp:33: `string' was not declared in this scope
Lexicon.hpp:33: syntax error before `)' token
...
Tokenizer.hpp:38:   instantiated from here
/usr/local/lib/gcc-lib/../../include/c++/3.2.3/bits/basic_string.h:341: `__s'
   undeclared (first use this function)
/usr/local/lib/gcc-lib/../../include/c++/3.2.3/bits/stl_vector.h: In copy
   constructor `std::vector<_Tp, _Alloc>::vector(const std::vector<_Tp,
   _Alloc>&) [with _Tp = std::string, _Alloc = std::allocator<std::string>]':
Tokenizer.hpp:42:   instantiated from here
/usr/local/lib/gcc-lib/../../include/c++/3.2.3/bits/stl_vector.h:346: `
   uninitialized_copy' undeclared (first use this function)
/usr/local/lib/gcc-lib/../../include/c++/3.2.3/fstream:358: confused by earlier errors, bailing out
make: *** [RuleRefinement.o] Error 1

need to figure out how to set the scope of my code so that it does the right 
thing. Erik suggested using 
using namespace MyLex {

// My code defining (and maybe using) Lexicon goes here

}

but that doesn't seem to work either ("using namespace X" only imports names
from an existing namespace; it cannot open a { } block).

- went to lexicon.hpp and .cpp in Ben's code /usr2/shared/Genkit/UKernel
to see how the namespace is used and did the same in my code. 
In Lexicon.hpp:

	#ifndef ... 
	#define ...

	namespace MyLex {

	// code here

	} // end of MyLex scope

	#endif
In RuleRefinement.cpp:

	MyLex::Lexicon::WhateverMethodINeed


and it compiled!!! :-)
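
The namespace fix can be reproduced in isolation. A minimal sketch, assuming two
unrelated classes that are both called Lexicon; the ThirdParty namespace and the
Describe() methods are hypothetical stand-ins, not Ben's GenKit code:

```cpp
#include <string>

// Stand-in for a library that already defines a class called Lexicon
// (hypothetical; it only mimics the name clash described above).
namespace ThirdParty {
class Lexicon {
public:
    std::string Describe() const { return "library lexicon"; }
};
}  // namespace ThirdParty

// Our own class with the same name, wrapped in its own namespace.
namespace MyLex {
class Lexicon {
public:
    std::string Describe() const { return "rule-refinement lexicon"; }
};
}  // namespace MyLex

// Callers pick one class or the other by qualifying the name.
inline std::string DescribeBoth() {
    MyLex::Lexicon mine;
    ThirdParty::Lexicon theirs;
    return mine.Describe() + " / " + theirs.Describe();
}
```

Both classes live in the same translation unit without clashing, because every
use spells out which Lexicon it means.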


Now there seems to be a problem with the grammar file... the outer parens are
missing...


[aria@avenue RuleRefinement]$ make RR
/usr/local/bin/g++ -g  -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement.o -c RuleRefinement.cpp -I/temuco/usr5/shared/code/antlr/antlr-2.7.1/lib/cpp -I/afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2 -I/shared/Genkit/UKernel -I/shared/Genkit/Toolbox
/usr/local/bin/g++ -g  -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement RuleRefinement.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/transfer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/transfer-support.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/TransferGrammarLexer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/TransferGrammarParser.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/UnicodeTools.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/chinese.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/english.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/FStructLexer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/FStructParser.o /shared/Genkit/Toolbox/*.o /shared/Genkit/UKernel/*.o  -L/temuco/shared/code/antlr-2.7.5/lib/cpp/src -lantlr
[aria@avenue RuleRefinement]$ ./RuleRefinement
...
SLSentence is  I see the red car
Cannot open init file /cygdrive/c/mt/eng2spa/auto-init-simulation.txt
command is loadgra /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 11 rules added
no translations found by the Xfer engine
Deleting all loaded rules.
clearing all the files loaded to the transfer engine

- changed path...

it doesn't translate the SL sentence...

looked into it:
Before running RR module

[aria@avenue eng2spa]$ transfer -if init-simulation.txt
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar.trf with 5 rules added
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf with 7 rules added
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
0: VEO EL AUTO ROJA
1: VEO LA AUTO ROJA
2: VEO EL AUTO ROJO
3: VEO LA AUTO ROJO


After running RR module

[aria@avenue eng2spa]$ transfer -if init-simulation.txt
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 11 rules added
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
 * No complete translation found
LEX 0 GRA 5 UNK 0 MORPH 0 COMP 0
VEO EL AUTO ROJO 
tree: <((S,0 (VP,1 (V,1:2 "VEO") ) ) )> <((NP,8 (DET,1:3 "EL") (N,1:5 "AUTO") (ADJ,2:4 "ROJO") ) )> 
[aria@avenue eng2spa]$


But when I call auto-init.txt, which has the same parameters, it doesn't
translate it:

SLSentence is  |I see the red car|
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
command is loadgra /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 11 rules added
no translations found by the Xfer engine
Deleting all loaded rules.
clearing all the files loaded to the transfer engine


Erik:
- it's not a full parse!!! ->translate only outputs anything if it finds a 
full parse, to get partial parse info, need to use bestpartial() (returns a 
string).

added:

string  partialparse = xfer->bestpartial();

or if using a language model:

	bestpartiallm();

- Erik told me there is a Spanish LM and all I need to do is add this line
to the init file:

uselm /temuco/shared/data/Spanish/SpanishLM.index /temuco/shared/data/Spanish/SpanishLM.srilm

Erik doesn't remember which data he used to build this, but he says it's not
too big and he thinks he used some of the same data EBMT uses...

[aria@avenue RuleRefinement]$ ./CallXferEngine
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 11 rules added
LEX 0 GRA 5 UNK 0 MORPH 0 COMP 0
VEO EL AUTO ROJO
tree: <((S,0 (VP,1 (V,1:2 'VEO') ) ) )> <((NP,8 (DET,1:3 'EL') (N,1:5 'AUTO') (ADJ,2:4 'ROJO') ) )>
Deleting all loaded rules.
clearing all the files loaded to the transfer engine

RR:
(...)
SLSentence is  |I see the red car|
initfile is /usr0/aria/eng2spa/auto-init.txt
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 11 rules added
LEX 0 GRA 5 UNK 0 MORPH 0 COMP 0
No full parse was found!
Partial parse is:
VEO EL AUTO ROJO
tree: <((S,0 (VP,1 (V,1:2 'VEO') ) ) )> <((NP,8 (DET,1:3 'EL') (N,1:5 'AUTO') (ADJ,2:4 'ROJO') ) )>
Deleting all loaded rules.
clearing all the files loaded to the transfer engine


- For some reason the 2nd call to LoadLexicon to load the Grammar is not 
loading all the rules in the grammar file (simulation-grammar-REFINED.trf), 
VP,3 is missing, which is causing the MT system to not find a full parse :-(((

need to look into this...

     check K's lexicon.hpp in V0.3 first, maybe she has an improved version...


Wednesday, December 21, 2005

- looking at grammar and lexicon files: it looks like it didn't load the last
rule from the simulation-grammar, but it did load all the rules in the lexicon...

- looking at LoadLexicon method: according to the implementation, the only way
it doesn't add a rule is if it has the same id as another rule already in the
lexicon
	why doesn't it output "sCombWords:" for the grammar rules?
	K's initial implementation, stores the lexicon in a different way,
	for each access method... 
	-> see if it's possible to store it in one general way, and then
	move the processing to the methods (what's more efficient?)

- looked at K's Lexicon.hpp in V0.3 and it's much simpler, just
one access method: 
    static set<string> Lookup(string SLWord, string TLWord);

- tried to add a print method to it and adapt it to map<string,set<string> > 
but didn't manage to make it work...

- looking at LoadLexicon method again, still not sure what 

      if (mssMasterLexicon.find(sID) == mssMasterLexicon.end()){
		mssMasterLexicon.insert(map<string,string>::value_type(sID,AccumRule));

is doing, the rule hasn't been processed at this point, and this is the only
place where it's inserted to the MasterLexicon, which is what gets printed...
but it doesn't seem to be getting there... doesn't print the debugging 
statements :(
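
For reference, what the find/insert pair above does on a plain std::map. A
minimal sketch, assuming the master lexicon is just a map from rule ID to rule
text; the typedef and the AddIfAbsent name are made up for illustration:

```cpp
#include <map>
#include <string>

// Hypothetical reduction of the master lexicon: rule ID -> full rule text.
typedef std::map<std::string, std::string> MasterLexicon;

// Stores the rule only when its ID is not already a key.
// Returns true if the rule was stored, false if the ID was already taken.
inline bool AddIfAbsent(MasterLexicon& lexicon, const std::string& id,
                        const std::string& ruleText) {
    if (lexicon.find(id) == lexicon.end()) {
        lexicon.insert(MasterLexicon::value_type(id, ruleText));
        return true;
    }
    return false;  // duplicate ID: the new rule is silently dropped
}
```

So with this pattern a second rule that arrives with an ID already in the map
never replaces the first one, and nothing is reported unless the return value
is checked.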

- copied Lexicon.hpp into Grammar.hpp, I moved what I had started working on to
Grammar-ari.hpp

- after some debugging, it's now working again, but the same problem still occurs.
  -> moved VP,3 before VP,1 and tried again: it's working now!!!
  getting:

  SLSentence is  |I see the red car|
  initfile is /usr0/aria/eng2spa/auto-init.txt
  Turning on Latin-1 mode
  Setting normalizecase to UPPER-CASE
  Setting find all translations to ON
  Setting output source text to ON
  Setting showtrace to full trace with src indices
  Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED.trf with 5 rules added
  Loading lexicon file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED.trf with 7 lexical entries added
  these are the alterative translations for I see the red car:
  VEO EL AUTO ROJO
  VEO LA AUTO ROJO
  Deleting all loaded rules.
  clearing all the files loaded to the transfer engine

- NEED TO BE CAREFUL WITH DATA INCONSISTENCY!!! because I have the data stored
in 3 different ways, so that I can access it in three different ways (need
to make sure this is what I really need), now when I replace a rule, it's 
only replaced in the MasterLexicon, I think, but not in the other Data 
structures
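
One way to avoid that inconsistency is to route every change through a single
method that updates all the indexes together. A rough sketch only: the Entry
fields, the IndexedLexicon class and its method names are assumptions, not the
current Lexicon.hpp:

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical lexical entry; the field names are illustrative only.
struct Entry {
    std::string id;
    std::string slWord;
    std::string tlWord;
};

class IndexedLexicon {
public:
    // Adds the entry, or replaces the entry with the same ID,
    // updating every index in one place.
    void Upsert(const Entry& e) {
        std::map<std::string, Entry>::iterator old = byId_.find(e.id);
        if (old != byId_.end()) {
            // Remove the stale ID from the secondary indexes first.
            bySl_[old->second.slWord].erase(e.id);
            byTl_[old->second.tlWord].erase(e.id);
        }
        byId_[e.id] = e;
        bySl_[e.slWord].insert(e.id);
        byTl_[e.tlWord].insert(e.id);
    }

    // IDs of the entries whose source-language word is slWord.
    std::set<std::string> IdsForSl(const std::string& slWord) const {
        return Find(bySl_, slWord);
    }

    // IDs of the entries whose target-language word is tlWord.
    std::set<std::string> IdsForTl(const std::string& tlWord) const {
        return Find(byTl_, tlWord);
    }

private:
    typedef std::map<std::string, std::set<std::string> > Index;

    static std::set<std::string> Find(const Index& index, const std::string& key) {
        Index::const_iterator it = index.find(key);
        if (it == index.end()) return std::set<std::string>();
        return it->second;
    }

    std::map<std::string, Entry> byId_;  // the "master" view
    Index bySl_;                         // SL word -> entry IDs
    Index byTl_;                         // TL word -> entry IDs
};
```

The secondary indexes store only IDs, so the entry text itself exists once and
cannot drift between views.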

- cleaned all the "cout <<" that are not necessary to illustrate the process
and saved the code:

    cp RuleRefinement.cpp RuleRefinement.cpp.simul-end-to-end
    ./RuleRefinement > RR-simul-end-to-end.txt 

- started working on adding a new lexical entry (end of RR.cpp)...
but LoadLexicon doesn't seem to work the second time I call it, even though
it used to work when I was using the same class to load the grammar, need
to debug!


Thursday, January 12, 2006

- working on NextStepsAlgorithm-Jan12.doc (Jaime wants it for next Tuesday
morning)

- created 00-End2End.txt so that I have a record of what scripts and files 
need to be run/created in order to run the RuleRefinement module from 
end-to-end


Tuesday, January 17, 2006

- went over NextStepsAlgorithm-Jan17.doc with Jaime, and he agreed I should
start right away with the backend tasks

- also see 00-Add2Thesis for an expansion of Figure 6, keep in mind while 
implementing!!!


Wednesday, January 18, 2006

- created 2 directories V0.01 and V0.02
and copied all the code files to both of the directories

- leaving V0.01 unchanged, working on V0.02 for the next iteration...

- working on task 1:
copied code from the end and debugging...

doesn't seem to be printing out the first cout <<...
already deleted .o file, but is still not showing the print out... weird!
what could be preventing it from printing??


Thursday, January 19, 2006

- the problem was that my Makefile was creating an executable that does not
have .exe at the end, but was just called RuleRefinement, and I kept trying 
to run RuleRefinement.exe which was some old version from December, duh!!!

- modified the Makefile:

make RR will now create a file called: RuleRefinement.exe
and RRclean should first remove the object file and then create the exe file

also added RRnoXfer to the Makefile, now just need to add a XferON flag
to main... need to debug, getting lots of errors...

also modified Makefile in V0.01

- back to the lattices...

- cleaned code in main() a bit 

- implemented task 1, however, it's not true that if CTLS = TLS_n, 
then we want to do nothing... 
one of the tasks of the RR module is to make the grammar tighter, for example
for "I see the red car", and so assume all the sentences are of that kind
for now.

- reporting if the incorrect translation has not been generated by the refined
system at the end

- backed up my code to AVENUE afs:
copied V0.02 to /afs/cs.cmu.edu/project/avenue-1/Avenue/RuleRefinement
and moved all the old file to V0.01


Friday, January 20, 2006

- met with Christine about potential help, but we decided it wasn't a good 
match due to her not knowing C++ already

- trying to figure out how to implement a destructor for the Lexicon class
looks like clear() from the STL containers should do the trick

- implemented destructor for the lexicon: DeleteLexicon() { member.clear(); }
Task 5 done :-)

- looking into task 7
-> I should probably reimplement the lexicon class, using the rule class.
Maybe leave for later.
-> debug Lexicon::Print

- extracting POS from parse

- implemented a toupper for a string instead of a char (allcaps), which is 
working in main

- moved allcaps to Utils.hpp
  -> need to debug Utils, for some reason, I'm getting some errors in Rule as well...
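
A possible shape for the helper, as a sketch only: the Utils struct name follows
Utils.hpp, but the body is a guess and not the code in that file. The method is
declared static so it can be called without creating a Utils object:

```cpp
#include <cctype>
#include <string>

// Utility holder that is never instantiated.
struct Utils {
    // static: callable as Utils::AllCaps(...) without a Utils object.
    static std::string AllCaps(const std::string& text) {
        std::string result(text);
        for (std::string::size_type i = 0; i < result.size(); ++i) {
            // Go through unsigned char: std::toupper is undefined for
            // negative char values (possible with Latin-1 letters).
            result[i] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(result[i])));
        }
        return result;
    }
};
```

Note that std::toupper in the default "C" locale only changes a-z, so accented
Latin-1 letters pass through unchanged.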


Monday, January 23, 2006

- debugged AllCaps, was including the wrong Rule file and it needed to have 
"static" before its declaration!

- moving on... parsing parse.tree for POS (can't extract just POS... ask Stephan)

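- to concatenate strings (=append): strcat(S1,S2) appends S2 to S1 in place, but
only works on C char arrays (and S1 needs enough room); for std::string use
S1 + S2 or S1 += S2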

- creating lexical entry: maybe I can add a method later that does this
 4 INparams: POS, nextID(extract from lexicon), tlword, slword
 -> need to keep a counter for each POS, need to add next available ID
  calculate this that when I load in the lexicon...
 // making it up for now

- testing 2: adding a sense, for some reason it doesn't find "plays" in the
lexicon, need to debug


Tuesday, January 24, 2006

- the problem with translate(sl,trace) was that I had a transfer.hpp file in
my local directory which was the one being included, instead of the one 
specified by the TRANSFERPATH (probably because #include "transfer.hpp" with
quotes searches the including file's directory before any -I path)... but hey,
it's working now


- met with Stephan to discuss implementation issues:

      * REUSING EXISTING CODE: 
      ask Erik if I can have direct access to the lexicon and grammar classes
      so that I can modify them once they are loaded into the xfer engine, 
      and so that I don't have to reload it every time, this would be very time
      consuming for very large grammars or lexicons (~10K words).

      If I have a base class from Erik, I can then derive an Extended Grammar
      class from that, that adds on to it with the functionality I need for the
      RR. Two different "save" methods, one that saves the basic info, 
      necessary to run the Xfer engine, and one that saves all the info required
      for RR tracking, history of rules, etc.
      
      Is Erik using a library (antlr)? his own library? is it stable enough that I 
      can use it?

      Grammar:			ExtendedGrammar
      vector<Rule> MyGra	flag {active,inactive}
				vector<Rule> History
      save();			save();

      ExtendedGrammar *pMyExtGra = new ExtendedGrammar;
      
      pMyExtGra->save(); // saves all the information, including the base information
			 // and the RR-related info (history, active v. inactive, etc.)

      pMyExtGra->Grammar::save(); // qualified call: uses the base class save method,
				  // to save just the basic info that the Xfer engine needs
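
The base/derived idea as a small compilable sketch. Everything here is
hypothetical: Erik's real grammar class is not known at this point, rules are
plain strings, and save() returns the text instead of writing a file so that it
is easy to check:

```cpp
#include <string>
#include <vector>

// Hypothetical base class: only what the transfer engine would need.
class Grammar {
public:
    virtual ~Grammar() {}
    void AddRule(const std::string& rule) { m_rules.push_back(rule); }

    // Basic save: one active rule per line.
    virtual std::string save() const {
        std::string out;
        for (std::vector<std::string>::size_type i = 0; i < m_rules.size(); ++i) {
            out += m_rules[i] + "\n";
        }
        return out;
    }

private:
    std::vector<std::string> m_rules;
};

// Adds the refinement bookkeeping on top of the base class.
class ExtendedGrammar : public Grammar {
public:
    void Retire(const std::string& oldRule) { m_history.push_back(oldRule); }

    // Extended save: the basic info plus the rule history.
    virtual std::string save() const {
        std::string out = Grammar::save();
        for (std::vector<std::string>::size_type i = 0; i < m_history.size(); ++i) {
            out += "; history: " + m_history[i] + "\n";
        }
        return out;
    }

private:
    std::vector<std::string> m_history;
};
```

With a pointer p to an ExtendedGrammar, p->save() gives the full version and
p->Grammar::save() gives only what the base class knows about.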


      * TREE: the best thing to do to extract the POS from the trace is to have
      a tree class or DS which I can load the tree into and then I can look it
      up in any way I need to (I'll also need to do this for blame assignment)
      Convert string into a tree structure.
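
      A first sketch of such a tree loader, assuming the trace always looks like
      the bestpartial() output pasted earlier in this diary, i.e.
      <((NP,8 (DET,1:3 'EL') ... ) )>. TreeNode, ParseTrace and CollectPos are
      made-up names, and malformed input is simply skipped, not reported:

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// One node of a parsed trace fragment such as (NP,8 (DET,1:3 'EL') ...).
struct TreeNode {
    std::string label;               // e.g. "NP,8"; empty for a bare "(( ... ))" wrapper
    std::string word;                // quoted word of a pre-terminal, else empty
    std::vector<TreeNode> children;
};

// Parses one "( label child* )" group; on entry s[pos] is the opening paren.
inline TreeNode ParseNode(const std::string& s, std::string::size_type& pos) {
    TreeNode node;
    ++pos;  // consume '('
    // Label: everything up to whitespace or a paren.
    while (pos < s.size() && !std::isspace(static_cast<unsigned char>(s[pos])) &&
           s[pos] != '(' && s[pos] != ')') {
        node.label += s[pos++];
    }
    while (pos < s.size()) {
        const char c = s[pos];
        if (c == ')') { ++pos; break; }          // end of this group
        if (c == '(') {
            node.children.push_back(ParseNode(s, pos));
        } else if (c == '\'' || c == '"') {      // quoted terminal word
            const std::string::size_type end = s.find(c, pos + 1);
            if (end == std::string::npos) { pos = s.size(); break; }
            node.word = s.substr(pos + 1, end - pos - 1);
            pos = end + 1;
        } else {
            ++pos;                               // whitespace or anything unexpected
        }
    }
    return node;
}

// Parses every top-level group in the trace; '<' and '>' are ignored.
inline std::vector<TreeNode> ParseTrace(const std::string& trace) {
    std::vector<TreeNode> roots;
    std::string::size_type pos = 0;
    while (pos < trace.size()) {
        if (trace[pos] == '(') roots.push_back(ParseNode(trace, pos));
        else ++pos;
    }
    return roots;
}

// Collects (POS, word) pairs, left to right, from the pre-terminal nodes.
inline void CollectPos(const TreeNode& node,
                       std::vector<std::pair<std::string, std::string> >& out) {
    if (!node.word.empty()) {
        // "DET,1:3" -> "DET": keep the part of the label before the comma.
        out.push_back(std::make_pair(node.label.substr(0, node.label.find(',')),
                                     node.word));
    }
    for (std::vector<TreeNode>::size_type i = 0; i < node.children.size(); ++i) {
        CollectPos(node.children[i], out);
    }
}
```

      The same TreeNode structure could later be walked for blame assignment,
      since each node keeps its full label (category plus rule ID).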

      * STATIC variables: Current Lexicon and Grammar class (from K) are 
      declaring them as a static variable, which essentially makes them 
      globally accessible (by using Lexicon:: or Grammar::), and so there 
      isn't an instance called Lexicon/Grammar.

      I think that this is probably fine for utility methods such as AllCaps,
      don't really want to have to create a Utils object, but for something
      like the lexicon or the grammar, I think it would make more sense to 
      have a constructor that instantiates the right thing in the object,
      and so I'd have Lexicon.load("LexFileName") or something like that.

      A good use of static variables could be the POS id counter that the
      grammar and lexicon need to keep, which should be uniquely accessible
      from all the rules/entries at any point. And thus for the grammar class,
      I could have an enum, for example, with NPcount, VPcount, etc.
      which would tell me what is the next available Rule ID for each POS.
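
      A sketch of what such a shared counter could look like. It uses a
      std::map keyed by the category name rather than the enum mentioned above,
      and it assumes ID numbers start at 0; the IdCounters class and its method
      names are invented for illustration:

```cpp
#include <map>
#include <string>

// Hypothetical holder for the "next free ID number" of each category.
class IdCounters {
public:
    // Records an ID number seen while loading, e.g. Observe("VP", 3).
    static void Observe(const std::string& pos, int idNumber) {
        int& next = Table()[pos];
        if (idNumber >= next) next = idNumber + 1;
    }

    // Returns the next unused ID number for pos and reserves it.
    static int Next(const std::string& pos) {
        return Table()[pos]++;
    }

private:
    // Function-local static: one table shared by the whole program,
    // created on first use, no separate definition in a .cpp needed.
    static std::map<std::string, int>& Table() {
        static std::map<std::string, int> table;
        return table;
    }
};
```

      Observe() would be called for every rule or entry while loading, and
      Next() whenever a new rule or entry has to be created.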

      * using memory in HEAP instead of STACK (and the use of pointers):
      class CLexicon {
	    public:
			CLexicon(); // default constructor
			CLexicon(string FileName); // constructor that actually loads in a lexicon  = LoadLexicon
      };

      --------
      // creating an object with new stores it on the heap instead of the
      // stack, which is limited, so this is a good way to avoid running out
      // of stack space with big objects (but every new needs a matching
      // delete, otherwise the memory leaks)
      // Erik does it like this for the Xfer engine, since the Xfer engine 
      // needs a lot of memory

	CLexicon *pLexicon = new CLexicon("lexicon1.trf");

	pLexicon->LookupByID("ID"); // = (*pLexicon).LookupByID
				    // (this dereferences the pointer first)


	// This would create an object and store it into the stack
	CLexicon MyLexicon;

	MyLexicon.LookupByID("ID");
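
	The two styles side by side, as a self-contained sketch. This CLexicon
	is a stand-in that only remembers the file name (it does not load
	anything), and std::unique_ptr is a C++11 addition not discussed with
	Stephan:

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the lexicon class sketched above.
class CLexicon {
public:
    CLexicon() {}
    explicit CLexicon(const std::string& fileName) : m_fileName(fileName) {}
    const std::string& FileName() const { return m_fileName; }

private:
    std::string m_fileName;
};

inline std::string HeapAndStackDemo() {
    // Heap: the object lives until it is deleted; cleanup is manual.
    CLexicon* pLexicon = new CLexicon("lexicon1.trf");
    std::string fromHeap = pLexicon->FileName();  // same as (*pLexicon).FileName()
    delete pLexicon;                              // without this the object leaks

    // Heap with automatic cleanup: freed when pOwned goes out of scope.
    std::unique_ptr<CLexicon> pOwned(new CLexicon("lexicon2.trf"));

    // Stack: destroyed automatically at the end of the enclosing scope.
    CLexicon myLexicon("lexicon3.trf");

    return fromHeap + " " + pOwned->FileName() + " " + myLexicon.FileName();
}
```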


	* NOTATION:
	s_name =  static variable
	m_name = member variable (to distinguish between local variables and 
				 member variables when within a method)

	Cname = class object
	pname = pointer


	* RULEREFINER CLASS: think about what data members and methods it needs
	(grammar, lexicon, correction instance, parse tree, etc.)
	this would have all the operation types defined in methods, nice way to 
	organize code + keep track of what's missing


	* PARAMETERS:
	method(in,in, in/out) vs.  out method(in, in)	

	in/out params are useful for cases when we want to modify something,
	for example populate a list, that already contained some info, or 
	expand a grammar/rule.

	This is the way to modify complex objects, so that I don't have to copy
	big objects over and over.

	* STRING operations are expensive, so try passing by reference, and
	by const reference if I know I don't mean to modify them (ever).
	
	Example of bad use: string& method(...) {
				string NS;
				return NS; }
	    NS is a local variable, so it is destroyed when the function ends
	    (out of scope), and the returned reference no longer refers to
	    anything valid.

	    a return value should not be a reference, unless the object it refers
	    to has been declared outside the function.

	    OK:	string method(...)	// return by value
	    or:	string NS;		// declared outside the function
		string& method(...) {
		    modify NS;
		    return NS; }
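
	The three safe patterns from this discussion in compilable form. A
	sketch only: AppendRuleIds, MakeLabel and this tiny Rule class are
	invented for the example and are not part of the RR code:

```cpp
#include <string>
#include <vector>

// in + in/out: appends to a list owned by the caller, no big copy made.
inline void AppendRuleIds(const std::vector<std::string>& newIds,  // in
                          std::vector<std::string>& allIds) {      // in/out
    allIds.insert(allIds.end(), newIds.begin(), newIds.end());
}

// Return by value: safe, the caller receives its own string.
inline std::string MakeLabel(const std::string& pos, const std::string& number) {
    std::string label = pos + "," + number;
    return label;
}

// Return by const reference: fine here because m_id is a member, so it
// outlives the call (it lives as long as the Rule object does).
class Rule {
public:
    explicit Rule(const std::string& id) : m_id(id) {}
    const std::string& Id() const { return m_id; }

private:
    std::string m_id;
};
```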
					    

	* CALLING A PERL script from C++: using pstream 
	see TextFilter dir in bin/ for an example of how to call
	an external perl script from C++.
	
	see: avenue:/usr0/aria/RuleRefinement/bin/TextFilter/README


Wednesday, January 25, 2006

- met with Jaime and Alon
  Jaime wants me to test case one with a different type of agreement
  constraint (between the subj and the verb, say)

  Time efficiency issues:
  - when having a huge lexicon, instead of loading the whole file once
  it's modified, I can do two things:
       - load only the entries used in the sentence(s), creating a small
       working lexicon (Jaime)

       - use the reloadgra/reloadlex commands instead of the loadgra command;
       they also take a file name, but the file should be a grammar or
       a lexicon containing just the changes. So for example, if a rule has been
       slightly modified and needs to be replaced, it's added to the grammar
       file with the original ID, and if a new rule needs to be added, a new
       ID is created and the rule is added to the modified grammar. 
       What the reload method does is to check if the ID is already there, if 
       it is, it replaces it with the newest version, if it's not, it adds it.
       -> need to test
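
  The replace-or-add behaviour described for the reload commands can be modelled
  on a plain map. This is only a model of the description above, not the Xfer
  engine's code; RuleTable and ApplyChanges are made-up names:

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> RuleTable;  // rule ID -> rule text

// Merges a set of changes into an already loaded table: rules whose ID is
// already present are overwritten, rules with a new ID are added.
// Returns the number of rules that were replaced.
inline int ApplyChanges(RuleTable& loaded, const RuleTable& changes) {
    int replaced = 0;
    for (RuleTable::const_iterator it = changes.begin(); it != changes.end(); ++it) {
        if (loaded.count(it->first) > 0) ++replaced;
        loaded[it->first] = it->second;  // operator[] overwrites or inserts
    }
    return replaced;
}
```

  Unlike the find/insert pattern used in LoadLexicon, operator[] overwrites an
  existing ID, which is exactly the difference between "load" and "reload" here.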

- working on testing 1st case (adding an agreement constraint) but between
  the subject and the verb...
  added example int variable (1,2...) and expanding grammar and lexicon
  as well as running the xfer engine to obtain exact output.

  saved system's output in debugging-subj-obj-example and made a copy of
  the executable so that if something breaks later, I can show Jaime and 
  Alon:
 	cp RuleRefinement.exe RuleRefinement-ex2.exe

- testing 2: adding a sense, for some reason it doesn't find "plays" in the
lexicon, need to debug
	 -> LoadLexicon is not actually populating SLLexicon!
	 need to modify the method


Thursday, January 26, 2006

- working on modifying the LoadLexicon method to actually populate the SLLexicon as well...

  - started working on creating static POSCounters to be able to create new
  lexical entries and grammatical rules. Stuck when trying to convert a string
  to an integer, emailed Stephan asking for help.

  check: atoi, atol strtol
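
  Of those three, strtol is the one that can report errors. A sketch of a
  checked conversion, plus pulling the number out of an ID such as "VP,3";
  ToInt and IdNumber are hypothetical helper names, and the "POS,number" ID
  shape is taken from the traces above:

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Converts text such as "12" to an int with std::strtol.
// Returns false (and leaves value untouched) if text is empty, contains
// anything after the number, or does not fit in an int.
inline bool ToInt(const std::string& text, int& value) {
    if (text.empty()) return false;
    char* end = 0;
    errno = 0;
    const long parsed = std::strtol(text.c_str(), &end, 10);
    if (errno == ERANGE || *end != '\0') return false;
    if (parsed < INT_MIN || parsed > INT_MAX) return false;
    value = static_cast<int>(parsed);
    return true;
}

// Extracts the number from an ID of the form "VP,3".
inline bool IdNumber(const std::string& ruleId, int& number) {
    const std::string::size_type comma = ruleId.find(',');
    if (comma == std::string::npos) return false;
    return ToInt(ruleId.substr(comma + 1), number);
}
```

  atoi and atol return 0 both for "0" and for text that is not a number at
  all, so they cannot tell the two cases apart.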

-  Lexicon.hpp: Lookup methods don't seem to be working for either SLword 
or TLword... fully debugging them, making sure I understand how they 
are loaded and then testing by printing out the second value of the 
map<string,set<string> >

in LookupSL: it's not going thru the if statement, but this is the same if
statement that works fine for LoadLexicon... added cout statements and it looks
fine...


Friday, January 27, 2006

- still debugging LookupSL method...

- Stephan found the bug: I was looking up slword instead of SLWORD, duh!!!
  he made a bunch of suggestions, need to look into it and modify my code.
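
  To make this kind of bug impossible, the key can be normalized inside the
  lookup itself instead of in main. A sketch over map<string,set<string> >;
  WordIndex, UpperKey, AddWord and this LookupSL signature are assumptions, not
  the current Lexicon.hpp interface:

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>

typedef std::map<std::string, std::set<std::string> > WordIndex;  // word -> entry IDs

// Upper-cases a word so that it can be used as a map key.
inline std::string UpperKey(const std::string& word) {
    std::string key(word);
    for (std::string::size_type i = 0; i < key.size(); ++i) {
        key[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(key[i])));
    }
    return key;
}

// Keys are upper-cased when stored...
inline void AddWord(WordIndex& index, const std::string& word, const std::string& id) {
    index[UpperKey(word)].insert(id);
}

// ...and when looked up, so "plays" finds the entry stored as "PLAYS".
inline std::set<std::string> LookupSL(const WordIndex& index, const std::string& word) {
    WordIndex::const_iterator it = index.find(UpperKey(word));
    if (it == index.end()) return std::set<std::string>();
    return it->second;
}
```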


Saturday, January 28, 2006

- fixed bug in main so that the right word is looked up :-)

- finished adding sense case :-)))))


Sunday, January 29, 2006

- debugged case 1 (adding a brand new lex entry, problem with the parens)


Monday, January 30, 2006

- now both lexical entry examples are working and I am printing a file
with just that entry to the lexicon directory

- use reloadgra to add to lexicon already loaded in the xfer engine
Note: reloadlex is not implemented yet, and as long as I use loadgra lexfile the first time,
I should have no problems using reloadgra newentry. 
For huge lexicons, it might get too slow -> let Erik know.

- Wanted to implement a lexical filter that only loads the lexical entries for the words 
that appear in the SLsentences, need to extract SLSs first and then implement a 
special loading function that does a lookup first,
but Erik says that his code already effectively does that, and it only loads the entries that
are needed... so I don't have to worry about that for the Xfer engine, maybe just for my code.
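
A minimal sketch of what such a lexical filter could look like on my side,
in case I ever need it for my own Lexicon class. The names CollectWords and
FilterLexicon are hypothetical, and the sketch assumes words are separated by
whitespace and already case-normalized:

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

typedef std::map<std::string, std::set<std::string> > LexiconMap;

// Collects the whitespace-separated words of all SL sentences.
std::set<std::string> CollectWords(const std::vector<std::string>& SLSentences) {
  std::set<std::string> Words;
  for (std::size_t i = 0; i < SLSentences.size(); ++i) {
    std::istringstream In(SLSentences[i]);
    std::string Word;
    while (In >> Word) Words.insert(Word);
  }
  return Words;
}

// Keeps only the lexical entries whose SL word occurs in some sentence.
LexiconMap FilterLexicon(const LexiconMap& Full,
                         const std::vector<std::string>& SLSentences) {
  const std::set<std::string> Needed = CollectWords(SLSentences);
  LexiconMap Filtered;
  for (std::set<std::string>::const_iterator It = Needed.begin();
       It != Needed.end(); ++It) {
    LexiconMap::const_iterator Entry = Full.find(*It);
    if (Entry != Full.end()) Filtered.insert(*Entry);
  }
  return Filtered;
}
```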

- Lexical examples 2 and 3 are now working (cases 1 and 2)
2: I see the red unicorn -> * veo el unicorn rojo -> veo el unicornio rojo
3: Mary plays the guitar -> * María juega la guitarra -> María toca la guitarra

- wrapping up: compiled and ran it to make sure the trace looks good to show Jaime and Alon
on Wednesday

***********************************
***** MOVING TO V0.03... **********

- reorganize the logic of the program, checking against lattice before loading lexicon, etc.

- splitting CorrectionInstance into hpp and cpp (otherwise the compiler needs to go over
the whole implementation every time, even if it hasn't changed!)

- split all other classes, so that I could get all the dependencies in the Makefile right

- worked on the Makefile with Stephan: added dependencies for each class, 
so that the make file knows which files it needs to update for each object.
It's compiling and running as before!!! :-)

- to compile classes one by one: g++ -c ClassName.cpp


Tuesday, January 31, 2006

- updated V0.02 into Avenue afs directory and backed up V0.03

- copied Stephan's SimpleTests.cpp into String2Int.cpp, this illustrates
how to cast strings to integers (-> POScounters) 

- using Test.cpp for testing my code

************************************
- created a new object: Refiner
************************************

- Moving chunks of code into Refiner:

-> need to debug Refiner + Makefile
 it can't find transfer.hpp


Wednesday, February 1, 2006

- saved TestParam.tar file from Stephan in avenue:/usr0/aria/RuleRefinement/bin

- managed to compile Refiner.o without transfer.hpp included
still need to debug it to be able to include transfer.hpp (Makefile for 
Refiner.o looks like RuleRefinement.o, should work...)
[aria@avenue V0.03]$ g++ -c Refiner.cpp
In file included from Refiner.cpp:11:
Refiner.hpp:14:24: transfer.hpp: No such file or directory
Refiner.cpp:15:24: transfer.hpp: No such file or directory

- working on making RuleRefinement more modular

- met with Jaime and Alon, showed them the 4 examples that are working...
Jaime wants me to show this to Lori some time.

-> try loading the new grammar rule for example 1 (auto rojo) with reloadgra
method


Thursday, February 2, 2006

- changed GetCTLS into GetCTLSentence for consistency (CI.* and Refiner.cpp)

- passing variables in and out methods... debugging

- moved code from the beginning of RR.cpp to:

   * map<string,string> AccessLogInfo(CorrectionInstance* pCI); // instantiated data members
   // which I'm currently not using due to a variable access problem

   * vector<int> DetectAction(CorrectionInstance* pCI, int example);

   * bool CTLSinXferLattice(string GramFileName, string LexFileName, string SLSentence, string CTLS);

   * bool CTLSinRefinedLattice(string RefinedGramFileName, string RefinedLexFileName, string SLSentence, string CTLS, string TLS);
  // try to make this more general so that it can be applied to both
  // refined and unrefined lex and grams -> have a flag for that
  // divide in smaller methods

- but I can't seem to successfully include transfer.hpp into Refiner.cpp

[aria@avenue V0.03]$ g++ -c Refiner.cpp
Refiner.cpp:15:24: transfer.hpp: No such file or directory

- commented it out for now, moving onto adding a parameter to RR.cpp, 
 so that I don't need to recompile every time I need to test a different 
example. Looking at bin/TestPassParam/ (didn't have time)

- In order to be able to use the same variables in the Refiner class, 
I can pass the object that contains those variables (CorrectionInstance) as 
a pointer to an initialization method, and that will actually store the pointer
as a data member (m_pCI) into the Refiner class.
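
A minimal sketch of that idea. The CorrectionInstance below is only a
stand-in with one field, and Init and CurrentSLSentence are hypothetical
method names:

```cpp
#include <string>

// Stand-in for the real CorrectionInstance class (one field only).
class CorrectionInstance {
 public:
  explicit CorrectionInstance(const std::string& SLSentence)
      : m_SLSentence(SLSentence) {}
  const std::string& GetSLSentence() const { return m_SLSentence; }
 private:
  std::string m_SLSentence;
};

class Refiner {
 public:
  Refiner() : m_pCI(0) {}
  // Stores the pointer only; the caller keeps ownership of the object and
  // must keep it alive for as long as the Refiner uses it.
  void Init(CorrectionInstance* pCI) { m_pCI = pCI; }
  // Any Refiner method can now reach the CI data through m_pCI.
  std::string CurrentSLSentence() const {
    return m_pCI ? m_pCI->GetSLSentence() : std::string();
  }
 private:
  CorrectionInstance* m_pCI;
};
```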


Friday, February 3, 2006

- the include problem could be the following (Stephan):
I need to provide an include path to the compiler, so that it knows where to 
look for additional include files, esp. when they are not in the current 
directory.  This include path is set correctly in my Makefile, 
so calling make Refiner.o should work. However, if I just use 
g++ -c Refiner.cpp 
then this include path is missing.  I could add it with the -I option, 
but it is easier to use the makefile.

did: make Refiner.o 
and it compiled... but then I was still getting:

[aria@avenue V0.03]$ make RR
...
s/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/UnicodeTools.o /shared/Genkit/Toolbox/*.o /shared/Genkit/UKernel/*.o  -L/temuco/shared/code/antlr-2.7.5/lib/cpp/src -lantlr
/usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
/usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
/usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
...
/usr/bin/ld: Dwarf Error: Could not find abbrev number 343.
RuleRefinement.o: In function `main':
RuleRefinement.o(.text+0x804): undefined reference to `Refiner::CTLSinXferLattice(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
collect2: ld returned 1 exit status
make: *** [RR] Error 1

What was really going on is that I forgot to add Refiner:: before the 
CTLSinXferLattice method!!! duh!

- thinking about different methods required to manipulate rules (Lex and Gra)
wrote a file called /usr0/aria/RuleRefinement/ManipulatingRules.txt 
added to NextSteps file as well

- index constraints by the two y-positions by storing them in a matrix: 
 vector<vector<vector<Constraint>>>


Monday, February 6, 2006

- backed up files from V0.03 to Avenue afs directory

- adding parameters to main (using TestParam class from Aachen)
compiler error: it has to do with the including of ParamDef.hh...

In file included from RuleRefinement.cpp:11:
Param.hh:126:23: ParamDef.hh: No such file or directory
[3:43:26 PM] Ariadna says: I've tried adding <> around the file name, and including ParamDef.hh from the RR.cpp, but I'm still getting error messages...
[3:44:39 PM] Ariadna says: actually, if I just try make Param.o, I get tons of errors, even though I just copied the file from your Test directory...
[3:45:44 PM] Ariadna says: [aria@avenue V0.03]$ make Param.o
/usr/local/bin/g++ -g  -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o Param.o -c Param.cc
Param.cc:14:20: Param.hh: No such file or directory
Param.cc:23: syntax error before `::' token
Param.cc:24: syntax error before `::' token
[3:45:55 PM] Ariadna says: ...

- replaced the <..> in the Param files by "..", i.e. #include "ParamDef.hh"
The two different ways make a difference as to where the compiler searches.
In STTK we use <..> everywhere, but the current directory is then included in
the include path.  "..." means that the current directory is searched, 
even if it is not in the include path (well, this is how I think it is, 
but no guarantee given ;-)

So, when you have some time you might test using <...> but adding -I. to your 
include path (note the full stop after the I, which stands for the current 
directory).  <...> is usually only used for the system includes.

- fixing the GramFileName params which are now pointers (Lexicon, Refiner, etc.)

- stuck again with Makefile... for some weird reason it wasn't linking right 
and when Param.o and ParamDef.o were deleted from the OBJS_RR list and then 
added again, it did... weird!!!

But now it's working:

[aria@avenue V0.03]$ ./RuleRefinement.exe
Compiled on Feb  6 2006 23:10:32 with g++ version: 3.2.3 in debug mode


CParam.AddParamDef :
 Internal warning: the flag 'l' is reserved for special purposes or has
 been used for another parameter.
 It should not be redefined.
          BEWARE OF UNPREDICTIBLE BEHAVIOUR!

Parameters are:
Debug Level     = 1
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed
Example number = 2
0
...

need to debug the rest of the program, make sure I'm passing what I need to.
At some point I'm giving the xfer engine a cout comment, instead of the
SLSentence, need to debug! From output:

...
**************************************************
1. Checking against the existing lattice...
**************************************************
initfile is /usr0/aria/eng2spa/auto-init.txt
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices

** NO FULL PARSE FOUND.  DOING PARTIAL TRANSFER: 
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
SLATIONS AND THEIR PARSES FOR 
tree: <(V,0:1 'SLATIONS')> <(V,1:2 'AND')> <(V,2:3 'THEIR')> <(V,3:4 'PARSES')> <(V,4:5 'FOR')> 

** NO FULL PARSE FOUND.  DOING PARTIAL TRANSFER: 
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
AND THEIR PARSES FOR 
tree: <(V,1:1 'AND')> <(V,2:2 'THEIR')> <(V,3:3 'PARSES')> <(V,4:4 'FOR')> 
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
No full parse was found! 
Partial parse is:
SHE READ 
tree: <(V,5:1 'SHE')> <(V,6:2 'READ')> 
Deleting all loaded rules.
From MAIN: nope, it's NOT there :-)
In LoadLexicon, LexiconFile is:|/usr0/aria/eng2spa/lexicons/simulation-lexicon.trf|
FileName is /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Now looking up Wi LEÍ...
...

need to debug this!


Tuesday, February 7, 2006

- making a pointer to a CorrectionInstance a private member of the Refiner 
class so that I can access all the information I need to.
Also making it static and public for now... otherwise I need to write all
the set/get methods for it again...

- modifying V0.03 from local machine (avenue) and leaving Avenue afs version 
untouched for now as a back up.


Wednesday, February 8, 2006

- debugging passing pointers as params to Refiner:: methods

It turns out I need to declare the static private data member in Refiner.cpp 
again, with the class scope in front of it, otherwise, linker complains
(undefined reference to `Refiner::m_pCI'), thus added this:

CorrectionInstance* Refiner::m_pCI;

it now compiles AND links and runs!
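
A minimal sketch of the declaration/definition split for a static data
member. CorrectionInstance is reduced to a stand-in struct, and SetCI, GetCI
and Example are hypothetical names used only for illustration:

```cpp
// Stand-in for the real CorrectionInstance class.
struct CorrectionInstance {
  int Example;
};

class Refiner {
 public:
  static void SetCI(CorrectionInstance* pCI) { m_pCI = pCI; }
  static CorrectionInstance* GetCI() { return m_pCI; }
 private:
  // Declaration only (this part would live in Refiner.hpp).
  static CorrectionInstance* m_pCI;
};

// Definition: needed exactly once, in one .cpp file. Without this line the
// class compiles but the linker reports an undefined reference to
// Refiner::m_pCI.
CorrectionInstance* Refiner::m_pCI = 0;
```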

-> now passing the CI as a pointer and made it a static private data member of the 
Refiner class so that I can access all the relevant info from Refiner and 
I don't need to be passing it back and forth

-> need to fully debug!!! Make sure I only store what I need once!

- static member variable means that all objects share this member.  In your case all RR objects share the same pointer to the correction instance.  This does not really matter, as you will only have one RR object.  But if you had multiple RR objects at the same time, they probably would have their individual CorrectionInstance pointer.  So, I would suggest to remove the static.  There is no difference in efficiency or ease of use.
- with respect to accessing the member variables of the CI:  what I typically do is to assign those variables to local variables the first time they enter the current object; in your case, when you give the CI to the RR object, you could have something like m_pSLSentence = pCI->pSLSentence.  You don't want to copy large data structures, just have a local pointer to them.  The dereferencing operation will cost you no time, so it is only an issue of writing pCI-> over and over again, and a matter of style, i.e. if you want to see in your code that these jokers belong to the CI object.


- tested param class with different params
  * default params:
[aria@avenue V0.03]$ ./RuleRefinement.exe
Compiled on Feb  8 2006 16:16:07 with g++ version: 3.2.3 in debug mode

Parameters are:
Debug Level     = 1
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed
Example number = 2

  * different params for DebugLevel and Example:
[aria@avenue V0.03]$ ./RuleRefinement.exe -d 0 -e 1
Compiled on Feb  8 2006 16:16:07 with g++ version: 3.2.3 in debug mode

Parameters are:
Debug Level     = 0
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed
Example number = 1

  * different log file (even though main is not doing anything with it yet)
[aria@avenue V0.03]$ ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-GaudiExample
Compiled on Feb  8 2006 16:16:07 with g++ version: 3.2.3 in debug mode

Parameters are:
Debug Level     = 1
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-GaudiExample
Example number = 2

   * lexicon
[aria@avenue V0.03]$ ./RuleRefinement.exe -l /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf
Compiled on Feb  8 2006 16:16:07 with g++ version: 3.2.3 in debug mode

Parameters are:
Debug Level     = 1
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed
Example number = 2

  * grammar: 
[aria@avenue V0.03]$ ./RuleRefinement.exe  -g /usr0/aria/eng2spa/grammars/simulation-grammar1.trf
Compiled on Feb  8 2006 16:16:07 with g++ version: 3.2.3 in debug mode

Parameters are:
Debug Level     = 1
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar1.trf
TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed
Example number = 2

  * Lexicon and Grammar: 
[aria@avenue V0.03]$ ./RuleRefinement.exe -l /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf -g /usr0/aria/eng2spa/grammars/simulation-grammar1.trf
Compiled on Feb  8 2006 16:16:07 with g++ version: 3.2.3 in debug mode

Parameters are:
Debug Level     = 1
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar1.trf
TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed
Example number = 2

  * !it instantiates it even if the file doesn't exist
[aria@avenue V0.03]$ ./RuleRefinement.exe -l /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf -g /usr0/aria/eng2spa/grammars/simulation-grammar2.trf
Compiled on Feb  8 2006 16:16:07 with g++ version: 3.2.3 in debug mode

Parameters are:
Debug Level     = 1
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon2.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar2.trf
TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed
Example number = 2
...
.In LoadGrammar, GrammarFile is:|/usr0/aria/eng2spa/grammars/simulation-grammar2.trf|
Couldn't open grammar file (/usr0/aria/eng2spa/grammars/simulation-grammar2.trf)
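
A hand-rolled sketch of the same flags (-d, -e, -l, -g, -a) plus an early
file check, so that a missing grammar file is caught before anything is
instantiated. This is NOT the Param class from Aachen; Params, ParseParams
and FileIsReadable are hypothetical names:

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical parameter bundle; the integer defaults mirror the traces above.
struct Params {
  int DebugLevel;
  int Example;
  std::string LexiconFile;
  std::string GrammarFile;
  std::string LogFile;
  Params() : DebugLevel(1), Example(2) {}
};

// Reads "-flag value" pairs. Returns false on an unknown flag or on a flag
// without a value; P may be partially updated in that case.
bool ParseParams(const std::vector<std::string>& Args, Params& P) {
  for (std::size_t i = 0; i < Args.size(); i += 2) {
    if (i + 1 >= Args.size()) return false;
    const std::string& Flag = Args[i];
    const std::string& Value = Args[i + 1];
    if (Flag == "-d") P.DebugLevel = std::atoi(Value.c_str());
    else if (Flag == "-e") P.Example = std::atoi(Value.c_str());
    else if (Flag == "-l") P.LexiconFile = Value;
    else if (Flag == "-g") P.GrammarFile = Value;
    else if (Flag == "-a") P.LogFile = Value;
    else return false;
  }
  return true;
}

// True if the file can be opened for reading.
bool FileIsReadable(const std::string& FileName) {
  std::ifstream In(FileName.c_str());
  return In.is_open();
}
```

Calling FileIsReadable on the lexicon, grammar and log file names right after
parsing would turn the late "Couldn't open grammar file" into an early error.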


Thursday, February 9, 2006

- Moving lexical examples to CorrectionInstance::LoadTCToolLogFile
0. She read
1. I see the red car
2. I see the red unicorn
3. Mary plays the guitar

- moved action and error info detection from Refiner::DetectAction to
Refiner::InstantiateLogInfo(CorrectionInstance* pCI)

- Refiner::AccessLogInfo -> InstantiateLogInfo


Friday, February 10, 2006

- since now the method CorrectionInstance::LoadTCToolLogFile(char *pLogFileName) 
takes a pointer instead of a string, 
      LogFile.open(LogFileName.c_str());
becomes just:
	LogFile.open(pLogFileName);

since c_str() is only used when you have a string, to access the internal 
char * buffer.

- and when trying to append a char * to a string constant, I need to 
explicitly turn the string constant ".." into a string object:

  string Command1 = string ("loadgra ") + pGramFileName;
  xfer->processCommand(Command1);

this actually fixed the problem with the translations, since it wasn't 
loading the grammar and lexicon at all and it was taking part of the 
cout and trying to parse it.
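
A minimal sketch of the concatenation rule. BuildCommand is a hypothetical
helper, not part of the xfer engine API:

```cpp
#include <string>

// Builds an engine command such as "loadgra <file>" from two C strings.
// Adding two char pointers with + is not string concatenation (it is not
// even valid C++), so one operand is wrapped in std::string first; after
// that, operator+ for std::string handles the remaining pieces.
std::string BuildCommand(const char* pVerb, const char* pFileName) {
  return std::string(pVerb) + " " + pFileName;
}
```

The reverse direction is what c_str() is for: a function that wants a
const char * (like the ifstream open call above) gets it from a string via
FileName.c_str().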

 *******************************************************************************
 *** Dilemma: experienced C++ programmer vs nice programmer vs no programmer ***
 *******************************************************************************

- met with Jaime, we'll try Bill first and then see... Alon will send him an 
email later tonight. Meet with him on Monday, get some feedback by Wednesday.
Then tell Marina (no).

- when I try to make the string data members in Refiner into a string reference
(string & SLSentence) and change GetSLSentence() in CorrectionInstance to
also return a string reference, the compiler complains:
...
RuleRefinement.cpp:106: no matching function for call to `Refiner::Refiner()'
Refiner.hpp:32: candidates are: Refiner::Refiner(const Refiner&)
...
-> leave this for later, or ask Bill.
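
My current understanding of that error, as a sketch: a reference member has
to be bound when the object is constructed, so the compiler cannot generate
a default constructor Refiner(), and "Refiner R;" in main no longer
compiles. The constructor below is hypothetical:

```cpp
#include <string>

class Refiner {
 public:
  // The reference member is bound here, in the initializer list; there is
  // no way to bind it later, hence no default constructor.
  explicit Refiner(const std::string& SLSentence) : m_SLSentence(SLSentence) {}
  const std::string& GetSLSentence() const { return m_SLSentence; }
 private:
  // Refers to the caller's string (no copy). The caller's string must
  // outlive this object, and must not be a temporary.
  const std::string& m_SLSentence;
};
```

Keeping a pointer member (set later through an init method) avoids the
constructor change, at the cost of having to check for a null pointer.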


Saturday, February 11, 2006

- avenue went down (problem with the power supply)

- saved a copy of the old afs backup in temuco:/usr4/aria/RuleRefinement


Sunday, February 12, 2006

- looked at old CI.hpp (from afs backup), logfiles, perl script to postprocess
those files.


Monday, February 13, 2006

- met with Bill: went over the system and in particular the RR module.
Looked at some log files and the perl script.

- emailed Bill with the most important pieces of information 
stored it into /usr0/aria/RuleRefinement/bill/email-1-intro

- avenue is back up, it was the power supply cable (they replaced it, + fan)

- creating log files for the examples:

  - the parse trace format changed from "" to '' around the lexical items
  - needed to modify postprocess-xfer.out.debug.pl so that it could
  extract the alignment info correctly again:

 ./bin/postprocess-xfer.out.debug.pl corpus/error-typology-simulation.out.debug >! input-tct

- modified intro-test.cgi to load input-tct (instead of input-tct-good)

!!!!!!! Currently intro-test.cgi picks the first 5 tl as stored in %$sl 
(intro(-test).cgi), which means that we need to apply the Spanish LM to 
input-tct before it's given to the TCTool, namely have the LM choose the 
5 best translations from input-tct and pass only those on to the TCTool:

$c = 0;
$sl .= "|";
foreach $tl (keys(%$sl)){
    if ($tl ne "con") {
     if ($c == 5) { last; } # displays 5 first translations only
     $c++;
     $al = $$sl{$tl};
     $tlname = "tl" . "$c"; 
     $tlnameal = $tlname . "-al";

     $html .= "<tr><td><p><font size=\"+1\" face=\"Arial\"><strong>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<input type=\"checkbox\" name=\"$tlname\" value=\"$tl\">$tl</strong>
<input type=\"hidden\" name=\"$tlnameal\" value=\"$al\">
</td></tr>";
 }
}   # end foreach


For now, I edited the input-tct file and only left the first 5 sentences, so
those are the ones the TCTool will display.
For the first example, however, I removed the correct sentence and included 
the 6th alternative translation, so that the correction makes more sense.
Namely, even though the Xfer engine does produce the correct translation,
the user doesn't see it, and in this case, the user correction can be used
to tighten the grammar by adding an agreement constraint, which causes
the incorrect sentence to not be produced by the system, after refinement.

- finally managed to run all the example sentences thru the tctool, emailed
Bill the link with some explanation (RR/bill/email-2-logfiles)

- updated 00-End2End file in RuleRefinement

- ran processTCToolLogFiles.pl on some of the simple log files I just
generated:

bin/processTCToolLogFiles.pl IOFiles/2006-2-13-17-13-55-8336/0

sl = I see the red car
tl = veo el auto roja
al = ((1,1),(2,2),(3,3),(5,4),(4,5))
ctl = 
cal = ((2,(3,(4,(5)
action = edit
Wi = roja
WiC = rojo

still needs debugging: ctl and cal are incorrect!!
rm 2006-2-13-17-13-55-8336/0-processed

- created TestRuleManipulation.cpp to start testing and expanding rule
manipulation operations, using TrRule and other classes (K's)


Tuesday, February 14, 2006

- getting linker errors for TrRule when compiling TestRuleManipulation
it seems that even though it's able to create a string, it doesn't find
the string destructor! (~string()). Looks like a template error.

I double-checked that "using namespace std;" is present in all the relevant files
(TrRule.hpp, TestRuleManipulation.cpp) and I even added it to TrRule.cpp, just
in case, even though I think having it in TrRule.hpp should be enough.

Made sure that the string format that TrRule::SetTrRuleFromString is expecting
is the one I'm passing it (\n is used as line separator).
However I realized that the format of the TrRule has extra stuff in it
{ADJP,4}
;;SL: B AWRK M@R
;;TL: A METER LONG
;;C-Structure:(<ADJP> (<NP> (DET a-1)(N meter-2))(ADJ long-3))
;;Score:1
ADJP::ADJP [\"B\" \"ARK\" N] -> [\"A\" N \"LONG\"]
(...

-> the problem was that I used different compilers for the different targets:
once CPP and once GXX.  GXX leads to an older compiler version.  
So, replacing $GXX with $CPP in the TestRuleManipulation and TestRM targets did it.


- Need to modify it to just parse the regular rule!
doing that until I figure out the linker errors...

modified TrRule::SetTrRuleFromString and TrRule::Print, need to debug, once
I get it to link!

- 4:34pm:  Alon sent email to Bill with access to Avenue project machines

- will need to add a method to access all the constraints between any two
indices:

  // vector<Constraint*> GetConstraintSet(int Ypos1, int Ypos2);
  // implement a method that given two indexes, it checks all
  // the constraints between them, so that if I want to add a constraint
  // between y1 and y2, say, first I make sure such constraint does not exist 

  // for that it would probably be useful to have a matrix 
  // vector<vector<vector<Constraint*>>>
  //  pos1   pos2   set of constraints
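
A minimal sketch of that matrix, storing constraints by value for
simplicity. Constraint is a stand-in with just its printed form, and
ConstraintIndex, Add and Contains are hypothetical names:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the real Constraint class: only the printed form.
struct Constraint {
  std::string Text;
};

// Cell [Pos1][Pos2] holds every constraint between those two indices.
// Note that (3,2) and (2,3) are different cells in this sketch.
class ConstraintIndex {
 public:
  explicit ConstraintIndex(std::size_t NumPositions)
      : m_Cells(NumPositions,
                std::vector<std::vector<Constraint> >(NumPositions)) {}

  void Add(std::size_t Pos1, std::size_t Pos2, const Constraint& C) {
    m_Cells.at(Pos1).at(Pos2).push_back(C);  // at() throws on a bad index
  }

  const std::vector<Constraint>& GetConstraintSet(std::size_t Pos1,
                                                  std::size_t Pos2) const {
    return m_Cells.at(Pos1).at(Pos2);
  }

  // Lets me check that a constraint does not exist before adding it.
  bool Contains(std::size_t Pos1, std::size_t Pos2,
                const std::string& Text) const {
    const std::vector<Constraint>& Set = GetConstraintSet(Pos1, Pos2);
    for (std::size_t i = 0; i < Set.size(); ++i) {
      if (Set[i].Text == Text) return true;
    }
    return false;
  }

 private:
  std::vector<std::vector<std::vector<Constraint> > > m_Cells;
};
```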

- look at how Constraint is implemented so that I can see if it also 
foresees =c and NOT, OR constraints

Here is the Constraint class data members I think I'll need:

int ConstraintType;
int EquationType;
int FeatureType;
ValueType Value;
int Pos1; // redundant since I want to store them according to the 2 indices
int Pos2;

where 
ConstraintType = {agr, value}
FeatureType = {tense, gender, feat_1, feat_2...} // problem: this is an infinite list...
// these depend on the Featuretype
ValueType = {sg, pl, masc, fem, past, +, -...}
	  // but also *NOT* and *OR*, not sure how to encode that...
EquationType = { =, =c, ...?}

- Look at UseTrRule to help me test my rule manipulations

- fixed the linker problem (see above), needed to change instances of GXX 
into CPP in Makefile


Wednesday, February 15, 2006

- Need to give each CI an id, if the log file has a unique id, then use that,
otherwise create it, so that I can track refinements to the CI that originated
them and also to be able to count how many users made the same change (which
should lead to the same refinement).

(tell Bill)
Different ways to parse several sentences from the same user:

1. First, append them all to the first log file (-all), so that all the 
corrections one user made are in the same file, and then store each CI 
separately, or

2. parse each file at a time (0, 1, 2,...) and store it into a different CI 

Either way, I need to have a way to know which corrections were made by the 
same user and I need to have a way to know which corrections are about 
the same sentence pair and whether they are the same corrections or different,
for a given SL-TL sentence pair.

-> Need to index CIs by SL-TL sentences (just SLS is not enough, user could
have picked a different translation to correct). And then when there is a 
log file that given the same SL-TL sentence, it has the same corrections,
increase the user_counter in that CI, instead of creating a new CI. 

- We could also store the unique user IDs that made that correction in the
relevant CI instance... vector<string>?
Note: log files from user studies already have a unique ID which can be 
extracted, but for log files generated by correcting Mapu MT output, a 
unique ID needs to be generated (2005-11-18-16:50:38-4983/log)

-> Refinements should be labeled by SL-TL pairs that lead to that refinement,
with the user_counter information as well, so that it's easy to see how much 
support there was for any given refinement.


-> Store a collection of processed CIs and have a bool for whether it led to 
a refinement or not (if not, we assume there was lack of support for that CI).
If it actually led to a refinement and it decreased the eval accuracy, that
should be stored elsewhere (bool bad_instance, 0 by default, 1 if it decreases
accuracy)

(for me)
-> In batch mode, accumulate CIs and counts for the CIs first, and then 
move on to the rule refinement process for those instances with 90% support, 
say

(send this info in an email bill/email-3-CI)
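
A rough sketch of the bookkeeping described above: CIs indexed by SL-TL pair
plus correction, with the set of user IDs that made each correction. All
names are hypothetical. "Support" is not defined anywhere yet, so the sketch
ASSUMES it means the fraction of users who corrected a given SL-TL pair that
made this particular correction:

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Key for one correction of one SL-TL sentence pair.
struct CorrectionKey {
  std::string SL, TL, Correction;
  bool operator<(const CorrectionKey& O) const {
    if (SL != O.SL) return SL < O.SL;
    if (TL != O.TL) return TL < O.TL;
    return Correction < O.Correction;
  }
};

class CIStore {
 public:
  // A set of user IDs, so the same user is never counted twice.
  void Record(const CorrectionKey& Key, const std::string& UserID) {
    m_Users[Key].insert(UserID);
  }

  std::size_t UserCount(const CorrectionKey& Key) const {
    Map::const_iterator It = m_Users.find(Key);
    return It == m_Users.end() ? 0 : It->second.size();
  }

  // Assumed definition: users who made this correction divided by all users
  // who corrected the same SL-TL pair in any way.
  double Support(const CorrectionKey& Key) const {
    std::set<std::string> All;
    std::size_t Same = 0;
    for (Map::const_iterator It = m_Users.begin(); It != m_Users.end(); ++It) {
      if (It->first.SL != Key.SL || It->first.TL != Key.TL) continue;
      All.insert(It->second.begin(), It->second.end());
      if (It->first.Correction == Key.Correction) Same = It->second.size();
    }
    return All.empty() ? 0.0 : static_cast<double>(Same) / All.size();
  }

 private:
  typedef std::map<CorrectionKey, std::set<std::string> > Map;
  Map m_Users;
};
```

In batch mode, only the keys whose Support reaches the chosen threshold
(90%, say) would be passed on to the refinement step.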


- Modified TrRule and tried to compile it by itself, getting errors similar to
yesterday's!!! arghhhhhhh no idea why
TestRM compiles and links fine and directly depends on TrRule!

ok, I need to say make TrRule.o and not make TrRule!!!
since there is no target called TrRule and so make probably will use some 
default rules to create a target.

- don't know why, but K's original TrRule file (TrRule-K.cpp) is not even 
compiling!! getting tons of errors :-/


Anyway...

- met with Alon (Jaime wasn't there)

- added CI processing issues to Thesis-updated.tex (**** in tex, red font in pdf)

- Rule string format needs to be as follows:
"{NP,8}\nNP::NP : [DET ADJ N] -> [DET N ADJ]\n(\n(X1::Y1)\n(X2::Y3)\n(X3::Y2)\n((x0 det) = x1)\n((x0 mod) = x2)\n(x0 = x3)\n(y0 = x0)\n(y1 == (y0 det))\n(y3 == (y0 mod))\n(y2 = y0))\n";

namely:

{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
(X1::Y1)
(X2::Y3)
(X3::Y2)
((x0 det) = x1)
((x0 mod) = x2)
(x0 = x3)
(y0 = x0)
(y1 == (y0 det))
(y3 == (y0 mod))
(y2 = y0))

otherwise, TrRule::SetTrRuleFromString is not able to create a rule

- commented out Print() in TrRule::SetConstraint(string ConstraintPassIn) so
that the rule building process is not printed out at each step.

-  // need to also keep comments, so if it finds comments preceding a rule
  // those are also considered part of the rule as it were 
  // refined rules will have SL-TL info + user_counter info

  // however, original rules, will most likely not have this information
  // so, need to include it when it's there and not crash when it's not there

- added ";; SL: I see the red car\n;; TL: veo el auto rojo\n;; Users = 2\n" 
to the beginning of the Rule string

testing... weird, it complains when getting the constraints!

[aria@avenue V0.03]$ ./TestRuleManipulation.exe
----------- Now printing Rule 1 (R1) ------------------
NP::NP [DET ADJ N] -> [DET N ADJ]
(
(X1::Y1)
(X2::Y3)
(X3::Y2)
ERROR in TrRule::GetConstraintsInPrintOrder. Can't find agreed index for constraint (Y0 = )


- modified code to find comments, now I just need to store them in the
right way so that everything else doesn't get screwed up!


[aria@avenue V0.03]$ ./TestRuleManipulation.exe
The following comment was found:
;; SL: I see the red car
The following comment was found:
;; TL: veo el auto rojo
The following comment was found:
;; Users = 2
----------- Now printing Rule 1 (R1) ------------------
NP::NP [DET ADJ N] -> [DET N ADJ]
(
(X1::Y1)
(X2::Y3)
(X3::Y2)
ERROR in TrRule::GetConstraintsInPrintOrder. Can't find agreed index for constraint (Y0 = )


Thursday, February 16, 2006

- the reason it was crashing was that the way TrRule::SetTrRuleFromString
detects constraints is by looking for an = sign in the line... and one of 
my comments had an equal sign...

     if ((Lines[i].length() != 0)&&(Lines[i].find('=') != -1)){
	//then it's gotta a be a constraint or else some sort of noise
	if (GlobalParams::DebuggingLevel == 1){
	  //  cout << "Now setting constraint from string:" << Lines[i] << endl;
	}
	SetConstraint(Lines[i]);
      }

-> changed it: ;; Users: 2\n
and it's working well now
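
A more robust alternative to renaming the comment would be to skip comment
lines before looking for the = sign. A minimal sketch, assuming comments
always start with ";;" (possibly after whitespace); IsComment and
LooksLikeConstraint are hypothetical helpers, not TrRule methods:

```cpp
#include <string>

// True if the first non-blank characters of the line are ";;".
bool IsComment(const std::string& Line) {
  std::string::size_type First = Line.find_first_not_of(" \t");
  return First != std::string::npos && Line.compare(First, 2, ";;") == 0;
}

// Same test as in SetTrRuleFromString (non-empty line containing '='),
// except that comment lines are never treated as constraints.
bool LooksLikeConstraint(const std::string& Line) {
  return !Line.empty() && !IsComment(Line) &&
         Line.find('=') != std::string::npos;
}
```

With this, ";; Users = 2" would be kept as a comment and never reach
SetConstraint.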

- RankInCategory?

From Kathrin:
About the RankInCategory -- I don't think I ended up using it much, because it
 wasn't relevant to my code.  But it had something to do with what order the 
constraints need to be in.  You know how in Unification it matters what order
 they're in?  I don't remember the details but at some point I wanted to be 
able to set that somehow.  I guess -- let's see.  If you have two constraints 
in the same category, like(hypothetically) 

((X2 case) = acc)
(X0 = X2)
((X0 case) = nom)

then (X0 = X2) and ((X0 case) = nom) are in the same category (...right...? according to Erik's definition of categories...?) but you need to make sure that (X0 = X2) is set before ((X0 case) = nom), because the other way around would result in something different. 

My constraints never got complex enough for this to matter, but it might matter in your case ...

-> since I will only add value constraints and agreement constraints with a
specific feature, I don't think this will matter, but need to make sure at a later
stage

- looked at how to get the constraints I want,
still figuring out exactly how the methods work, categories seem to be
shifted...


Friday, February 17, 2006

- met with Bill to talk about the CI class and to clarify what it needs to do
-> history of alignments doesn't need to be stored explicitly 
in CI, but the indices of current words, do change if a word is added, say.

- sent him an email with a summary and new info which I forgot to tell him: 
email-3-CI


Monday, February 20, 2006

- categories as specified by K's comments don't match her methods.
From TestRuleManipulation.cpp:

  //Constraint.hpp://1=parsing,2=transfer,3=generation,4=featurefilling/constrchecking
  // see concrete examples from R1 modified below

  map<string, pair<string,string> > TrConstraint;
  cout << "---------- 3) Getting All Constraints of category 4\n"; 
  //vector<string> GetConstraints(int CategoryPassIn); //for a certain type (category) only
  vector<string> VS = pTR->GetConstraints(4);
  for (int i = 0 ; i < VS.size() ; i++) {
    cout << VS[i] << endl;
  }

  // K's categories are not quite right...

  //  GetConstraints(1); output: ok
  /* ((X0 DET) = X1)
     ((X0 MOD) = X2)
     (X0 = X3)
     ((X1 NUM) = (X3 NUM)) */
  //  GetConstraints(2); no output -> this should output (Y0 = X0) 
  //  GetConstraints(3); output:(Y0 = X0)  (transfer constraint, should be of category 2...)
  // this should output:
  /* (Y1 == (Y0 DET)
     (Y2 == (Y0 DET)
     (Y2 = Y0)
     (Y3 == (Y0 MOD) */

  //  GetConstraints(4); output: 
  /* ((Y1 NUM) = SG)
     (Y1 = (Y0 DET)
     (Y2 = (Y0 DET)
     (Y2 = Y0)
     ((Y3 AGRGEN) = (Y2 AGRGEN))
     (Y3 = (Y0 MOD) */
  // but should really be just: ((Y3 AGRGEN) = (Y2 AGRGEN))

  // not sure it's worth fixing, does this matter to the RR module?
  // actually, it is good to have all the y-side constraints to be grouped under one category, ie. 4

  /* so if I assume that:
     1 is parsing (as intended)
     3 is transfer
     4 is generation (y-side)
     and 2 I ignore, I should be ok...
  */
  cout << "---------- End of Getting All Constraints of category 4\n"; 


- finally I figured out how to pass it the rhs index, to create an agreement constraint:

// 3rd SetConstraint method:
  void SetConstraint(int ConstrainedIndexPassIn, string FeaturePassIn, string FeatValuePassIn, string FeatAgrPassIn, int RankInCategoryPassIn, int CategoryPassIn);


  //  string NewConstraint = "((y3 agr gen) = (y2 agr gen))"; 
   pTR->SetConstraint(3, "agr gen", "empty", "Y2", -1, 4);

output:

NP::NP [DET ADJ N] -> [DET N ADJ]
(
(X1::Y1)
(X2::Y3)
(X3::Y2)
((X0 DET) = X1)
((X0 MOD) = X2)
((Y1 NUM) = SG)
(X0 = X3)
((X1 NUM) = (X3 NUM))
(X1 = Y1)
(Y0 = X0)
(Y1 = (Y0 DET)
(Y2 = (Y0 DET)
(Y2 = Y0)
((Y3 agr gen) = Y2 agr gen))
(Y3 = (Y0 MOD)
)

- testing all the different SetConstraint methods on different constraint 
types, it's a mess, very inconsistent!


in Constraint.cpp:
string Constraint::ReturnAsString() const{
  
  string ReturnString;
  
  //convert the index to a string
  char buf[10];
  sprintf(buf,"%d",ConstrainedIndex);
  string IndexStr = buf;
  
  if (ValueConstraint == true){
    if (Category == 1){
      ReturnString = "((X";
    }
    else{
      ReturnString = "((Y";
    }
    //then add the Feature, Index and the FeatValue
    ReturnString = ReturnString + IndexStr + " " + Feature + ") = " + FeatValue + ")";
  }
  else{
    //check if it's a simple agreement constraint
    if (Feature == "all"){
      if (Category == 1){
	ReturnString = "(X";
      }
      else{
	ReturnString = "(Y";
      }
      ReturnString = ReturnString + IndexStr + " = " + FeatValue + ")";
    }
    else{
      if (Category == 1){
	ReturnString = "((X";
      }
      else{
	ReturnString = "((Y";
      }
      ReturnString = ReturnString + IndexStr + " " + Feature + ") = " + FeatValue + " " + Feature + "))";
    }
  }
  //cout << "Returning from constraint:" << ReturnString << endl;
  return ReturnString;

}


so, that's where the (( is coming from
and the missing paren!

and should give a try to "all" instead of "empty" and ""
  else{
    //check if it's a simple agreement constraint
    if (Feature == "all"){
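
Sketch of a formatter that cannot produce unbalanced output such as
"((Y3 agr gen) = Y2 agr gen))" or "(Y1 = (Y0 DET)". This is my own
illustration, not K's code: Term and FormatConstraint are hypothetical names,
and it assumes each side is given as a variable (X3, Y0, ...) plus an optional
feature. Every opening paren is emitted together with its closing one.

```cpp
#include <string>

// Hypothetical helper, not part of Constraint.cpp: wraps a variable and an
// optional feature so that parentheses always come in pairs.
// Term("Y3", "AGRGEN") gives "(Y3 AGRGEN)"; Term("Y0", "") gives "Y0".
std::string Term(const std::string& var, const std::string& feature) {
    if (feature.empty()) return var;
    return "(" + var + " " + feature + ")";
}

// Builds "(lhs = rhs)". rhs can be another term or a plain value such as "SG".
std::string FormatConstraint(const std::string& lhs, const std::string& rhs) {
    return "(" + lhs + " = " + rhs + ")";
}
```

Usage: FormatConstraint(Term("Y1", ""), Term("Y0", "DET")) builds the
agreement constraint with both parens closed.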


Tuesday, February 21, 2006

- testing GetConstraints methods

Let's say, we wanted to eliminate all the constraints pertaining to X1 and Y1 
(because we were getting rid of X1 and Y1) [note: that then the remaining 
constraints' indices would need to be changed accordingly!]

	target:
	((Y1 num) = sg)
	((X1 num) = (X3 num))
	(Y1 = X1)
	(Y1 = (Y0 DET)
	

	CategorySet.insert(1);
	CategorySet.insert(4);
	TrConstraint = pTR->GetConstraintsForInd(1,CategorySet);
	output:
	string (1arg in map): all
	first of pair: empty; second of pair: (Y0 DET
	string (1arg in map): num
	first of pair: empty; second of pair: (X3


	CategorySet.insert(3);
	CategorySet.insert(4);
	TrConstraint = pTR->GetConstraintsForInd(1,CategorySet);
	output:
	string (1arg in map): all
	first of pair: empty; second of pair: X1
	string (1arg in map): num
	first of pair: sg; second of pair: empty
	

	CategorySet.insert(1);
	CategorySet.insert(3);
	CategorySet.insert(4);
	string (1arg in map): all
	first of pair: empty; second of pair: X1
	string (1arg in map): num
	first of pair: empty; second of pair: (X3
	
	this only retrieves:
	(Y1 = X1)
	((X1 num) = (X3 num))
	
	!!! not sure why this map is an incomplete set of all the constraints 
	of index 1 and of Category 1, 3 and 4!!
	Adding Category 2 doesn't make any difference...

	In any case, I can't trust these methods to retrieve all the 
	constraints for an index
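
Possible workaround, working on the printed constraint strings instead of K's
map: drop every constraint that mentions X<idx> or Y<idx> and shift the higher
indices down by one. A rough sketch only; RenumberWithout and RemoveIndex are
hypothetical names, and it assumes the same index is removed on both sides
(as with X1 and Y1 above) and that variables are written X<n>/Y<n>.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Rewrites one constraint for the removal of constituent idx.
// Returns false if the constraint mentions X<idx> or Y<idx> (drop it, out is
// then incomplete); otherwise writes the renumbered constraint to out.
bool RenumberWithout(const std::string& c, int idx, std::string& out) {
    out.clear();
    std::size_t i = 0;
    while (i < c.size()) {
        char ch = c[i];
        bool isVar = (ch == 'X' || ch == 'Y' || ch == 'x' || ch == 'y');
        bool atTokenStart = (i == 0 || c[i - 1] == '(' || c[i - 1] == ' ');
        std::size_t j = i + 1;
        while (j < c.size() && std::isdigit(static_cast<unsigned char>(c[j]))) ++j;
        if (isVar && atTokenStart && j > i + 1) {
            int n = std::stoi(c.substr(i + 1, j - i - 1));
            if (n == idx) return false;  // refers to the removed constituent
            if (n > idx) --n;            // shift later constituents down by one
            out += ch;
            out += std::to_string(n);
            i = j;
        } else {
            out += ch;
            ++i;
        }
    }
    return true;
}

// Keeps only the constraints that survive the removal, already renumbered.
std::vector<std::string> RemoveIndex(const std::vector<std::string>& constraints, int idx) {
    std::vector<std::string> kept;
    std::string rewritten;
    for (const std::string& c : constraints)
        if (RenumberWithout(c, idx, rewritten)) kept.push_back(rewritten);
    return kept;
}
```

Index 0 (the parent) is never shifted, since only indices greater than idx are
decremented.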

- testing EraseConstraint...

  // EraseConstraint manipulates: map<string,Constraint> but it looks like it only has been implemented for
  // simple value constraints

- finished testing K's methods

- writing ConstraintClassSpecs.txt to figure out what I really need and what 
would be an overkill

- fixing rule format in  TrRule::SetTrRuleFromString
Need to extract RuleID, however, only RuleIndex can be stored...
Added string POS as a data member and added code both to SetTrRuleFromString
and Print() methods


Thursday, February 23, 2006

- finished adding code to store and print out RuleID

- Need to fix bugs for when a rule is set from a String 
(pTR->SetTrRuleFromString(R1)), it looks like the SetConstraint method needs
to be implemented more robustly so that the constraints have the right number
of parens -> RuleChecker!

// missing parens! -> stderr << illformed constraint or something like that
// looks like this might be coming from the SetConstraint class

(Y1 = (Y0 DET)
(Y2 = Y0)
((Y3 AGRGEN) = (Y2 AGRGEN))
(Y3 = (Y0 MOD)
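
A first check for the RuleChecker could simply count parens per constraint.
Minimal sketch, with ParensBalanced as a hypothetical name; it only checks
nesting, not whether the constraint is otherwise well formed:

```cpp
#include <string>

// Returns true when every '(' in s has a matching ')' and no ')' comes
// before its '('.
bool ParensBalanced(const std::string& s) {
    int depth = 0;
    for (char ch : s) {
        if (ch == '(') {
            ++depth;
        } else if (ch == ')') {
            if (--depth < 0) return false;  // closing paren without an opening one
        }
    }
    return depth == 0;
}
```

A constraint that fails this check is the place to write the "illformed
constraint" message to stderr.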

- look at my old prototype code to see if I'm forgetting any actions for the
Constraint class


Friday, February 24, 2006

- working on Constraint class and methods that modify rules

- Bill: he implemented the CI class to read simple log files, and is taking
the alignment info from the header. But since log files will not always have
headers, he needs to extract it from the parse tree.

I asked him to also write a test class, as a way to debug his class and have
example usage of how to call his methods.

He's going to send me his code with a test class before 5pm.

!!! Realized there isn't a real nice way to extract clue words from 
full version of TCTool...:

[aria@avenue IOFiles]$ grep -r "Reason" *
2004-2-21-10:58:29-4283/4:* Reason: none-of-the-above(arriba)
2004-2-21-10:58:29-4283/4:* Reason: none-of-the-above(almenys en castellà
2004-2-21-10:58:29-4283/5:* Reason: none-of-the-above(same argument structure, but
2004-2-21-10:58:29-4283/6:* Reason: none-of-the-above(acc. pronoun)
2004-2-21-10:58:29-4283/7:* Reason: wrong-gender()
2004-2-21-10:58:29-4283/7:* Reason: none-of-the-above(gen.-pronoun)
2004-2-21-10:58:29-4283/8:* Reason: none-of-the-above(acc.- pronoun)
2004-2-21-10:58:29-4283/9:* Reason: wrong-gender()
2004-2-21-10:58:29-4283/10:* Reason: wrong-person(subject)
2004-2-21-10:58:29-4283/10:* Reason: none-of-the-above(plural commitative)
2004-2-21-10:58:29-4283/13:* Reason: different-sense
2004-2-21-10:58:29-4283/15:* Reason: none-of-the-above(completiva d'infinitiu)
2004-2-21-10:58:29-4283/16:* Reason: none-of-the-above(jugar needs prep.)
2004-2-21-10:58:29-4283/17:* Reason: different-sense
2004-2-21-10:58:29-4283/18:* Reason: wrong-tense
2004-2-21-10:58:29-4283/18:* Reason: wrong-number()
2004-2-21-10:58:29-4283/21:* Reason: wrong-tense
2004-2-21-10:58:29-4283/21:* Reason: wrong-person(subject (él))
2004-2-21-10:58:29-4283/23:* Reason: wrong-gender()
2004-2-21-10:58:29-4283/23:* Reason: different-sense
2004-2-21-10:58:29-4283/24:* Reason: didnt-translate-word()
2004-2-21-10:58:29-4283/25:* Reason: wrong-tense
2004-2-21-10:58:29-4283/25:* Reason: wrong-form
2004-2-21-10:58:29-4283/26:* Reason: wrong-person()
2004-2-21-10:58:29-4283/27:* Reason: wrong-person(subject (cat))
2004-2-21-10:58:29-4283/28:* Reason: different-sense
2004-2-21-10:58:29-4283/28:* Reason: didnt-translate-word()
2004-2-21-10:58:29-4283/28:* Reason: didnt-translate-word()
2004-2-21-10:58:29-4283/29:* Reason: wrong-gender()
2004-2-21-10:58:29-4283/29:* Reason: none-of-the-above(don't translate)
2004-2-21-10:58:29-4283/29:* Reason: wrong-gender()
2004-2-21-10:58:29-4283/30:* Reason: wrong-tense
2004-2-21-10:58:29-4283/31:* Reason: none-of-the-above(don't translate)
2004-2-21-10:58:29-4283/31:* Reason: different-sense
2004-2-23-08:02:57-7966/4:* Reason: different-sense
2004-2-23-08:02:57-7966/4:* Reason: wrong-gender()
2004-2-23-08:02:57-7966/4:* Reason: wrong-number()
2004-2-23-08:02:57-7966/5:* Reason: wrong-person()
2004-2-23-08:02:57-7966/5:* Reason: didnt-translate-word()
2004-2-23-08:02:57-7966/6:* Reason: wrong-form
2004-2-23-08:02:57-7966/7:* Reason: wrong-form
2004-2-23-08:02:57-7966/8:* Reason: wrong-form
2004-2-23-08:02:57-7966/9:* Reason: wrong-gender()
2004-2-23-08:02:57-7966/10:* Reason: wrong-form
2004-2-23-08:02:57-7966/10:* Reason: wrong-number(yo)
2004-2-23-08:02:57-7966/17:* Reason: didnt-translate-word()
2004-2-23-08:02:57-7966/18:* Reason: wrong-tense
2004-2-23-08:02:57-7966/18:* Reason: wrong-number(puentes)
2004-2-23-08:02:57-7966/21:* Reason: wrong-number(el)
2004-2-23-08:02:57-7966/21:* Reason: wrong-number(el)
2004-2-23-08:02:57-7966/21:* Reason: wrong-number(é)
2004-2-23-08:02:57-7966/21:* Reason: wrong-number(él)
2004-2-23-08:02:57-7966/23:* Reason: wrong-gender(chica)
2004-2-23-08:02:57-7966/23:* Reason: didnt-translate-word()
2004-2-23-08:02:57-7966/23:* Reason: different-sense
2004-2-23-08:02:57-7966/23:* Reason: wrong-form
2004-2-23-08:02:57-7966/24:* Reason: wrong-form
2004-2-23-08:02:57-7966/24:* Reason: wrong-form
2004-2-23-08:02:57-7966/25:* Reason: wrong-tense
2004-2-23-08:02:57-7966/26:* Reason: wrong-person(madre)
2004-2-23-08:02:57-7966/26:* Reason: wrong-form
2004-2-23-08:02:57-7966/27:* Reason: wrong-number(gato)
2004-2-23-08:02:57-7966/27:* Reason: incorrect-word()
2004-2-23-08:02:57-7966/28:* Reason: incorrect-word()
2004-2-23-08:02:57-7966/29:* Reason: wrong-gender(pluma)
2004-2-23-08:02:57-7966/29:* Reason: wrong-gender(pluma)
2004-2-23-08:02:57-7966/30:* Reason: incorrect-word()
2004-2-23-08:02:57-7966/31:* Reason: wrong-tense
2004-2-23-08:02:57-7966/31:* Reason: different-sense
2004-2-24-14:20:58-19501/4:* Reason: wrong-gender("chairs" (sillas, fem))
2004-2-24-14:20:58-19501/4:* Reason: wrong-number("chairs" (sillas, pl.))
2004-2-24-14:20:58-19501/5:* Reason: wrong-form
2004-2-24-14:20:58-19501/5:* Reason: wrong-person("you" (tu, 2nd))
2004-2-24-14:20:58-19501/6:* Reason: wrong-form
2004-2-24-14:20:58-19501/7:* Reason: wrong-gender(context given)
2004-2-24-14:20:58-19501/7:* Reason: wrong-gender(context given)
2004-2-24-14:20:58-19501/7:* Reason: wrong-form
2004-2-24-14:20:58-19501/8:* Reason: wrong-form
2004-2-24-14:20:58-19501/9:* Reason: wrong-gender(context given)
2004-2-24-14:20:58-19501/10:* Reason: wrong-person("I" (yo, 1st))
2004-2-24-14:20:58-19501/10:* Reason: wrong-number("I" (yo, sing.))
2004-2-24-14:20:58-19501/10:* Reason: wrong-form
2004-2-24-14:20:58-19501/15:* Reason: incorrect-word()
2004-2-24-14:20:58-19501/17:* Reason: didnt-translate-word()
2004-2-24-14:20:58-19501/18:* Reason: incorrect-word()
2004-2-24-14:20:58-19501/18:* Reason: incorrect-word()
2004-2-24-14:20:58-19501/18:* Reason: wrong-number(bridges ("puentes"))
2004-2-24-14:20:58-19501/19:* Reason: wrong-form
2004-2-24-14:20:58-19501/19:* Reason: wrong-form
2004-2-24-14:20:58-19501/21:* Reason: wrong-number("he" (él))
2004-2-24-14:20:58-19501/23:* Reason: didnt-translate-word()
2004-2-24-14:20:58-19501/23:* Reason: wrong-gender(girl ("chica", fem))
2004-2-24-14:20:58-19501/23:* Reason: incorrect-word()
2004-2-24-14:20:58-19501/23:* Reason: wrong-form
2004-2-24-14:20:58-19501/24:* Reason: wrong-form
2004-2-24-14:20:58-19501/24:* Reason: wrong-form
2004-2-24-14:20:58-19501/25:* Reason: wrong-form
2004-2-24-14:20:58-19501/25:* Reason: wrong-tense
2004-2-24-14:20:58-19501/25:* Reason: wrong-form
2004-2-24-14:20:58-19501/26:* Reason: wrong-person(us ("nos"))
2004-2-24-14:20:58-19501/27:* Reason: incorrect-word()
2004-2-24-14:20:58-19501/27:* Reason: wrong-number(cat ("gato", sing))
2004-2-24-14:20:58-19501/29:* Reason: wrong-gender(feather ("pluma", fem))
2004-2-24-14:20:58-19501/29:* Reason: wrong-gender(feather ("pluma"))
2004-2-24-14:20:58-19501/30:* Reason: wrong-tense
2004-2-24-14:20:58-19501/30:* Reason: incorrect-word()
2004-2-24-14:20:58-19501/31:* Reason: incorrect-word()

Once the CI class is working, need to expand it to extract clue word from
complex LogFiles. Algorithm: 
Look into () after *Reason:
- if it contains one word and this word corresponds to a word in the TLS, 
we have a candidate

- if it contains multiple words and one of them corresponds to a word in the 
TLS, we have a likely candidate (indicate degree of uncertainty somehow)

Note words can be in quotes or not!

- either way, store the complete reason string so that I can always refer back to it (Reason="wrong-number(cat ("gato", sing))").
****************************************************************
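
The clue word algorithm above could be sketched roughly as follows. All names
(ClueWord, ExtractClueWord, the confidence values 0/1/2) are my own
placeholders, not part of the CI class. It assumes the TLS is available as a
vector of words, compares words ignoring ASCII case only, and takes the first
matching word when several match.

```cpp
#include <string>
#include <vector>

struct ClueWord {
    std::string word;    // empty when no candidate was found
    int confidence;      // 0 = none, 1 = likely (several words), 2 = single-word match
    std::string reason;  // complete reason string, kept for reference
};

static std::string LowerAscii(std::string s) {
    for (char& ch : s)
        if (ch >= 'A' && ch <= 'Z') ch = static_cast<char>(ch - 'A' + 'a');
    return s;
}

// Splits on anything that is not an ASCII letter/digit or a non-ASCII byte, so
// quotes, commas and parens are dropped and accented words stay in one piece.
static std::vector<std::string> Words(const std::string& s) {
    std::vector<std::string> out;
    std::string cur;
    for (unsigned char ch : s) {
        bool wordChar = ch >= 128 || (ch >= '0' && ch <= '9') ||
                        (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
        if (wordChar) {
            cur += static_cast<char>(ch);
        } else if (!cur.empty()) {
            out.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) out.push_back(cur);
    return out;
}

ClueWord ExtractClueWord(const std::string& reason, const std::vector<std::string>& tls) {
    ClueWord result{"", 0, reason};
    std::size_t open = reason.find('(');
    if (open == std::string::npos) return result;  // e.g. "wrong-tense"
    std::size_t close = reason.rfind(')');
    if (close == std::string::npos || close < open) close = reason.size();  // truncated line
    std::vector<std::string> words = Words(reason.substr(open + 1, close - open - 1));
    for (const std::string& w : words) {
        for (const std::string& t : tls) {
            if (LowerAscii(w) == LowerAscii(t)) {
                result.word = t;
                result.confidence = (words.size() == 1) ? 2 : 1;
                return result;
            }
        }
    }
    return result;
}
```

Reasons with empty parens or no parens at all come back with confidence 0,
but the full reason string is still stored.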

If alignment info is not right, suggest that each word contains its alignment 
information as well. SLWord might contain two alignments one to TL and one to CTL, whereas TL and CTL words will only contain alignments to SLWords.

- Updating file ManipulatingRules.txt with more detailed info for params 
required by each method 

- copied TrRule into Rule class, need to implement my own Rule class with only
the methods I want.


Sunday, February 26, 2006

!!! No need of perl script to pre-process the LogFiles anymore !!!

- Bill sent his test code (TestCI.cpp) his main only calls the Load method, 
but he included a print method which exercises the different CI methods this 
time. 

#############################################################################
1.Bug: the new word is added in an incorrect position
Cause: when a word is added, the position count starts at 0 instead of 1, 
       and all the other positions start at 1.
Log files affected 4, 5 and 7


2.Bug: when reading in Log File 7, right after "Loading: 7", there seems to be
 a loop for the SL sentence vector that prints it as it traverses it.


1.Improvement: After editing a word, if there is a spurious move action 
(the words do not change), detect and discard action. 
Add a comment to the code that does this, so that it can easily be commented 
out later if the need arises. 
Example: last action of Log file 9
You seem to be doing this right already for add and delete actions which are
followed by a spurious "word has been moved" statement.

Task: Do big-O analysis for the loadCI method.


Issue: when a word is edited and then the word order affecting that word
changes, my framework currently assumes that this is actually part of 
the same correction (the word needs to be moved because it has a different form). 
Unfortunately, the TCTool does not register which word was moved where, and
it would not really matter, unless one of the two words involved in the order
change was edited immediately after or before the order change.

This becomes more of a problem when the other word involved in the order change
is actually also edited for a different reason further down in the log file, 
which is precisely the case for log file 8. (In log file 9 this is done
correctly, but I'm guessing that is just by chance).

However, if we used a greedy algorithm to detect such cases and pick the word
recently edited (or about to be edited) as the word which has also been moved,
when the move is local (namely happens between contiguous words), then my 
framework will actually work much more nicely.

Otherwise, instead of just having one error word Wi for two related errors, 
which are actually affecting the same word, we would end up with two error
words and would not be able to capture that the move is related to the edit.

Since this affects looking at two different actions, this probably would need 
to be done in the code that deals with all the actions, namely my code.
However, your CI interface does not allow me to reset the position of the error
word in the C/TLS. Could you add this in? Or can you think of a better way
to deal with this special case?


Question: what does sz stand for in your code?


For Next iteration:

- make sure it also parses Log Files generated by the full-fledged version of 
the TCTool:

  - Add clue word info

  - Add confidence level info (*Desirable vs *Necessary -> need to check
JavaScript implementation to see what other words are stored to indicate
confidence level)-> need to debug first, when I try to edit the word, an error
occurs!!

- make sure the CI class is robust. For example, make sure the user can add 
multiple-word entries and that the CI class will store that correctly as 
the new word added.
##############################################################################

- sent Bill an email (bill/info/email-7-CIFeedback)

- copied Bill's files into V0.03 dir


Tuesday, February 28, 2006

- 9am: met with Jaime 

4 Bill:

Once basic CI class is working for all Log Files...

Here is the more researchy aspect of manipulating CIs:

Need to detect and cancel (not store) empty/spurious loops.
Example:
	For the following sequence of correction actions:

	       sl: The great artist
	       El artista gran
	       El artista grande (edit: gran->grande)
	       El grande artista (cwo)
	       El gran artista (edit: grande->gran)

	Only the following should be stored in CI's vActions:
      
		El artista gran 
		El gran artista (cwo)

Note that these are not necessarily "state repeats" (AI), in this case
for example, it doesn't go back to the initial state, since the order of
artista and gran is changed in between spurious corrections.

Another example, which seems too terrible to naturally occur, but which I have actually seen users do, goes as follows:

	For the following sequence of correction actions:
		sl: I saw the girl
		Vio la muchacha
		Vio a la muchacha (add a word: 'a')
		add alignment 'a' with 'saw'
		delete alignment from 'a' to 'saw'
		add alignment from 'a' to 'woman'
		delete alignment from 'a' to 'woman'
		Vio la muchacha (delete a word: 'a')
		Vi la muchacha (edit: vio->vi)
		Vi a la muchacha (add a word: 'a')  
		add alignment 'a' with 'saw'
		delete alignment from 'a' to 'saw'
		add alignment from 'a' to 'the'
		delete alignment from 'a' to 'the'

	Only the following should be stored in CI's vActions:
            
		sl: I saw the girl
		Vio la muchacha
		Vi la muchacha (edit: vio->vi)
		Vi a la muchacha (add a word: 'a')  
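
One way the cancelling could work, as a rough sketch under strong assumptions:
actions are identified by surface word forms rather than positions, and an
action is dropped whenever it exactly undoes the most recent stored action it
matches, no matter what happened in between. Kind, Action, Undoes and
DropSpuriousLoops are hypothetical names, not Bill's interface.

```cpp
#include <string>
#include <vector>

enum class Kind { Edit, AddWord, DeleteWord, AddAlign, DeleteAlign, ChangeOrder };

struct Action {
    Kind kind;
    std::string a;  // Edit: old form; Add/DeleteWord: the word; alignments: TL word
    std::string b;  // Edit: new form; alignments: SL word; otherwise unused
};

// True when later exactly undoes earlier.
bool Undoes(const Action& earlier, const Action& later) {
    switch (earlier.kind) {
        case Kind::Edit:
            return later.kind == Kind::Edit && later.a == earlier.b && later.b == earlier.a;
        case Kind::AddWord:
            return later.kind == Kind::DeleteWord && later.a == earlier.a;
        case Kind::AddAlign:
            return later.kind == Kind::DeleteAlign && later.a == earlier.a &&
                   later.b == earlier.b;
        default:
            return false;  // order changes and deletions are never undone in this sketch
    }
}

std::vector<Action> DropSpuriousLoops(const std::vector<Action>& log) {
    std::vector<Action> kept;
    for (const Action& act : log) {
        bool cancelled = false;
        // Look for the most recent stored action that this one undoes.
        for (std::size_t i = kept.size(); i-- > 0;) {
            if (Undoes(kept[i], act)) {
                kept.erase(kept.begin() + static_cast<std::ptrdiff_t>(i));
                cancelled = true;
                break;
            }
        }
        if (!cancelled) kept.push_back(act);
    }
    return kept;
}
```

On the two examples above this leaves only the cwo in the first case, and the
vio->vi edit plus the final added 'a' in the second. A real version would
need word positions, and would have to check that nothing in between depends
on the cancelled action.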

Once this is in place, we will be in a situation where we can implement a
comparison method that, given two different CIs for the same SL-TL-CTL
triplet, detects whether the non-spurious actions are the same, namely
whether the two CIs are equivalent even though their LogFiles are not identical.

If the order in which the correction actions took place is different, 
this should also be considered somewhat equivalent... (even though this 
might not actually be true for dependent errors... since taking into 
consideration one correction first might result into different refinements).

Maybe instead of having the compare method return a boolean, it could return
an int, where 0 means no equivalence, 1 means exact equivalence (without 
counting spurious loops) and 2 means equivalence in terms of correction actions
used, but not in their sequence.
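
A minimal sketch of such a three-valued compare, assuming each CI's
non-spurious actions are already available as comparable strings (the string
representation and the name CompareActions are assumptions for illustration):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// 0 = not equivalent, 1 = same actions in the same order,
// 2 = same actions but in a different order.
// The vectors are taken by value because they are sorted for the second test.
int CompareActions(std::vector<std::string> a, std::vector<std::string> b) {
    if (a == b) return 1;
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return (a == b) ? 2 : 0;
}
```

Sorting treats the actions as a multiset, so a repeated action has to appear
the same number of times in both CIs to count as equivalent.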


- Once the comparison method is in place, implement a collection of CIs, 
indexed by SL-TL-CTL, each CI should contain a vector of unique IDs, initially
only containing one ID (extracted from Log File or the directory name that
generated it).
Should be able to access all CI's that have the same SL-TL sequence (need at
least two access methods, one given SL-TL-CTL and one given SL-TL).

- For each SL-TL, group all CIs affecting a particular SL-TL-CTL triplet together.
When storing CI's affecting the same SL-TL-CTL, if it turns out that two 
correction instances are equivalent (see above), then we only want to store
it once, say we decide to store CI1, and then add the unique ID from CI2
into the vectorID, so that it now contains the ID for CI1 and the ID for
CI2.

So if we do this recursively, at the end, only unique CIs should be stored
under each different SL-TL-CTL triplet, and each CI will store a vectorID
with one or more unique IDs.

- Once all this is in place, add an evidence method, which will return the 
size of the vector, given a CI. This tells us how many actual LogFiles support
the evidence for that CI.

- Another method that should be implemented is given a collection of grouped
CIs, return the CI with more evidence (namely, larger size of vectorID). We 
could call this GetBestCI or something like that.

- The next step is to store a new collection containing the CIs with higher 
evidence. After processing all the CIs, this collection will contain only one 
CI per SL-TL pair, namely the one with higher evidence (BestCI).
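
To make the grouping, evidence and GetBestCI steps concrete, here is a small
sketch. StoredCI and CICollectionSketch are hypothetical names and this is not
Bill's CICollection. For brevity it groups by SL-TL only, keeps the CTL as a
field, and merges only CIs whose actions are identical in the same order.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct StoredCI {
    std::string ctl;                   // corrected target sentence
    std::vector<std::string> actions;  // non-spurious actions
    std::vector<std::string> ids;      // log files supporting this CI (vectorID)
};

class CICollectionSketch {
public:
    // Adds a CI under its SL-TL pair. If an equivalent CI is already stored,
    // only the new ID is appended to that CI's ID vector.
    void Add(const std::string& sl, const std::string& tl, const std::string& ctl,
             const std::vector<std::string>& actions, const std::string& id) {
        std::vector<StoredCI>& group = groups_[std::make_pair(sl, tl)];
        for (StoredCI& ci : group) {
            if (ci.ctl == ctl && ci.actions == actions) {
                ci.ids.push_back(id);
                return;
            }
        }
        group.push_back(StoredCI{ctl, actions, {id}});
    }

    // Evidence = number of log files that support a CI.
    static std::size_t Evidence(const StoredCI& ci) { return ci.ids.size(); }

    // CI with the most evidence for an SL-TL pair, or nullptr if none is stored.
    // Ties go to the CI that was stored first.
    const StoredCI* GetBestCI(const std::string& sl, const std::string& tl) const {
        auto it = groups_.find(std::make_pair(sl, tl));
        if (it == groups_.end()) return nullptr;
        const StoredCI* best = nullptr;
        for (const StoredCI& ci : it->second)
            if (best == nullptr || Evidence(ci) > Evidence(*best)) best = &ci;
        return best;
    }

private:
    std::map<std::pair<std::string, std::string>, std::vector<StoredCI> > groups_;
};
```

The collection of BestCIs is then just one GetBestCI call per stored SL-TL
pair.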

- Finally, we will need a ranking method, which given this new collection, it
ranks BestCIs by error complexity, from easier to refine to harder to refine. 
A first approximation of error complexity could be the number of corrections, 
which sort of correlates to number of errors.
This means that the method needs to compare each BestCI's Actions vector and
the one with a smaller vector, should be ranked higher.

There is one caveat here, the error complexity comparison method should not
take into account Alignment corrections. This is not to say that alignment 
corrections are not relevant for further processing, but we don't think they
should contribute the same as other actions to error complexity, and so the 
easiest way to do this, is by not counting them at all in this method.

- But there is something else that can be done in terms of approximating
a true error complexity, when there are multiple errors, namely to try to
detect if the errors are independent or not.
The reason for this being that when errors are dependent, it becomes trickier 
to decide which refinement operation to apply first, and thus such cases 
should be ranked lower by the ranking method, and thus should have a higher
complexity score.

Now, this is definitely a research topic per se and a very tricky thing to 
check, since to be really sure, one would want to track the error all the 
way down to the rules, and see if different corrections affect the same rules.
However, on a superficial level, we can implement a reasonable approximation 
in the right direction. 
For example, if a CI contains multiple errors, all affecting different words,
then we make the assumption that they are independent errors. 
If, on the other hand, two or more different corrections affect the same word,
then we can make the assumption that they are dependent. 
For the following example: 

    sl: Gaudi was a great artist
    tl: Gaudi es un artista grande
    
    Correction 1:
    edit: grande->gran
    temp_ctl: Gaudi es un artista gran

    Correction 2:
    cwo: gran artista

    ctl: Gaudi es un gran artista

the new error complexity comparison method would detect that there were 
2 actions involved and that they were not independent.

It is not immediately clear how to quantify dependencies, but let's say that
for now, when dependency is detected between two or more actions, the 
error complexity gets incremented by 1. In this case, CIs with two dependent
actions would get a score of 3 and would be considered as complex as CIs with 
three independent actions.

This is not ideal, so if you can think of a better scoring mechanism, that 
would be great. 

The ideal final ranking of CIs would be as follows:

    1st: CIs with one correction action
    2nd: CIs with two independent correction actions
    3rd: CIs with two dependent correction actions
    4th: CIs with three independent correction actions
    5th: CIs with three dependent correction actions
    etc...

So maybe storing the information about error dependency separately from 
the actual number of corrections is what is required here.
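
Keeping the two pieces of information separate could look like the sketch
below, which reproduces the ideal ranking above by sorting on the number of
corrections first and on dependency second. ScoredCI, HasDependentActions,
EasierToRefine and RankByComplexity are hypothetical names. The corrections
count is assumed to exclude alignment corrections already, and dependency uses
the superficial same-word test only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct ScoredCI {
    std::string id;
    int corrections;  // non-alignment correction actions only
    bool dependent;   // true if two or more corrections affect the same word
};

// Superficial dependency test: the same word position occurs more than once
// among the positions affected by the corrections.
bool HasDependentActions(std::vector<int> affectedWords) {
    std::sort(affectedWords.begin(), affectedWords.end());
    return std::adjacent_find(affectedWords.begin(), affectedWords.end()) !=
           affectedWords.end();
}

// Fewer corrections first; for the same count, independent before dependent.
bool EasierToRefine(const ScoredCI& a, const ScoredCI& b) {
    if (a.corrections != b.corrections) return a.corrections < b.corrections;
    return !a.dependent && b.dependent;
}

// Ranks from easiest to hardest to refine, keeping the input order for ties.
void RankByComplexity(std::vector<ScoredCI>& cis) {
    std::stable_sort(cis.begin(), cis.end(), EasierToRefine);
}
```

With this ordering two dependent actions rank between two and three
independent ones, instead of tying with three independent actions as the +1
score would.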


- And as we were talking about two weeks ago, CIs should also have a couple
of book-keeping variables to store:
   1. whether they lead to an actual change in the system, and
   2. whether that refinement(s) increased or decreased the accuracy over
   a regression test set.

- Another thing we are going to want to store later on in the refinement
process is what rules and lexical entries would be affected by the refinement 
operations triggered by a specific CI action. 
So even though the initial CIs will have no information about that, I need
to have a way to store Rule ID for each CI's action (storeRuleID).
Note that the grammar rule and lexical entry IDs are of the same format, 
so just one data structure is needed. 

This will allow us to calculate real dependencies later in the process, namely
I will be able to tell if two different CIs will end up affecting the same
rules and lexical entries or not. 

So having a method to tell us whether two different RuleID vectors contain the
same elements (regardless of order), is what we need here. 
I'm envisioning something like this: 
    vector<RuleIDs> DetectSameRuleID (CI1, CI2)

where it loops over each Actions vector and if it detects the same RuleID, in
one of the actions in the other CI, it stores it in the out argument, which
is then returned at the end, so that we can tell which Rules are affected by
both CIs.
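
A possible shape for this, as a sketch only: RuleID is assumed to be a plain
string (rule and lexical entry IDs share one format), and each CI is reduced
to one vector of RuleIDs per action. The ActionRuleIDs typedef is a
placeholder for whatever the CI class ends up storing.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

typedef std::string RuleID;

// One inner vector per action: the rule IDs that action would affect.
typedef std::vector<std::vector<RuleID> > ActionRuleIDs;

// Returns the rule IDs that occur in at least one action of each CI,
// sorted and without duplicates.
std::vector<RuleID> DetectSameRuleID(const ActionRuleIDs& ci1, const ActionRuleIDs& ci2) {
    std::set<RuleID> first, second;
    for (const std::vector<RuleID>& ids : ci1) first.insert(ids.begin(), ids.end());
    for (const std::vector<RuleID>& ids : ci2) second.insert(ids.begin(), ids.end());
    std::vector<RuleID> shared;
    std::set_intersection(first.begin(), first.end(), second.begin(), second.end(),
                          std::back_inserter(shared));
    return shared;
}
```

An empty result means the two CIs touch disjoint rules and lexical entries at
this level.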

     
4 me: 

In batch mode, and when more than one user have corrected the same sentences
(user study), group all CI affecting one particular SL-TL pair together

- Test Collection of CIs, access them by SL-TL and then by SL-TL-CTL

- check how many log files supported any given CI.

- exercise ranking methods and other methods I asked Bill to implement
-> need to generate logfiles with multiple independent errors 
and multiple dependent errors, etc.

2 indep
2 depen
3 indep
3 depen

Now as for the time sequence processing problem, Jaime suggested that each
CI also stores what rules and lexical entries are affected by the refinement(s)
triggered by that CI. 
Ask Bill to implement those methods for each CI *ACTION* (different actions
affect different rules!) and test: StoreRuleID(CI), vector<RuleIDs> DetectSameRuleID (CI1, CI2).


So that in the future, we can calculate rule dependencies in the following way:

Epsilon? (change_1 ->(followed by) change_2) =? Epsilon? (change_2 -> change_1) 

given rule r, if we apply change_1 first then we get r1'
		    and then change_2, we get r1''

	      if we apply change_2 first then we get r2'
		    and then change_1, we get r2''

the question is if r1'' and r2'' are equivalent.

This actually is more complex, since we need to take into account lexical 
entries and possibly more than one rule as well.
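
For the single-rule case, the r1'' vs r2'' test can be written down directly.
Sketch under a big simplification: a rule is reduced to its set of constraint
strings, and a change is any function from rule to rule. RuleSketch, Change
and OrderIndependent are made-up names, and set equality stands in for real
rule equivalence.

```cpp
#include <functional>
#include <set>
#include <string>

// A rule reduced to its set of constraint strings (simplification).
typedef std::set<std::string> RuleSketch;
typedef std::function<RuleSketch(const RuleSketch&)> Change;

// True when applying the two changes in either order gives the same rule.
bool OrderIndependent(const RuleSketch& r, const Change& change1, const Change& change2) {
    RuleSketch r1 = change2(change1(r));  // r1'': change_1 first, then change_2
    RuleSketch r2 = change1(change2(r));  // r2'': change_2 first, then change_1
    return r1 == r2;
}
```

Two changes that each add a different constraint commute; one that adds a
constraint and one that deletes the same constraint do not.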

-> interesting research problem: are two grammars with their respective 
lexical entries equivalent? -> Alon


If r1'' and r2'' are not equivalent, then we can do:
   - regression testing (batch mode)
   - active learning (interactive system)


- backed up RuleRefinement dir to Avenue afs and temuco (/usr4)


Friday, March 3, 2006

- merging to-do's for Bill -> CorrectionInstanceClassExpansion.doc

- met with Bill and went over the ideas behind CorrectionInstanceClassExpansion
(he was sick of dealing with LogFiles, reasons (clue words) are nasty!)
    We'll meet Tuesdays or Thursdays from now on

- worked on short paper for hlt-naacl PhD consortium


Tuesday, March 7, 2006

- Moved on to next version: 
mkdir V0.04
[aria@avenue RuleRefinement]$ cp -r V0.03 V0.04

- incorporating Bill's CI methods to my code (RuleRefinement.cpp)

- copied Bill's version of ParseTree which he expanded.

- met with Bill

- instead of having to have CI as a data member, Bill suggested that 
I could also implement a method in the Refiner class that returns a constant 
pointer to the right CI:
  const CI* Refiner::UseCI();
  ---
  pRefiner->UseCI();


Wednesday, March 8, 2006

- met with Jaime (see research-diary.txt)

- emailed Bill to include correct CI information so that RR module can access
that info and use to decide REFINE vs BIFURCATE as well as for regression
testing


Thursday, March 9, 2006

- looking at rules and lexical entries

- constraints!!!


Friday, March 10, 2006 (Mariona = 30)

- finished writing up rule and lexical entries document 
(GrammarRulesLexicalEntries.txt) and sent Bill an email in case he gets it 
before going to Acapulco for Spring break (email-11-RulesLexicalEntries).

- storing further comments to send to Bill in 
/usr0/aria/RuleRefinement/info/4Bill, so that I can
send them all in one single email at the end of next week


Sunday, March 19, 2006

- continue integrating bill's CI code: debugging  InstantiateLogInfo

[aria@avenue V0.04]$ make Refiner
/usr/local/bin/g++ -g  -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG
-o Refiner.o -c Refiner.cpp
-I/temuco/usr5/shared/code/antlr/antlr-2.7.1/lib/cpp
-I/afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2
-I/shared/Genkit/UKernel -I/shared/Genkit/Toolbox
Refiner.cpp: In member function `void
   Refiner::InstantiateLogInfo(CorrectionInstance*)':
Refiner.cpp:132: duplicate case value
Refiner.cpp:124: previously used here
make: *** [Refiner.o] Error 1


l. 124:     case CHANGE_WORD_ORDER:
              {
                ActionChangeWordOrder *pChangeOrder = (ActionChangeWordOrder*)pAct;
                cout << "Action : Change word order, move : "
                     << pChangeOrder->GetOldPos() << " to : "
                                                 << pChangeOrder->GetNewPos() << endl;
                break;
              }
l. 132:            case ADD_ALIGNMENT:
              {
                ActionAddAlignment *pAddAlignment = (ActionAddAlignment*)pAct;
                cout << "Action : Add alignment : "
                     << pAddAlignment->GetNewAlign() << " to word at : "
                     << pAddAlignment->GetSLWordPos() << endl;
                break;
              }

But the TestCI code seems to have compiled fine...
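
Note on the error itself: g++ gives "duplicate case value" when two case
labels evaluate to the same integer. One possible cause here (a guess, the
definitions are not shown above) is that CHANGE_WORD_ORDER and ADD_ALIGNMENT
end up with the same value in the header that Refiner.cpp sees, for example
through an older copy of the header. In a single enum without explicit values
the labels are guaranteed to differ. In the sketch below only the two names
from the error are real, the rest are placeholders:

```cpp
#include <string>

// Hypothetical action-type enumeration. Enumerators without explicit values
// are numbered 0, 1, 2, ... so no two of them can collide in a switch.
enum ActionType {
    EDIT_WORD,
    ADD_WORD,
    DELETE_WORD,
    CHANGE_WORD_ORDER,
    ADD_ALIGNMENT
};

std::string ActionName(ActionType type) {
    switch (type) {
        case CHANGE_WORD_ORDER: return "Change word order";
        case ADD_ALIGNMENT:     return "Add alignment";
        default:                return "Other action";
    }
}
```

Worth checking which definitions of the two constants Refiner.cpp actually
includes compared to TestCI.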

- testing Bill's CICollection code:

expanded testcode to print from Collection accessing methods first

2 create object file
 /usr/local/bin/g++ -o TestCICollection.o -c TestCICollection.cpp

2 create executable, need to give all the source (object) files!!!
/usr/local/bin/g++ -o TestCI TestCICollection.o CICollection.o CorrectionInstance.o ParseTree.o RefCountedObject.o

2 run the executable:
./TestCI

- emailed him my feedback and comments Re: Example/Test code


March 20, 2006

- debugging ErrorComplexity scores
(see LogFiles and complexity criteria I gave bill, and make sure it's right)

running /usr0/aria/RuleRefinement/V0.04/CICollection/TestCI

found some bugs, emailed Bill

- copy the new CorrectionInstance implementation from CICollection to V0.04
(plus other related files: RefCountedObject.cpp, etc.)
[aria@avenue V0.04]$ mv CorrectionInstance.hpp CorrectionInstance.hpp.bak
[aria@avenue V0.04]$ mv CorrectionInstance.cpp CorrectionInstance.cpp.bak

[aria@avenue V0.04]$ cp CICollection/CorrectionInstance.hpp .
[aria@avenue V0.04]$ cp CICollection/CorrectionInstance.cpp .
[aria@avenue V0.04]$ cp CICollection/*.cpp .

cp: overwrite `./CorrectionInstance.cpp'? n
cp: overwrite `./ParseTree.cpp'? y
[aria@avenue V0.04]$ cp CICollection/*.hpp .
cp: overwrite `./CorrectionInstance.hpp'? n
cp: overwrite `./Lexicon.hpp'? y
cp: overwrite `./ParseTree.hpp'? y
cp: overwrite `./Tokenizer.hpp'? y


- can now get ParseTree before calling LoadFromFile (GetSLandTLfromTCToolLogFile)

Since Refiner is not compiling correctly since I added the code to get
the different action types (from TestCICollection), I commented it out.
Compiler errors are:

Refiner::InstantiateLogInfo(CorrectionInstance*)':
Refiner.cpp:132: duplicate case value
Refiner.cpp:124: previously used here

With that code (the getActions() bit, from TestCI) commented out, the compiler
errors are gone, but now I am getting linker errors :(


[aria@avenue V0.04]$ make RR
/usr/local/bin/g++ -g  -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement.o -c RuleRefinement.cpp -I/temuco/usr5/shared/code/antlr/antlr-2.7.1/lib/cpp -I/afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2 -I/shared/Genkit/UKernel -I/shared/Genkit/Toolbox
In file included from CorrectionInstance.hpp:10,
                 from RuleRefinement.cpp:16:
RefCountedObject.hpp:17:7: warning: no newline at end of file
/usr/local/bin/g++ -g  -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement.exe RuleRefinement.o Param.o ParamDef.o CorrectionInstance.o Refiner.o Lexicon.o Grammar.o ParseTree.o Constraint.o Utils.o TrRule.o StringUtils.o GlobalParams.o LineUp.o  /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/TransferGrammarLexer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/TransferGrammarParser.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/FStructLexer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/FStructParser.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/transfer.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/transfer-support.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/chinese.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/english.o /afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2/UnicodeTools.o /shared/Genkit/Toolbox/*.o /shared/Genkit/UKernel/*.o  -L/temuco/shared/code/antlr-2.7.5/lib/cpp/src -lantlr
/usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
/usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
...
/usr/bin/ld: Dwarf Error: Could not find abbrev number 343.
RuleRefinement.o: In function `main':
RuleRefinement.o(.text+0x4a9): undefined reference to `Refiner::Refiner[in-charge]()'
...
/usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
/usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
/usr/bin/ld: Dwarf Error: Could not find abbrev number 651.
CorrectionInstance.o: In function `CorrectionInstance::CorrectionInstance[not-in-charge]()':
CorrectionInstance.o(.text+0x1c78): undefined reference to `RefCountedObject::RefCountedObject[not-in-charge]()'
CorrectionInstance.o: In function `CorrectionInstance::CorrectionInstance[in-charge]()':
CorrectionInstance.o(.text+0x1f06): undefined reference to `RefCountedObject::RefCountedObject[not-in-charge]()'
collect2: ld returned 1 exit status

-> emailed Bill (again...)


- I added RefCountedObject to the Makefile and it improved, but I am still
getting an error related to the Refiner...

/usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
/usr/bin/ld: Dwarf Error: Could not find abbrev number 343.
RuleRefinement.o: In function `main':
RuleRefinement.o(.text+0x4a9): undefined reference to `Refiner::Refiner[in-charge]()'
collect2: ld returned 1 exit status
make: *** [RR] Error 1


March 21, 2006

- generated  Log Files with spurious loops (2006-3-21-14-05-04-3917) and 
sent description to Bill (email-14-LogFilesWithSuriousLoops)

- added XferEngine class, which is the interface between Erik's TransferEngine
class and my code (moved CTLSinXferLattice code from Refiner to XferEngine).

it compiles but I get a linker error, even though the Makefile looks fine...
...
/usr/bin/ld: Dwarf Error: Could not find abbrev number 343.
RuleRefinement.o: In function `main':
RuleRefinement.o(.text+0x4a9): undefined reference to `XferEngine::XferEngine[in-charge]()'
collect2: ld returned 1 exit status

- I had a constructor declared in hpp but it wasn't implemented in cpp. So I added it,
but then I got:

XferEngine.cpp: In constructor `XferEngine::XferEngine()':
XferEngine.cpp:16: uninitialized reference member `XferEngine::m_SLSentence'
make: *** [XferEngine.o] Error 1

So eventually removed the m_SLSentence from private: in hpp, and it linked!
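
Sketch of the other way out of the "uninitialized reference member" error, keeping the member instead of removing it: a reference member has to be bound in the constructor's initializer list. The class name and accessor are hypothetical; only m_SLSentence comes from the error message above.

```cpp
#include <string>

// Hypothetical stand-in for XferEngine; only the member name m_SLSentence
// is taken from the compiler error in the notes.
class XferEngineSketch {
public:
    // A reference member must be bound in the member-initializer list.
    // A constructor that leaves it out is rejected by the compiler.
    explicit XferEngineSketch(const std::string& slSentence)
        : m_SLSentence(slSentence) {}

    const std::string& SLSentence() const { return m_SLSentence; }

private:
    const std::string& m_SLSentence; // refers to the caller's string, no copy
};
```

With a reference member there can be no default constructor, which is why removing the member (or storing a std::string by value) also made the error go away.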


./RuleRefinement.exe

[aria@avenue V0.04]$ ./RuleRefinement.exe
Compiled on Mar 22 2006 12:42:29 with g++ version: 3.2.3 in debug mode

Parameters are:
Debug Level     = 1
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
TCTool Log File = /usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed
The SL and TL for the LogFile given as a param are:
VEO -- EL
initfile is /usr0/aria/eng2spa/auto-init.txt
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar.trf with 15 rules added
Loading lexicon file /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf with 33 lexical entries added
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
No full parse was found!
Partial parse is:
VEO
tree: <(V,0:1 'VEO')>
Deleting all loaded rules.
Printing Tree extracted from Log File

( )Instantiating correction instance from TCTool Log File...
**************************************************
1. Checking against the existing lattice...
**************************************************
Segmentation fault

-> this is what happens when I don't give it a LogFile to process..
the default LogFile is 
/usr0/aria/RuleRefinement/IOFiles/SampleTCToolLogFile-processed

changed to "9", which is a real LogFile


March 22, 2006

- To give it the LogFile param explicitly, use the following flag: -a 

./RuleRefinement.exe -a 9
Parameters are:
Debug Level     = 1
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
TCTool Log File = 9

but still getting the seg fault

- Got rid of code that depended on example numbers.

- pCI->GetSLandTLFromTCToolLogFile(pLogFile, SL, TL) is working correctly,

but pXfer->ExtractParseTreeFromXferLattice(SL, TL, pGramFileName, pLexFileName); 
isn't, need to debug!


March 24, 2006

- TL was in lowercase, while the Xfer engine translation alternatives were in all caps,
so I added a call to the String2AllCaps method from StringUtils in the XferEngine method.
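
Sketch of the case fix, assuming the check is a plain string comparison against the engine's alternatives. ToAllCaps and MatchesAnyAlternative are hypothetical names standing in for String2AllCaps and the XferEngine method; the real ones may work differently.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for StringUtils::String2AllCaps.
// std::toupper in the default "C" locale only maps ASCII letters, so
// accented Latin-1 characters are left untouched by this version.
std::string ToAllCaps(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// True if the upper-cased TL sentence equals one of the alternatives,
// which are assumed to be upper case already.
bool MatchesAnyAlternative(const std::string& tl,
                           const std::vector<std::string>& alternatives) {
    const std::string upperTL = ToAllCaps(tl);
    return std::find(alternatives.begin(), alternatives.end(), upperTL)
           != alternatives.end();
}
```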

- LTI OH: potential research collaborators: Vasco and Paul (pto)


March 27, 2006

- reimplementation of XferEngine (wrapper to Erik's class):
splitting methods for XferEngine into basic, logical ones

  StartXfer(); // loads the default init file, grammar and lexicon
  StartXfer(char* pInitFileName, char* pGramFileName, char* pLexFileName);
  StartXfer(char* pGramFileName, char* pLexFileName);

  EndXfer();

  LoadGra(char* pGramFileName);
  LoadLex(char* pLexFileName);


  Translate(string SL);

And only one data member: TransferEngine* xfer;

Merged CTLSinXferLattice(char *pRefinedGramFileName, char *pRefinedLexFileName)
and CTLSinRefinedLattice(char *pRefinedGramFileName, char *pRefinedLexFileName)
into just one method: TLInLattice(string SL, string TL);

- debugged it and tested it, seems to be working ok :-)

- backed it up


March 28, 2006

- looking into different Rule implementations to prepare for my meeting with Bill

- Make sure all the data members in Rule.hpp are also in TrRule:
TimeStamp
RuleHistory
TranslationPairs (to do regression testing)

In the Grammar and Lexicon classes, need to add POS_Counters!

NPcount; // stores num of rules in G for NP rules
VPcount;  // stores num of rules in G for VP rules
PPcount;  // stores num of rules in G for PP rules
Scount;  // stores num of rules in G for S rules
...

Vcount;  // stores num of rules in L for V entries
Ncount;  // stores num of rules in L for N entries
...

and a method to obtain the next available ID for a specific POS:

int GetNextAvailableRuleID(POS) {
    POScount++; // effectively increase the counter 
    return POScount;
}
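
Sketch of the same idea with one map instead of separate NPcount, VPcount, ... members. RuleIDCounter and SetCount are hypothetical names; the POS is taken as a plain string here.

```cpp
#include <map>
#include <string>

// Hypothetical sketch: one counter per POS, keyed by the POS name.
class RuleIDCounter {
public:
    // Increments the counter for this POS and returns the new value,
    // so the first ID handed out for an unseen POS is 1.
    int GetNextAvailableRuleID(const std::string& pos) {
        return ++m_counts[pos]; // operator[] starts missing keys at 0
    }

    // Seeds a counter, e.g. from the number of rules already in G or L.
    void SetCount(const std::string& pos, int count) { m_counts[pos] = count; }

private:
    std::map<std::string, int> m_counts;
};
```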

- met with Bill: still working on CICollection (detecting spurious loops) and
started looking at how to implement the Rule class. We'll take K's class and
expand it and reimplement some of the methods, especially the ones dealing with
constraint manipulation. 
Methods returning vectors are bad!!! he's thinking about how to modify K's code

- Asked Bill whether, instead of the method he proposed to make the CI
class accessible from the Refiner (const CI* Refiner::UseCI()), I could just
have a pointer to CI as a private data member, and then have a method
which copies a specific pCI to the data member pointer (= XferEngine) -> yep

- modifying Refiner class... keep debugging and testing it


March 29, 2006

- CI doesn't seem to have AddRef() Release() methods, as Bill implied, so I
found some in CICollection, but there was a bug in the implementation.
Bill: These methods are for a vector of CIs; they AddRef and Release all
CIs in the vector (they do not release all references in a single CI).
This type is used by some methods in CICollection.
Check out RefCountedObject.cpp for the implementation of AddRef and Release.

-> CI inherits these methods from RefCountedObject, working now
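
For reference, a minimal sketch of what an intrusive reference-counted base class usually looks like. This is an assumption about the general pattern, not Bill's RefCountedObject.cpp, and all names ending in Sketch are hypothetical.

```cpp
// Hypothetical minimal reference-counted base class.
class RefCountedSketch {
public:
    RefCountedSketch() : m_refCount(0) {}
    virtual ~RefCountedSketch() {}

    void AddRef() { ++m_refCount; }

    // Drops one reference; the object deletes itself when none are left.
    // Returns the remaining count (0 means the object is gone).
    int Release() {
        const int remaining = --m_refCount;
        if (remaining == 0) delete this;
        return remaining;
    }

    int RefCount() const { return m_refCount; }

private:
    int m_refCount;
};

// A class such as CorrectionInstance would simply inherit both methods.
class CorrectionInstanceSketch : public RefCountedSketch {};
```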

- /usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
/usr/bin/ld: Dwarf Error: Invalid or unhandled FORM value: 14.
/usr/bin/ld: Dwarf Error: Could not find abbrev number 343.
RuleRefinement.o: In function `main':
RuleRefinement.o(.text+0xbb9): undefined reference to `Refiner::Refiner[in-charge]()'
collect2: ld returned 1 exit status

-> missing Refiner() constructor implementation in Refiner.cpp

- Working fine now, Refiner is testing the integration of Bill's CI code :)


April 11, 2006

- testing Bill's code, made changes to Test code, to compile and get a new 
executable:

to create the object file:
 /usr/local/bin/g++ -o TestCICollection.o -c TestCICollection.cpp


/usr/local/bin/g++ -o TestCICollection TestCICollection.o CICollection.o CorrectionInstance.o ParseTree.o RefCountedObject.o DirectoryTraverser.o

./TestCICollection > TestCICollection.out


April 13, 2006

- adding code to TestCICollection, to make sure I can access all the information
I need for the RRefiner (Wi, Wi', Wc and tempCTLS).
Wc are only present for edit actions, and so should be moved there (email Bill)

moved ApplyToWords to after the different cases of action, so that I can get
the right info...


./TestCICollection > TestCICollection.out1


April 23, 2006

- finished second pass on RR Algorithm and sent to Bill


April 24, 2006

- Bill stopped by: he's not going to implement all the test code for now, 
but rather he'll write debug code tomorrow when we meet, since there is not
enough time right now

Comments: he'll only worry about time stamp and History information, and 
hopefully TranslationPair info, and not about the general comments.

Went over rule history info that needs to be stored again to make sure that
he's allowing for an original rule to be active or inactive (encode as a 
comment or in a different file)

- Implemented a spurious corrections detector
/usr0/aria/RuleRefinement/bin/PreProcessLogFiles.pl 
perl script to detect false corrections ((fix sentence)+ + (next sentence)* 
in the same file), it should be called from the CI class.
1, 6

problematic log files: 
if multiple submit values, only store the last one's

2006-2-13-17-13-55-8336/1
counter = 1
tl2-al = ((1,1),(2,2))
submit = FIX TRANSLATION
senum = 9
time = Mon Feb 13 17:14:21 2006
sl = she read
ID = 2006-2-13-17-13-55-8336
tl1 = ella leyÓ
tl1-al = ((1,1),(2,2))
con = 


counter = 1
tl2-al = ((1,1),(2,2))
submit = NEXT SENTENCE
senum = 9
time = Mon Feb 13 17:14:21 2006
sl = she read
ID = 2006-2-13-17-13-55-8336
tl1 = ella leyÓ
tl1-al = ((1,1),(2,2))
con = 


2006-2-13-16-51-29-8333/6
counter = 6
tl2-al = ((1,1),(2,2),(3,3))
submit = NEXT SENTENCE
tl3-al = ((1,1),(2,2),(3,3))
senum = 9
time = Mon Feb 13 16:56:04 2006
sl = they see water
ID = 2006-2-13-16-51-29-8333
tl1-al = ((1,1),(2,2),(3,3))
tl3 = ellos ven agua
con = 


counter = 6
tl2-al = ((1,1),(2,2),(3,3))
submit = NEXT SENTENCE
tl3-al = ((1,1),(2,2),(3,3))
senum = 9
time = Mon Feb 13 16:56:04 2006
sl = they see water
ID = 2006-2-13-16-51-29-8333
tl1-al = ((1,1),(2,2),(3,3))
tl3 = ellos ven agua
con = 


counter = 6
tl2-al = ((1,1),(2,2),(3,3))
submit = NEXT SENTENCE
tl3-al = ((1,1),(2,2),(3,3))
senum = 9
time = Mon Feb 13 16:56:04 2006
sl = they see water
ID = 2006-2-13-16-51-29-8333
tl1-al = ((1,1),(2,2),(3,3))
tl3 = ellos ven agua
con = 


ok:
2006-2-13-16-51-29-8333/1
2006-2-13-17-13-55-8336/6 
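
Sketch of the "only store the last submit value" rule noted above, written in C++ for illustration; the actual filter is the perl script and may do more. LogRecord and KeepLastSubmit are hypothetical names, and only two of the log fields are modelled.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of one log record; the field names follow
// the "counter" and "submit" keys in the log files.
struct LogRecord {
    int counter;
    std::string submit;
};

// Keeps only the last record seen for each counter value.
// The result is ordered by counter.
std::vector<LogRecord> KeepLastSubmit(const std::vector<LogRecord>& records) {
    std::map<int, LogRecord> last;
    for (const LogRecord& r : records) last[r.counter] = r; // later ones overwrite
    std::vector<LogRecord> out;
    for (const auto& kv : last) out.push_back(kv.second);
    return out;
}
```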


- looking at how to call a perl script from C++ code:
Stephan's /usr0/aria/RuleRefinement/bin/TextFilter

- added a system call to the perl script from my C++ code:

1. copied /usr0/aria/RuleRefinement/bin/TextFilter/pstream.h to my working dir (V0.04)

2. Included it at the top of RuleRefinement.cpp:
   #include "pstream.h" // for doing system calls (perl script, etc.)
   using redi::pstream;

3. Added this line of code in main:
   redi::ipstream System( "/usr0/aria/RuleRefinement/bin/PreProcessLogFiles.pl	  /usr0/aria/RuleRefinement/IOFiles/2006-2-13-16-51-29-8333/6" );

compiled and run it and it works!!!

Unfortunately I don't know how to pass the FileName as a parameter to the 
system call other than literally specifying the path and file name, 
so the following doesn't work:
   /*
  string FileName = "/usr0/aria/RuleRefinement/IOFiles/2006-2-13-17-13-55-8336/1";
  
  redi::ipstream System1( "/usr0/aria/RuleRefinement/bin/PreProcessLogFiles.pl FileName" );
  */
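
Possible fix, as a sketch: inside the quotes, FileName is just literal text, so the command has to be built by concatenating the script path and the variable. To keep the example free of pstream.h it uses POSIX popen as a stand-in for redi::ipstream; BuildPreProcessCommand and RunAndCapture are hypothetical helper names.

```cpp
#include <cstdio>
#include <string>

// Builds the command line by concatenation, so the value of fileName is
// used rather than the literal text "FileName".
std::string BuildPreProcessCommand(const std::string& script,
                                   const std::string& fileName) {
    return script + " " + fileName;
}

// Runs a command through the shell and returns its standard output.
// POSIX popen is used here as a stand-in for redi::ipstream.
std::string RunAndCapture(const std::string& command) {
    std::string output;
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) return output; // could not start the command
    char buffer[256];
    while (fgets(buffer, sizeof buffer, pipe)) output += buffer;
    pclose(pipe);
    return output;
}
```

As far as I can tell, redi::ipstream also accepts a std::string command, so the string returned by BuildPreProcessCommand could be passed to its constructor directly.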


Tuesday, April 25, 2006

- Test RR module with 
1: I saw the red car
/usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/0
4: Mary and John fell
/usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/4

To test it in V0.04, run:

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/0

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/4

*******************
** moving to V0.05
*******************

- + Bill
substituting old files with new files from Bill

- not using log files with clue words for now

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-17-13-55-8336/0

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/4

- found a bug in ParseTree.Find(Word)

- found a bug in extracting alignments from tree

- bill is writing GetAffectedRules, so far there is only AddAffectedRules;
it's called for each Action


Wednesday, April 26, 2006

- 10am meeting with Bill (10:30)

- the following files are parsed correctly 2006-2-13-17-13-55-8336/
0,1,2,3,4,5,6,9 (but crashes after printing out action type...)

crash: 7,8, (multiple errors!!!)

####################
edit: 0,2,3,8,9
add: 4, 5, 
delete: 7
cwo: 8,9
####################

bugs: 

fixed
1. Action methods were not defined and implemented as const, but test code
was using them as const

fixed
2. words extracted from parse tree (affected rules) had quotes -> got rid of them

fixed
3. overload == operator implementation incorrect (affected rules)

fixed
4. Affected rules for edit retrieved the lexical rule and not the grammar rule
(need to discard immediate parents for this)

fixed
5. implementing a function to retrieve alignments for both TL and CTL words


Pending:

6. Multiple word lex entries are copied in the grammar file instead of the lexicon file

7. missing attribute features for {VP,46}


- backed up all the updated files both in temuco (/usr4) and afs

- Currently assigning IDs to all rules and lex entries, printing out file and 
loading the simulation-G/L-ID.trf file to Xfer engine instead. 
Ideally, this will be done offline and the final G and L will be loaded in RR. 

- a pointer into a string object must not be used to change the characters it points to: if I modified the contents (or size) through the pointer, the string object wouldn't know
and it could crash, since the memory required would differ.
So a pointer to a string's characters has to be declared as a pointer to const: const char * pToString = StringItPointsAt.c_str();
(the pointer itself can still be re-pointed to another string, only the characters are read-only)
   changed the StartXfer parameters to const both in XferEngine.hpp and cpp
   -> need to do the same for the other methods
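
Sketch of why const parameters help: c_str() returns a const char*, so it can only be passed to parameters declared const char*. TotalNameLength is a hypothetical stand-in for a method like StartXfer that only reads the file names.

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in for a method like StartXfer: it only reads the
// names, so the parameters are pointers to const characters.
std::size_t TotalNameLength(const char* pGramFileName, const char* pLexFileName) {
    return std::strlen(pGramFileName) + std::strlen(pLexFileName);
}

// c_str() returns const char*, so it can be passed directly; with plain
// char* parameters this call would not compile.
std::size_t LengthsFromStrings(const std::string& gram, const std::string& lex) {
    const char* p = gram.c_str(); // characters are read-only through p
    p = lex.c_str();              // but p itself can be re-pointed
    return TotalNameLength(gram.c_str(), p);
}
```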

- testing (and debugging) lexical entry query/extraction 


Thursday, April 27, 2006

- working on AMTA paper (official deadline May 1, can upload final version 
until May 4 8am)

- 5-? bill

bill working on improving lexicon class to support required accessor methods

lexicon is fixed
rules seemed to be working

1st example is working!!!!!!!!

need to test rules more extensively

never refine before bifurcating, it will screw up the whole collection indexing


priority to do list 4 Bill:

Here are the high-level methods that are still required for the two examples to work:

- pNewLexEntry *Collection::CreateNewLex(POS, vsSLside, vsTLside)
creates a new LexEntry from scratch, given a POS, vsTLside and vsSLside.
later we'll need to add a default set of value constraints, depending on what 
POS is given. It needs to get the next available ID from the POScounter.
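
Sketch of the text such a new entry could be written out as, following the layout of the entries in the .trf files. FormatNewLexEntry is a hypothetical helper; it assumes single-word SL and TL sides and adds no value constraints.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats a new lexical entry for one SL word and one
// TL word, with only the default (X1::Y1) alignment in the body.
std::string FormatNewLexEntry(const std::string& pos, int id,
                              const std::string& slWord,
                              const std::string& tlWord) {
    std::ostringstream out;
    out << "{" << pos << "," << id << "}\n"
        << pos << "::" << pos << " |: [\"" << slWord << "\"] -> [\""
        << tlWord << "\"]\n"
        << "(\n"
        << "  (X1::Y1)\n"
        << ")\n";
    return out.str();
}
```

The id argument would come from the POS counter, e.g. 9 for the ninth N entry.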

- NewGrRule->AddAgrConstraint(iVar4Wi, iVar4Wi', sTriggeringFeature, EquationType)
Given a NewGrammarRule (result from a bifurcation), adds an agreement constraint between positions Var4Wi, Var4Wi':
      ((Y_Var4Wi TriggeringFeature) EquationType (Y_Var4Wi' TriggeringFeature))
      (ex: ((y3 gender) = (y2 gender)))
so it needs to create two strings for the two index variables (ex: 3-> y3 and 2->y2).
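
Sketch of the string building for the constraint. FormatAgrConstraint is a hypothetical free function; the real method would add the result to the rule rather than return it.

```cpp
#include <string>

// Hypothetical helper: turns two TL-side positions, a feature name and an
// equation type into the constraint text, e.g. ((y3 gender) = (y2 gender)).
std::string FormatAgrConstraint(int var4Wi, int var4WiPrime,
                                const std::string& feature,
                                const std::string& equationType) {
    const std::string left = "y" + std::to_string(var4Wi);       // 3 -> y3
    const std::string right = "y" + std::to_string(var4WiPrime); // 2 -> y2
    return "((" + left + " " + feature + ") " + equationType +
           " (" + right + " " + feature + "))";
}
```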

- For this to work, there needs to be some sort of method, which I call 
RuleVariableInstantiation, that given two positions in TLWords, it extracts their POS (the ParseTree is already stored in the CI, correct?) and it finds their position (from 1 to n) in the rule.

GrammarRule->GetIndexVariables(iWiPos, iWCluePos, &iInRuleWiPos, &iInRuleWCluePos) 

So for rule:

{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
;;                       1   2  3
(
  (X1::Y1)  (X2::Y3)  (X3::Y2)
  ((x0 det) = x1)
  ((x0 mod) = x2)
  (x0 = x3)
  (y0 = x0)
  (y1 == (y0 det))
  (y3 == (y0 mod))
  (y2 = y0)
)

if TL[veo el auto rojo] (SLWords is [I saw the red car])

GrammarRule->GetIndexVariables(4, 3, &iInRuleWiPos, &iInRuleWCluePos)

iInRuleWiPos would now be 3 
iInRuleWCluePos would now be 2 

since 4(rojo) has POS ADJ and 3(auto) has POS N, and thus the method would
return 3 and 2 (which will allow us to add an agreement constraint between y3 and y2)
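
Sketch of the base case only, under a simplifying assumption that each POS occurs once in the rule's TL side, so a POS lookup is enough. The extra tlPOS and rhs parameters are hypothetical; in the real code they would come from the ParseTree and the rule.

```cpp
#include <string>
#include <vector>

// Returns the 1-based position of pos in the rule's TL-side RHS, 0 if absent.
int PositionInRule(const std::vector<std::string>& rhs, const std::string& pos) {
    for (std::size_t i = 0; i < rhs.size(); ++i)
        if (rhs[i] == pos) return static_cast<int>(i) + 1;
    return 0;
}

// tlPOS holds the POS of each TL word (1-based positions in the sentence).
// Returns false if a position is out of range or its POS is not in the rule.
bool GetIndexVariables(const std::vector<std::string>& tlPOS,
                       const std::vector<std::string>& rhs,
                       int iWiPos, int iWCluePos,
                       int* iInRuleWiPos, int* iInRuleWCluePos) {
    const int n = static_cast<int>(tlPOS.size());
    if (iWiPos < 1 || iWiPos > n || iWCluePos < 1 || iWCluePos > n) return false;
    *iInRuleWiPos = PositionInRule(rhs, tlPOS[iWiPos - 1]);
    *iInRuleWCluePos = PositionInRule(rhs, tlPOS[iWCluePos - 1]);
    return *iInRuleWiPos != 0 && *iInRuleWCluePos != 0;
}
```

For TL [veo el auto rojo] with POS [V DET N ADJ] and RHS [DET N ADJ], positions 4 and 3 map to 3 and 2. Rules with a repeated POS or embedded nodes would need the parse tree instead of this lookup.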


- void Delta(pLexEntry1, pLexEntry2, &vsTriggeringFeatures)
Given two lexical entries, it returns a vector of strings with one or more triggering features.

The way it does this is by comparing the two lexical entries at the feature level. Namely, it first checks if their POS are the same (if not, push the string "POS" to the TriggeringFeature vector and return); if they are, 
for each value constraint with the same feature attribute name in both lexical entries, if the value is different, push the feature name into vsTriggeringFeatures.

So for example, given two pointers to the following LexEntries:
 
ADJ::ADJ |: [red] -> ["roja"]
(
  (X1::Y1)
  ((x0 form) = red)
  ((y0 agr num) = sg)
  ((y0 agr gen) = fem)
)
{ADJ,2}
ADJ::ADJ |: [red] -> ["rojo"]
(
  (X1::Y1)
  ((x0 form) = red)
  ((y0 agr num) = sg)
  ((y0 agr gen) = masc)
)

Delta would return vsTriggeringFeatures with one element: "agr gen".
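
Sketch of Delta over a simplified entry type. LexEntrySketch is hypothetical: it keeps the value constraints as a feature-name to value map, without the x0/y0 prefixes.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified lexical entry: POS plus value constraints.
struct LexEntrySketch {
    std::string pos;
    std::map<std::string, std::string> features; // e.g. "agr gen" -> "fem"
};

// Pushes "POS" if the POS differ; otherwise pushes every feature name that
// is present in both entries with different values.
void Delta(const LexEntrySketch& a, const LexEntrySketch& b,
           std::vector<std::string>* vsTriggeringFeatures) {
    if (a.pos != b.pos) {
        vsTriggeringFeatures->push_back("POS");
        return;
    }
    for (const auto& kv : a.features) {
        const auto it = b.features.find(kv.first);
        if (it != b.features.end() && it->second != kv.second)
            vsTriggeringFeatures->push_back(kv.first);
    }
}
```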


- PostulateNewFeature(sNewFeatureName)
As I told you before, the grammar and the lexicon (RRRuleCollection) need to 
keep track of the last feature name ID (int) used by the grammar (possibly as a general comment at the beginning of the grammar and lexicon files). This will start being 1, and every time PostulateNewFeature() is called it will increase it (both in G and L) and then will return a string containing that integer. For example, for FeatCounter = 11, this method will return "feat_11".
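
Sketch of the counter, assuming it stores the last ID already used, so each call increments first and then builds the name. FeatureNamer is a hypothetical class; keeping G and L in sync and writing the counter to the files are left out.

```cpp
#include <string>

// Hypothetical sketch of the feature-name counter.
class FeatureNamer {
public:
    // lastUsedID is the last feature ID already used in the grammar/lexicon.
    explicit FeatureNamer(int lastUsedID = 0) : m_featCounter(lastUsedID) {}

    // Increments the counter and returns "feat_<counter>".
    std::string PostulateNewFeature() {
        ++m_featCounter;
        return "feat_" + std::to_string(m_featCounter);
    }

    int LastUsedID() const { return m_featCounter; }

private:
    int m_featCounter;
};
```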


And I think this is it, to get the two examples working end-to-end!


For the other examples that I have in mind, there are a couple more methods that will be required, and I describe them to you in case you have time:  

- Add a Value constraint (both for GrRules and LexEntries)
NewRule->AddValConstraint(iVar4Wi, sTriggeringFeature, EquationType, Value)

I anticipate the value always being either + or -, so it could be int instead of strings.

- NewGrammarRule->AddConstituentToRHS(sPOS_or_TLWord, iPositionToBeAdded)
adds the POS or Word given as a parameter into the RHS (TLside) at  iPositionToBeAdded. Note, it also needs to update alignments and index values

- NewGrammarRule->MoveConstituentRHS(sPOS1, iFinalPositionInRHS)
given the POS to be moved, and the FinalPosition in the RHS (starting at 1) where it needs to be moved to, it changes the RHS (TLside) accordingly and it
updates alignment and index values
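
Sketch of the reordering step only, with the RHS as a vector of POS strings. This free function is hypothetical; it moves the first occurrence of the POS and does not update alignments or index values.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Moves the first occurrence of sPOS to the 1-based position
// iFinalPositionInRHS. Returns false if the POS is missing or the
// position is out of range; rhs is left unchanged in that case.
bool MoveConstituentRHS(std::vector<std::string>* rhs, const std::string& sPOS,
                        int iFinalPositionInRHS) {
    const auto it = std::find(rhs->begin(), rhs->end(), sPOS);
    if (it == rhs->end() || iFinalPositionInRHS < 1 ||
        iFinalPositionInRHS > static_cast<int>(rhs->size()))
        return false;
    const std::string moved = *it; // copy before erasing
    rhs->erase(it);
    rhs->insert(rhs->begin() + (iFinalPositionInRHS - 1), moved);
    return true;
}
```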

- Implement deactivating a rule by printing it out as a comment
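
Sketch of deactivating by commenting out, assuming that a leading ";" on every line of the rule is enough for the Xfer engine to skip it (the .trf files already use ";" for comment lines). CommentOutRule is a hypothetical helper.

```cpp
#include <sstream>
#include <string>

// Prefixes every line of the rule text with ";" so the rule is written
// out as a comment instead of an active rule.
std::string CommentOutRule(const std::string& ruleText) {
    std::istringstream in(ruleText);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) out << ";" << line << "\n";
    return out.str();
}
```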

- search the G to see if any rule has the POS for Wi' on the RHS (anywhere)
Actually, I need to query the grammar to see if POS_Wi is somewhere in a 
Rule RHS, but also I need to take into account the left and right context of 
Wi' word, namely the POS of the words to the right and to the left of it (AffectedRules?). But I am not sure how to do this... any ideas?

- Finish spurious loop detection

- Incorporate system call to pre-process CIs into his code 
(filter out spurious corrections)

-############################################################

Monday May 1, 2006 (Workers' Day all over the world except the USA :-()

- Adding new lexical entry to L and reloading it into the Xfer engine, to see the final translation

[aria@avenue V0.05]$ ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/4 >! 4RR.out

- added ReLoadLex and Gra methods to XferEngine.

Testing reloadgra:

Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID-REFINED.trf with 30 rules added

it seems that it adds all the rules again... check with Erik later.
I can always do as I was doing before, and create a file with the changes that
needs to be reloaded 

and only after it has confirmed to increase translation quality, I can add it
to the Lexicon and save it all together in the same file...


- Bill noon-4pm

bill: enable and disable rule in G and L is working 
      -> need to test

      implemented everything I asked him to except for search in G 
      (for a particular POS sequence) and spurious loop...
      
      -> need to test!

ari:

added CheckRefinedLattice method to XferEngine and example 4 is working end to end!! :-))))))))))))))))

started working on other example (0): agreement constraint

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/0 > ! 0RR.out

- delta function is working for 0!!!


Tuesday, May 2, 2006

- AMTA paper

- Bill (5-10pm)

implemented RuleVariableInstantiation (works for base case, doesn't work
for nodes that are embedded in different rules (missing recursion -> bill is
working on it))

Create new lexical entry (done)


- testing delta function on other examples. It works for the following examples: 0, 3, 

for 2 to work, I need to be able to create a new LexEntry from scratch

When trying to run example sentences 8 and 9 I get a seg fault:
MAIN::Instantiating correction instance from TCTool Log File
Segmentation fault
	     -> need to look into this after deadline

Created a constraint and set all the different parts + added to rule
(it's working!!!)

-> problems loading G and L (see below)


Wednesday, May 3, 2006

- AMTA paper

- Jaime: keep track of refinement status: proposed, confirmed1 (by exact match),
confirmed2 (by increasing automatic MT metrics over a regression test)

automatic eval metrics: *modified* BLEU (BLEU cannot be calculated just for 
a single sentence), NIST, METEOR and Jaime also suggested TER (error rate) and
HER (GALE) -> which I might end up having to use depending on automatic results.

- backed up V0.05 on Avenue and temuco:/usr4/

- submitted AMTA paper (3am)


Friday, May 5, 2006

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/0 > ! 0RR.out

[aria@avenue lexicons]$ grep "|:" simulation-lexicon-ID.trf | wc -l
     33
[aria@avenue lexicons]$ grep "|:" simulation-lexicon-ID-REFINED.trf | wc -l
     34

RefL should have 34 entries but the xfer engine only loads 25...
create an init file and try it manually to see if it's choking somewhere

from 0RR.out:
Loading lexicon file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID.trf with 33 lexical entries added

Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID-REFINED.trf with 25 rules added


And with the grammar:

[aria@avenue grammars]$ grep ">" simulation-grammar-ID.trf | wc -l
     15
[aria@avenue grammars]$ grep ">" simulation-grammar-ID-REFINED.trf | wc -l
     16

from 0RR.out:
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID.trf with 15 rules added

Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID-REFINED.trf with 1 rules added


However whenever I load them directly to the Xfer engine, it parses them w/o 
a problem:
[aria@avenue eng2spa]$ transfer -if init-simulation.txt
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID-REFINED.trf
with 15 rules added
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID-REFINED.trf
with 34 rules added
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
Translating file: /usr0/aria/eng2spa/corpus/error-typology-simulation
1:
Translating "I see the red car".
2:
Translating "she read".
3:
Translating "I see the red unicorn".
4:
Translating "Mary plays the guitar".
5:
Translating "John and Mary fell".
6:
Translating "you saw the woman".
7:
Translating "they see water".
8:
Translating "I would like to go".
9:
Translating "I saw you".
10:
Translating "Gaudi was a great artist".

TOT 0 LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0 COV nan
UNKNOWNS
LIKE    2


Only difference is how they are called, different methods in XferEngine...

pXfer->StartXfer(pGramFileNameID, pLexFileNameID);
pXfer->ReLoadLex(pRefinedLexFileName); 


I think I know what is going on!
the reload command only adds the different entries, for the grammar this is 1,
which is ok, and for the lexicon it's 25 even though it should be 1...

[aria@avenue lexicons]$ less simulation-lexicon-ID-REFINED.trf .trf
[aria@avenue lexicons]$ diff simulation-lexicon-ID.trf simulation-lexicon-ID-REFINED.trf
79a80,84
> {N,9}
> N::N |: ["unicorn"] -> ["unicornio"]
> (
>   (X1::Y1)
> )

maybe the Xfer engine reload method is sensitive to order or something else
I am not taking into account... ask Erik!


Monday, May 8, 2006

Erik's answer:
You need to use loadgra to load the lexicon also, and not the lexicon specific commands like loadlex.
**************************************
weird things going on with the refined G and L, 

- RefL should have 34 entries but the xfer engine only loads 25...
create an init file and try it manually to see if it's choking somewhere

Loading lexicon file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID.trf with 33 lexical entries added

Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID-REFINED.trf with 25 rules added


- the RefG file seems to contain the lex, even though I have already debugged it extensively... double check and keep debugging...
**************************************

- ok, now it's just reloading 1 rule, however, it doesn't seem to be restricting
the gender of the adjective... 

-> still need to debug


Tuesday, May 9, 2006

- Bill: 1:30-3:50pm

Bill says that they crash because they have a spurious correction... but that's not true...
crash: 7,8, (multiple errors!!!)
MAIN::Instantiating correction instance from TCTool Log File
Segmentation fault

I asked him to take a look at the two log files that are not correctly parsed.

bug is in the way he "mirrors" the alignments
assumption: i apply action to words, and then I extract the alignment info
I expect to see after the previous action has taken effect

copied 5-9-06/CorrectionInstance.cpp over

testing it:
./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-25-46-23762/0 > ! 0RR.out

7, 8 and 9 still seg fault

Bill says it's a bug in the C++ iostream library: he asks the stream to be at one
position, and the library places him at a different position... hard to fix, can't do it on the spot.
He'll do it as HW and we'll meet again Thursday 1pm

- implement adding a lexical entry from scratch (I had already added the code
in edit to do that!!):

   -> Bill implemented tree.GetPOSofWord(Wi)

	    POS_TYPE POS = tree.GetPOSofWord(TLWord);
	    pNewLexEntry->RRlexiconRule(POS, RC.UseRuleIDManager(), vsSLside, vsTLside);

But log file 2 deals with edit word, so need to debug edit case code:
line 700: there must be a bug in the get alignment code, since
 log file 2 is not outputting this...

since I hadn't applied any action Calignments does not contain anything, 
I needed to look into alignments! now it's working

logfile 2 is working correctly as well!!


Thursday, May 11, 2006

- worked on perl script to create inflected lexicon from MM vocabulary
/usr0/aria/bin/WordList2Lexicon.pl

shall put -> pondrÃ¡n
V |: ["put" ] -> ["pondrÃ¡n"]
((x0 form) = shall)
)
- POS for TL side [done]
- get whole SL side [done]
- accented characters?
- add features -> need to debug getFS -> emailed Erik

RuleRefiner:
- implementing add a new constituent to the RHS of the rule to test bill's code
  figured out there are a couple of methods missing, emailed list to Bill


Bill: 1:40-6pm

- fix iostream library bug, so that 7, 8 and 9 parse
it was actually a problem converting his code from windows to unix, he was reading files in binary mode and so was returning positions in a binary file instead of text

 but when I tried running RR on 7, it seg faulted again :(

 fixed this bug, now 8 and 9 are parsing file and there is a problem with 7.
 Namely, parse tree is empty, there is no alternative translation that 
 matches with TL sentence!!! Reason: I modified the lexicon!!!
 -> need to create a new CI for 7 with current lexicon (TLS=me gustaria que ir)

TL sentence is [WOULD LIKE QUE IR]
these are the alternative translations and their parses for I would like to go:
tl-0: ME GUSTARÍA QUE IR
tree-0: ((S,0 (VP,15 (V,7:2 'ME GUSTARÍA') (PP,2 (PREP,1:4 'QUE') (VP,1 (V,8:5 'IR') ) ) ) ) )
No alternative matches the TL sentence: WOULD LIKE QUE IR
tl-1: YO ME GUSTARÍA QUE IR
tree-1: ((S,1 (NP,1 (PRON,1:1 'YO') ) (VP,15 (V,7:2 'ME GUSTARÍA') (PP,2 (PREP,1:4 'QUE') (VP,1 (V,8:5 'IR') ) ) ) ) )
No alternative matches the TL sentence: WOULD LIKE QUE IR

- bill incorporated the system call to the CI.cpp. Now both correct CIs with 
and without spurious corrections are working:
2006-2-13-17-13-55-8336/1 and 6
2006-3-31-17-25-46-23762/1 and 6
2006-2-13-16-51-29-8333/6 is fine but 1 crashes further down


problem with the logic of the RR:

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-16-51-29-8333/1 >! 8333-1RR.out

both lexical entries exist... delta function should extract agr person...
seg faults when calculating the delta function

[read]->[leÍ] is not in the lexicon

{V,9}
V::V |: ["read"] -> ["leí"]
(
  (X1::Y1)
  ((x0 form) = read)
  ((x0 actform) = read)
  ((x0 tense) = past)
  ((y0 agr pers) = 1)
  ((y0 agr num) = sg)
)
{V,10}
V::V |: ["read"] -> ["leyó"]
(
  (X1::Y1)
  ((x0 form) = read)
  ((x0 actform) = read)
  ((x0 tense) = past)
  ((y0 agr pers) = 1)
  ((y0 agr num) = sg)
) 

- corrected bug in simulation-lexicon.trf (leyo, pers = 3!)

- made sure that lexical entries are all in lowercase, including
accented characters!! 
Added: TLWord = StringUtils::StringToLower(Wi.value);
to edit case, working now

However, the AffectedRule is returning VP,1 and what we need is S,1...

MAIN::Printing Tree extracted from Log File:
((S,1 
(NP,1 
(PRON,3:1 'ELLA'))
(VP,1 
(V,9:2 'LEÍ')))

Affected Rule are: 1
{VP,1} [leÍ]

problem: without a clue word having been detected by user (ella), there is no
way AffectedRules can return S,1

HACK: in this case, since there is only one other word, pick that as clue word.

- bug in the parse tree, it has an extra paren and so it returns (S,1 as an id!

-> asked Bill to add a check to  ID.LoadFromString, so that if there is a paren, it deletes it.

it's working now :)

- bill implemented VariableInstantiation for one variable, for MoveConstit
testing it...
./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-16-51-29-8333/9

after some debugging, it's working now :)
it inputs 5 and returns 3 :-)

******************************************************************** 
Recap
******
####################
correct: 1, 6
edit: 0,2,3,8,9
add: 4, 5, 
delete: 7
cwo: 8,9
####################

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-16-51-29-8333/

0 -> didn't manage to get rid of ambiguity (still producing auto roja!)
     try loading the refined grammar outside the RRefiner and see what the 
     problem is =c?

1 (purposefully picked wrong TL for testing)
  -> didn't manage to get rid of ambiguity (still producing ella lei!)
     try loading the refined grammar outside the RRefiner and see what the 
     problem is =c?

2 OOW (LEXICAL ENTRY ADDED FROM SCRATCH) [done]

3 Added sense of the word (play->toca) [done]
  -> still need to get rid of *juega guitarra

4 Modified lexical entry cayeron -> se cayeron
ambiguity necessarily increased. Still, lattice precision should be better

5 working on implementing methods that will allow me to add a new constit

6 OK

7 need to generate new CI with current lexicon, since right now it's not finding TL in lattice :(

8 ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-17-13-55-8336/8 >! 8RR.out

Complex: 2 actions: edit + cwo
  te is created as a copy of tu -> feat_0 is postulated
  (should really be case, but RR cannot know that)

->   but feat_0 doesn't get added to the lexical entries


9 implementing it...

******************************************************************** 

Friday, May 12, 2006

- debugged WordList2Lexicon.pl with Erik
  -> need to load it to memory, otherwise it will take too long.


-> send email to Bill with final complete to do list


-> need to create a new CI for 7 with current lexicon (TLS=me gustaria que ir)
also for 1: pick ella lei and then correct to leyo and clue word = ella -> test RR!


Monday, May 15, 2006

- TCTool: 
cp input-tct-4RRExamples input-tct 
added alternative translations to 7 (I would like to go) to input-tct

2006-5-15-13-08-25-11387

need to copy input-tct-US2 back into input-tct after I'm done with the RR examples


Testing 1 and 7:

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/1 >! 11387-1RR.out


New agreement constraint created is (y1 agr pers) = (y2 agr pers)
Added to the rule...
{S,91}
S::S : [NP VP] -> [NP VP]
(
;(P:{S,1})
  (X1::Y1)  (X2::Y2)
  (x0 = x2)
  ((y1 case) = nom)
  ((y1 agr) = (x1 agr))
  ((y2 tense) = (x2 tense))
  ((y1 agr pers) = (y2 agr pers))
)

****************************************************************************
***The refined grammar and lexicon produced the user corrected translation***
		The correct translation is: ELLA LEYÓ
****************************************************************************

****************************************************************************
***However it is still producing the incorrect translation, previously corrected***
		by the user: ELLA LEÍ
****************************************************************************

it seg faults!!! -> debug

-> need to figure out why the constraint does not prevent "ella lei" from generating


./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/7 >! 11387-7RR.out

tree-0: ((S,0 (VP,15 (V,7:2 'ME GUSTARÍA') (PP,2 (PREP,1:4 'QUE') (VP,1 (V,8:5 'IR') ) ) ) ) )

Before action: 
SLWords: I would like to go 
TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR 
Alignments: ((1,1),(2,1),(4,2))

/*
    case DELETE:
      {
	  cout << "Action type = delete\n";
	  ActionDeleteWord *pDeleteWord = (ActionDeleteWord*)pAct;
	  cout << "Wi: [" << pDeleteWord->GetDeletedWord()  << "]\n"   
		 << "i is " << pDeleteWord->GetPosDelete() << endl;
*/

Action type = delete
Wi: [IR]
i is 3
Affected Rule are: 0

When storing TLWords, one item per position, the fact that two or more words
are part of the same lexical entry is not reflected...
and so when tr

the accessors count positions indicated by alignments,
but the CI class stores positions per word item, instead of entries...
1."me gustaria" 2.que 3.ir
1. me 2. gustaria 3.que 4.ir
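
A rough sketch of the mismatch, assuming lexical entries are given as strings with words separated by spaces. EntryToWordSpans is a hypothetical helper, not something in the CI class: for each entry position it gives the 1-based range of word-item positions it covers.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: for each lexical entry (possibly multi-word), compute
// the 1-based [first, last] word-item positions that the entry covers.
std::vector<std::pair<std::size_t, std::size_t>> EntryToWordSpans(
    const std::vector<std::string>& entries) {
  std::vector<std::pair<std::size_t, std::size_t>> spans;
  std::size_t next = 1;  // next free 1-based word-item position
  for (const std::string& entry : entries) {
    std::istringstream in(entry);
    std::string word;
    std::size_t count = 0;
    while (in >> word) ++count;  // count the words inside this entry
    if (count == 0) count = 1;   // assumption: an empty entry takes one slot
    spans.emplace_back(next, next + count - 1);
    next += count;
  }
  return spans;
}
```

With the entries "me gustaria", "que", "ir", entry 3 (ir) maps to word item 4, which is the kind of off-by-one seen in the delete case.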


Did all the alignment changes first and then:

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-42-35-31799/0 >! 31799-7RR.out


Before action: 
SLWords: I would like to go 
TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR 
Alignments: ((1,1),(2,2),(3,2),(4,2),(5,3),(5,4))

Action type = delete
Wi: [IR]
i is 3

Not touching alignments:
./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-47-21-13849/0 >! 13849-7RR.out

still same :(

-> asked Bill to check whether the problem could be in TCTOOLPOS_TO_VECTOR, and
indeed that is what it was; he's fixing it :)

Before action: 
SLWords: I would like to go 
TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR 
Alignments: ((2,1),(5,4))

Action type = delete
Wi: [QUE]
i is 3

Ok, now I just need to implement the rest of the delete case algorithm

------------------

Artificially adding a =c + constraint to the edit case (example 1),
just to make sure it's working

[aria@avenue V0.05]$ ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/1 > ! 1-temp-RR.out

New value constraint created is (y1 agr pers) =c +
Added to the rule...
{S,91}
S::S : [NP VP] -> [NP VP]
(
;(P:{S,1})
  (X1::Y1)  (X2::Y2)
  (x0 = x2)
  ((y1 case) = nom)
  ((y1 agr) = (x1 agr))
  ((y2 tense) = (x2 tense))
  ((y1 agr pers) = (y2 agr pers))
  ((y1 agr pers) =c +)
)

commenting out that part of the code for now

Bill's implemented the following methods, testing:


- GetOriginalRule(pRefinedRule*)
Given a previously derived rule, it returns a pointer to the original rule
that got bifurcated into it. If it's not a derived rule, it should return
NULL.

- vpRules GetDerivedRules(pOriginalRule*)

- Rule comparison method:
bool SameRuleExcepptFeatName(pDerivedRule1, pDerivedRule2,
&sFeatNameDRule1, &sFeatNameDRule2);

* Tested case when there is only one derived rule, and it's working

* Tested case when there are multiple derived rules, but they are not identical
to the newly added rule, it's working

* Tested case when there are multiple derived rules, and the newly added has
an identical rule in the history, after some debugging, it's working

Checking if they are really the same or not
Identical Rule found in the grammar, with different feature name: Feat1 is feat_0 and Feat2 is feat_1
{S,93}
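
A sketch of what such a comparison could look like, NOT Bill's actual implementation. Assumptions: rule bodies are compared as plain strings (without the {S,93} ID line), postulated features are named feat_<digits>, and SameExceptFeatName is a hypothetical name. It only accepts the pair when the two bodies differ in one consistent feature-name substitution.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split a rule body into word tokens (letters, digits, '_') and single
// punctuation tokens; whitespace is dropped.
static std::vector<std::string> Tokenize(const std::string& s) {
  std::vector<std::string> toks;
  std::string cur;
  for (unsigned char c : s) {
    if (std::isalnum(c) != 0 || c == '_') {
      cur += static_cast<char>(c);
    } else {
      if (!cur.empty()) { toks.push_back(cur); cur.clear(); }
      if (std::isspace(c) == 0) toks.push_back(std::string(1, static_cast<char>(c)));
    }
  }
  if (!cur.empty()) toks.push_back(cur);
  return toks;
}

// True for postulated feature names of the form feat_<digits>.
static bool IsPostulatedFeat(const std::string& t) {
  const std::string prefix = "feat_";
  if (t.size() <= prefix.size() || t.compare(0, prefix.size(), prefix) != 0) {
    return false;
  }
  return std::all_of(t.begin() + prefix.size(), t.end(),
                     [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Hypothetical comparison: true if the two bodies are token-identical except
// that every differing token is feat1 in rule1 and feat2 in rule2.
bool SameExceptFeatName(const std::string& rule1, const std::string& rule2,
                        std::string& feat1, std::string& feat2) {
  const std::vector<std::string> t1 = Tokenize(rule1);
  const std::vector<std::string> t2 = Tokenize(rule2);
  feat1.clear();
  feat2.clear();
  if (t1.size() != t2.size()) return false;
  for (std::size_t i = 0; i < t1.size(); ++i) {
    if (t1[i] == t2[i]) continue;
    if (!IsPostulatedFeat(t1[i]) || !IsPostulatedFeat(t2[i])) return false;
    if (feat1.empty()) {
      feat1 = t1[i];
      feat2 = t2[i];
    } else if (feat1 != t1[i] || feat2 != t2[i]) {
      return false;  // more than one substitution
    }
  }
  return !feat1.empty();  // identical bodies are not reported here
}
```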

------------------------------------------------------------

Bill will be working on Spurious Loop and Error complexity and finish at least
one of the two tasks a couple of weeks from now.

--------------------------------------------------------------

still need to test:
./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/0 >! 0RR.out

- bool LexicalEntry::ReplaceFeatName(sFeatName1, sFeatName2);

- RuleCollection::DecreaseFeatNameCounter.

- VariableInstantiation for one position

when adding a word to a GraRule, the POS of the following word should be skipped (fLookatLeafPOS=false), since the method needs to retrieve the parent node
of the next word (Leaf).


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

- Error complexity. Not sure what is currently implemented, but for now,
having a simplified version would probably be fine, since I am planning to
deal with sentences containing 2 or 3 examples. Sentences containing
independent errors go first, sentences containing dependent errors last in
the ranking. So I guess error dependency should have a weight of 3 for
now, and each error a weight of 1.

-> add 1 per correction to the score if it's independent to previous errors, 
and 3 per correction if it's dependent
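
The weighting above could be sketched like this. The Correction struct and ErrorComplexity are hypothetical names, and I am assuming the dependency flag is already known for each correction; this is not what is currently implemented.

```cpp
#include <vector>

// Hypothetical record: one correction and whether it depends on an earlier one.
struct Correction {
  bool dependsOnPrevious;
};

// Weights taken from the note above: 1 per independent correction,
// 3 per dependent one. A lower score means a simpler sentence, ranked first.
int ErrorComplexity(const std::vector<Correction>& corrections) {
  int score = 0;
  for (const Correction& c : corrections) {
    score += c.dependsOnPrevious ? 3 : 1;
  }
  return score;
}
```

Two independent corrections score 2, while one independent plus one dependent correction scores 4, so the first sentence would be processed earlier.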


I'll look into it, and then figure out what exactly needs to be implemented,
and will email Bill
- Reverse Refinement(s)
I havent had time to look into this, but it would be great if we could
look at the time stamp management before you leave, so that if rule does
not result into an improvement on a test set (T2), there is a good way to
reverse to the previous version of the grammar (T1). As I told you before
(and maybe its already implemented), it would be useful to have a
variable that expresses whether a rule lead to improvement or not (bool
Rule.ImprovedAccuracy()).
Maybe this is too complex to do before you leave, but maybe you have
already implemented most of what would be needed, and it would be fairly
simple. In any case, Id like to know.

- Translation Pairs annotation to Rules
Even though I still need to create a TrPair file with the right annotation
for this, it would be great if the set up to check if any rule has TrPairs
associated with it is already in place. It could probably be something as
simple as a vector of ints, each int indicating the line in a text file
that contains the appropriate TrPair. So that if the vector is empty,
there is no support for that rule, and if it's not, there is and thus
should not be deactivated after being refined.
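
Something along these lines is what I have in mind. RuleSupport and CanDeactivateAfterRefinement are made-up names for illustration; the real Rule class would carry this as one extra data member.

```cpp
#include <vector>

// Hypothetical stand-in for a rule; the real Rule class has much more state.
struct RuleSupport {
  // Line numbers (in a TrPair text file) of the translation pairs that
  // support this rule. Empty means the rule has no support.
  std::vector<int> trPairLines;

  bool HasSupport() const { return !trPairLines.empty(); }
};

// An original rule may only be deactivated after refinement if no
// translation pair supports it.
bool CanDeactivateAfterRefinement(const RuleSupport& original) {
  return !original.HasSupport();
}
```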

Finally, something I asked you to do at the very beginning, which you
might very well have already implemented, but I just want to double check,
is to have an easy way to group CIs according to the user that generated them
(namely directory name). As far as I know, CIs are currently indexed by
SL-TL and SL-TL-CTL (in order to pick BestCI), but in addition to that, it
would be good to have a way to back trace any refinement (rule or lex
entry) to what user(s) made the correction that led to it. Since my
understanding is that CICollection is merging similar CIs and stores the
different user IDs into a vector (is this right?), then I believe all that
would be needed is a way to store that info as a comment to the rule
derived from that CI. What do you think?
****************************************************************************

****************************************************************************
Bill left for the Summer
*****************************************************************************

Tuesday, May 16 2006

- debugging WordList2Lexicon.pl, it's working it just takes a loooooooong time.
Added progress tracker and formatted features so that they are in the lexicon
format + outputting multiple entries per inflected word (ex: paso)

./WordList2Lexicon.pl <  ../eng2spa/lexicons/MM-WordList/MM-TranslationsSortNoDup.txt >! ../eng2spa/lexicons/LexiconMM.trf

currently, lexical entries are printed out by POS as in the MM lexicon. If I want
to print them out organized by their real POS from maco-girat, I need to store
them again into a hash and print it after having read all of the MM lexicon

-> actually, since I don't have POS list for MM, I should just organize it by 
pos from maco-girat!

- moved to V0.06 and cleaned it up (deleted old files)

- Created ExampleSentenceOutput to keep track of what is already working

- backed it up in Avenue (afs) and temuco:/usr4/


Wednesday, May 17, 2006


- killed WordList2Lexicon.pl, since it was still running and taking all the space!!!

[aria@avenue lexicons]$ wc -l LexiconMM.trf
596737820 LexiconMM.trf

Finished reading Maco-girat file
Inflected forms: 920994
mapping PAROLE Tags now...
;;************V************
;;************N************
;;************ADCONJ************
;;************PRON************
;;************INTERJ************
;;************J************
value from hash: []
getFS::CitationFeatures is []
;;************V************
;;************N************
value from hash: []
getFS::CitationFeatures is []
;;************ADCONJ************
;;************PRON************
;;************INTERJ************
;;************J************
value from hash: []
getFS::CitationFeatures is []
;;************V************
;;************N************
value from hash: []
getFS::CitationFeatures is []
value from hash: []
getFS::CitationFeatures is []
;;************ADCONJ************
;;************PRON************
;;************INTERJ************
;;************J************
value from hash: []
getFS::CitationFeatures is []
;;************V************
;;************N************
....

deleted it


Thursday, May 18, 2006

- moved WordList2Lexicon.pl to temuco:/usr4/aria/bin
MM-TranslationsSortNoDup.txt  WordList2Lexicon.pl

debugging...

WordList2Lexicon.pl < MM-TranslationsSortNoDup.txt


- continue working on RuleRefiner:

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/0 >! 0RR.out

* RuleCollection::DecreaseFeatNameCounter works :)


- created high level methods for setting constraints (moved code from main to
the RRConstraint class):

 pConstr->SetAgrConstraint(EType,TriggerFeat,TriggerFeat,CluePOSPos, POSPos);

 pValConstr->SetValueConstraint(EType, TriggerFeat, CluePOSPos, sValue);

debugged and tested, it's working :)

- Adding discriminating feature to both original and refined lexical entries 
(l. 823) [example sentence 3]

Postulating New Feature: feat_0
New value constraint created is (y0 feat_0) = +
Added to refined lexical entry...
{V,11}
V::V |: ["plays"] -> ["toca"]
(
;(P:{V,5})
  (X1::Y1)
  ((x0 form) = play)
  ((x0 actform) = play)
  ((x0 tense) = pres)
  ((y0 agr pers) = 3)
  ((y0 agr num) = sg)
  ((y0 feat_0) = +)
)
Blocking constraint created is (y0 feat_0) = -
Added to original lexical entry...
{V,11}
V::V |: ["plays"] -> ["toca"]
(
;(P:{V,5})
  (X1::Y1)
  ((x0 form) = play)
  ((x0 actform) = play)
  ((x0 tense) = pres)
  ((y0 agr pers) = 3)
  ((y0 agr num) = sg)
  ((y0 feat_0) = +)

Calculating Delta Function for juega and toca
...
The triggering feature is [feat_0]


-> finish implementing example 3:
add feat_0 = + to clue word!!!
and then check xfer output, juega guitarra should not be produced now

debugging adding value constraint to clue word (ex: 3)


Friday, May 19, 2006

- ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/3 >! 3RR.out

added Val constr to clue word, and agreement constraint to the Affected Rule:

WiPOSPos is: [1] and CluePOSPos is: [2]
Bifurcated {VP,2}
Need to create agreement constraint with feat_0 and add constraint to rule
New value constraint created is (y2 feat_0) = (y1 feat_0)
Added to the rule...
{VP,47}
VP::VP : [VP NP] -> [VP NP]
(
;(P:{VP,2})
  (X1::Y1)  (X2::Y2)
  ((x2 case) = acc)
  ((x0 obj) = x2)
  ((x0 agr) = (x1 agr))
  (x0 = x1)
  ((y0 tense) = (x0 tense))
  ((y0 agr) = (y1 agr))
  ((y2 feat_0) = (y1 feat_0))
)

However, I need to percolate the feature up from the lexicon, all the way to the VP,2 rule, namely, it also needs to be added to NP,3 for position 2 (N)

((S,1 
(NP,2 
(N,3:1 'MARÍA'))
(VP,2 
(VP,1 
(V,5:2 'JUEGA'))
(NP,3 
(DET,2:3 'LA')
(N,5:4 'GUITARRA'))))
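
A sketch of how the percolation path could be collected from the tree above. Everything here is hypothetical (Node, PercolationStep, CollectPercolation are not RR classes), and the constraint format ((y0 feat) = (yK feat)) is my assumption of what would need to be added to each intermediate rule.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parse-tree node: a rule id such as "NP,3" (or a lexical id
// such as "N,5" for leaves) plus its children in RHS order.
struct Node {
  std::string id;
  std::string word;  // non-empty only for leaves
  std::vector<Node> children;
};

// One percolation step: in rule ruleId, the feature comes up from the child
// at 1-based position childPos.
struct PercolationStep {
  std::string ruleId;
  int childPos;
};

// Collect, bottom-up, every (rule, child position) on the path from the leaf
// carrying `word` up to and including `node`. Returns false if not found.
bool CollectPercolation(const Node& node, const std::string& word,
                        std::vector<PercolationStep>& steps) {
  if (node.children.empty()) return node.word == word;
  for (std::size_t i = 0; i < node.children.size(); ++i) {
    if (CollectPercolation(node.children[i], word, steps)) {
      steps.push_back({node.id, static_cast<int>(i) + 1});
      return true;
    }
  }
  return false;
}

// Assumed constraint shape for passing a feature up one level.
std::string PercolationConstraint(const PercolationStep& s,
                                  const std::string& feat) {
  return "((y0 " + feat + ") = (y" + std::to_string(s.childPos) + " " + feat + "))";
}
```

Called on the VP,2 subtree with GUITARRA, the steps come out as (NP,3, position 2) and then (VP,2, position 2). The last step is the rule that already got the agreement constraint, so only the earlier ones would need a percolation constraint.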


May 23, 2006

- looked at why refined grammar wasn't successfully reducing ambiguity and
realized that reloadgra is not doing the right thing, since when loading
the grammar with the load command, it does the right thing.
-> email Erik


May 24, 2006

it turns out, reloadgra is doing the right thing, however, the original
grammar, namely the original rule which the refined grammar has deactivated
by commenting it out, is not effectively deactivated when using reloadgra,
so I need to actually clear all the rules and then load the grammar from 
scratch. 
-> emailed erik to find out if there is a method to clear rules in the G or the
L, as opposed to all the rules.


May 26, 2006

- ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/3 >! 3RR.out

- I can now extract clueword entry from lexicon

- working on add constit case: fixed the logic of lexical refinements vs
grammar refinements


May 30, 2006

trying with a simpler log file:

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-35-31-23763/5 >! 5RR.out
 same problem.

However, when I run it on  ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/4, it doesn't freeze there, it proceeds
until the end of the program..

so it works for 4 but does NOT work for 5, and I have no idea why!!!


-> debugging 4, since there seems to be something weird with the UseRuleIDManager() when creating the new lexical entry and then when loading the refined
lexicon to the Xfer engine, it appears empty! (even though all this was working
before...)

[aria@avenue V0.06]$ ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/4 > ! 4RR.out
line 1:1: unexpected end of file
Segmentation fault

since I had added to the logic of the ADD case, it needed some adjusting and
bug fixing, now it's working

- now 5 is also working, or at least it gets to the grammar refinements :-)

- two bugs in bill's code: literal (a) does not have ""
and constraint indices are not updated when calling 
 pNewGraRule->AddConstituentToRHS(pConstit, iPosAdded)
// probably the best place to add "" is in the AddConstituentToRHS, but there
you might not know if it's a literal... a -> "a"
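
A possible shape for the two fixes, as a sketch only. QuoteLiteral and ShiftYIndices are hypothetical helpers working on constraint strings (the real code works on rule objects), and I am assuming the added position is 1-based and that y0 never shifts.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Wrap a TL literal in double quotes unless it already has them: a -> "a".
std::string QuoteLiteral(const std::string& lit) {
  if (lit.size() >= 2 && lit.front() == '"' && lit.back() == '"') return lit;
  return "\"" + lit + "\"";
}

// Shift every y-index >= posAdded (1-based) up by one; y0 and all x-indices
// are left alone. Works on tokens of the form y<digits> or Y<digits>.
std::string ShiftYIndices(const std::string& constraint, int posAdded) {
  auto isWordChar = [](char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
  };
  std::string out;
  std::size_t i = 0;
  while (i < constraint.size()) {
    const char c = constraint[i];
    const bool tokenStart = (i == 0) || !isWordChar(constraint[i - 1]);
    std::size_t j = i + 1;  // j ends up one past the digits following c
    while (j < constraint.size() &&
           std::isdigit(static_cast<unsigned char>(constraint[j])) != 0) {
      ++j;
    }
    const bool tokenEnd = (j == constraint.size()) || !isWordChar(constraint[j]);
    if ((c == 'y' || c == 'Y') && tokenStart && j > i + 1 && tokenEnd) {
      int index = std::stoi(constraint.substr(i + 1, j - i - 1));
      if (index > 0 && index >= posAdded) ++index;
      out += c;
      out += std::to_string(index);
      i = j;
    } else {
      out += c;
      ++i;
    }
  }
  return out;
}
```

With a word added at TL position 1, ((y1 agr) = (y2 agr)) would become ((y2 agr) = (y3 agr)), while ((y0 tense) = (x0 tense)) stays as it is.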


Wednesday, May 31, 2006

- finished add constit logic, need to debug when constit and add constit bugs
are fixed

- working on cwo case with sentence 9 (8 is too complex, leave for later)
LoadCI is still choking...

- sent email to Bill with bugs and fixes (Rule Refiner stuff also stored in
 bill/info/email-24-FollowUp1)

- backed up V0.06 to Avenue afs and temuco /usr4/

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Working on writing and presenting papers, Barcelona + DiagnosticTestSet %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Thursday, July 20, 2006

- Right now, even though I could process multiple corrections with the 
CICollection (once it's been debugged anyway), every time I see a correction 
I take the original grammar, refine it, and save it. 
-> Need to make sure that the next iteration over the 
CICollection builds on the already refined grammar, so that I can 
incrementally refine the grammar, even though the code is still not in place
to revert grammar changes and so on.

Original code is in CICollection/TestCICollection
need to integrate


Tuesday, July 25, 2006

- looked at how to integrate CICollection code to my main

- dirs.txt is the file that contains all the directories that need to be 
loaded to a CI Collection. Edited it to contain 
/usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387

and then run TestCICollection on it:

./TestCICollection

works up to the point where it outputs error complexity

- integrating code into main:
first read in all the log files and store into the CICollection
then, traverse the collection and process each CI (for now in order they have
 been stored) -> later test BestCI code


make RR
./RuleRefinement

- debugging...

	./RuleRefinement > RR.out


Thursday, July 27, 2006

- realized I was running an old version, since the current Makefile was generating 
RuleRefinement.exe instead of RuleRefinement


Friday, July 28, 2006

- looking into AddCI, there is both CICollection and CICollectionIndex,
since Bill keeps the two in separate data structures. Need to be careful with this.

- currently the program chokes when reading in 
/usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/8

TL -- SL seem to be empty, even though it can extract the right tree somehow:

File : /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/8
MAIN::The SL and TL for the LogFile given as a param are:
XferEngine::ExtractParseTreeFromLattice: 
TL sentence is [VISTE TÚ
these are the alterative translations and their parses for I saw you:
tree-0: ((S,0 (VP,46 (V,4:2 'VISTE') (NP,1 (PRON,2:3 'TÚ) ) ) ) )

MAIN::Printing Tree extracted from Xfer engine for SL-TL pair in the logfile:
((S,0 
(VP,46 
(V,4:2 'VISTE')
(NP,1 

Segmentation fault

!!!! Noticed that log files' format is not consistent !!!
/usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/
The TL sentence in 1 and 8 ends with a stressed character but no closing "!!
Not sure why this is, and the RR did not choke when processing 1, but still!

for now editing 8 and adding closing " to see if that makes a difference

ah it seems it's a problem with the new version of UNIX less!! 
when I opened it with emacs, it looks fine, weird..

!!!!!!! NEED TO LOOK AT THE OUTPUT FILE IN EMACS AND NOT LESS !!!!
it has nothing to do with the formatting of the log files, RR chokes when 
printing the tree for 

tree-0: ((S,0 (VP,46 (V,4:2 'VISTE') (NP,1 (PRON,2:3 'TÚ') ) ) ) )
TL is one of the alternatives: VISTE TÚ
MAIN::Printing Tree extracted from Xfer engine for SL-TL pair in the logfile:
((S,0 
(VP,46 
(V,4:2 'VISTE')
(NP,1 

For now:
mv 8 99

then first file it processes is 99!, so

mv 99 ../8

Now it's working!
(need to figure out why the TreePrinting function chokes on 8)

0: Adding the (agr gen) constraint happens between positions 2 and 1, instead
of 3 and 2... looked at where the RR extracts positions and saw that it's due
to the fact that the tree stored in order to extract the POS positions is not
the right one, but rather the last one that got instantiated, for the Gaudi example.

The tree needs to be stored in each CI at run time, since taking the original
trees output by the system would not reflect recent grammar/lexical refinements,
so we don't really save any processing time by doing that.

-> Modified LoadTCToolLogFile to instantiate the tree data member and added 
a GetTree function to CI that returns the tree. (I tried returning a pointer
to it instead, but I couldn't debug it, so left it like this.)

Before doing anything, for each new CI that is extracted from the collection
to do the refinements, I first GetTree from it and instantiate it with tree,
since lots of code relies on it afterwards...

Working fine now.

0: refinement successful
1: In any case, for the refinements in 1 to be effective, it needs to percolate to
the NP and VP, and this method is still not implemented...

Everything ok until the point when it loads the refined grammar...
Deleting all loaded rules.
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID-REFINED.trf with 34 rules added
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID-REFINED.trf with 

;F0 is the feature counter that tells the grammar and the lexicon what feature
name to use next 

[aria@avenue V0.07]$ ./RuleRefinement >! RR.out
Unable to open /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-1.trf.
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
Unable to open /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-2.trf.
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
Unable to open /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf.
Segmentation fault


for some reason lexicon-R-1 is empty, need to debug.
EDIT CASE

lexicon is not printed to screen but it should!
using reload command

there was a logic problem: only in one of the else clauses did the lexicon get
printed to a file, and since all the rules are cleared before re-running
the xfer engine, to get rid of commented out rules, it needs to be
reprinted even when no refinement is done


- 2: OOWV -> xfer did not find a full parse!!! oops

TL sentence is [VEO EL UNICORN ROJO]
CTL sentence is [VEO EL UNICORNIO ROJO]

No full parse was found! 
Partial parse is:
VEO EL ROJA UNICORN 
tree: <((S,0 (VP,1 (V,1:2 'VEO') ) ) )> <(DET,1:3 'EL')> <(ADJ,1:4 'ROJA')> <(UNK,0:5 'UNICORN')> 

3: the grammar does not load
(this is output by the Xfer engine itself when the clearall command is called) 
Deleting all loaded rules.  
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf with 35 rules added
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf with 

-> look into it!
/usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf
has all the refinements so far integrated in it, yeahh!!!


Monday, July 31, 2006

- ! first parse tree retrieved by RR might not be the best one...
(example in AMTA 2006 paper included VP,46, which is an automatically learned
rule that had an incorrect generalization...)

- debugging RR, why it seg faults when trying to load the refined grammar?
I looked at both simulation-grammar-REFINED-4.trf and 
simulation-grammar-REFINED-2.trf and I didn't see anything that would make the
last one load ok and the first one not load ok...

- running the Xfer engine directly with simulation-grammar-REFINED-4.trf was fine
cd /usr0/aria/eng2spa
transfer -if init-simulation.txt
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf with 15 rules added
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf with 35 rules added
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
Translating file: /usr0/aria/eng2spa/corpus/error-typology-simulation
1:
Translating "I see the red car".
2:
Translating "she read".
3:
Translating "I see the red unicorn".
4:
Translating "Mary plays the guitar".
5:
Translating "John and Mary fell".
6:
Translating "you saw the woman".
7:
Translating "they see water".
8:
Translating "I would like to go".
9:
Translating "I saw you".
10:
Translating "Gaudi was a great artist".

TOT 0 LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0 COV nan
UNKNOWNS
LIKE    2
[aria@avenue eng2spa]$ emacs corpus/error-typology-simulation.out.debug

And it produced the right result:
3:
sl: Mary plays the guitar
tl: MARÍA JUEGA LA GUITARRA
tree: <((S,91 (NP,2 (N,3:1 'MARÍA') ) (VP,47 (VP,1 (V,5:2 'JUEGA') ) (NP,3 (DET,2:3 'LA') (N,5:4 'GUITARRA') ) ) ) )>
tl: MARÍA TOCA LA GUITARRA
tree: <((S,91 (NP,2 (N,3:1 'MARÍA') ) (VP,47 (VP,1 (V,11:2 'TOCA') ) (NP,3 (DET,2:3 'LA') (N,5:4 'GUITARRA') ) ) ) )>

Even when I replicate what the RR does, first removing all the rules and then
loading the lexicon and the grammar, there seems to be no problem:

[aria@avenue eng2spa]$ transfer
Welcome to the AVENUE Transfer Engine.
> loadrules /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-2.trf
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-2.trf with 33 rules added
> loadrules /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-2.trf
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-2.trf with 15 rules added
> reloadgra /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-3.trf
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-3.trf with 1 rules added
> clearall
Deleting all loaded rules.
> loadrules /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf with 35 rules added
> loadrules /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf with 15 rules added
> exit

the Xfer engine wrapper functions for LoadGra and LoadLex were using loadgra
and loadlex instead of loadrules -> changed.

reloadrules is not a valid command, so leaving reloadgra for both reloadLex and Gra.


looking at the code that actually loads G and L:
RR.cpp: l. 1570
	pXfer->RemoveAllLoadRules();
	pXfer->LoadLex(pRefinedLexFileName);
	pXfer->LoadGra(pRefinedGraFileName);

XferEngine.cpp:

void XferEngine::LoadGra(const char* pGramFileName) {
  string Command1 = string ("loadrules ") + pGramFileName;
  xfer->processCommand(Command1);
}

void XferEngine::LoadLex(const char* pLexFileName) {
  string Command2 = string ("loadrules ") + pLexFileName;
  xfer->processCommand(Command2);
}

void XferEngine::RemoveAllLoadRules() {
  xfer->processCommand("clearall");
}

// for now they are both calling reloadgra, since reloadlex is not implemented yet
void XferEngine::ReLoadLex(const char* pRefinedLexFileName) {
  string Command3 = string ("reloadgra ") + pRefinedLexFileName;
  xfer->processCommand(Command3);
}

And it doesn't look like a formatting problem either:
[aria@avenue grammars]$ diff simulation-grammar-REFINED-2.trf simulation-grammar-REFINED-4.trf
1c1
< ;F:0
---
> ;F:1
121,131c121,132
< {VP,2}
< VP::VP : [VP NP] -> [VP NP]
< (
<   (X1::Y1)  (X2::Y2)
<   ((x2 case) = acc)
<   ((x0 obj) = x2)
<   ((x0 agr) = (x1 agr))
<   (x0 = x1)
<   ((y0 tense) = (x0 tense))
<   ((y0 agr) = (y1 agr))
< )
---
> ;D:
> ;{VP,2}
> ;VP::VP : [VP NP] -> [VP NP]
> ;(
> ;  (X1::Y1)  (X2::Y2)
> ;  ((x2 case) = acc)
> ;  ((x0 obj) = x2)
> ;  ((x0 agr) = (x1 agr))
> ;  (x0 = x1)
> ;  ((y0 tense) = (x0 tense))
> ;  ((y0 agr) = (y1 agr))
> ;)
158a160,172
> )
> {VP,47}
> VP::VP : [VP NP] -> [VP NP]
> (
> ;(P:{VP,2})
>   (X1::Y1)  (X2::Y2)
>   ((x2 case) = acc)
>   ((x0 obj) = x2)
>   ((x0 agr) = (x1 agr))
>   (x0 = x1)
>   ((y0 tense) = (x0 tense))
>   ((y0 agr) = (y1 agr))
>   ((y1 feat_0) = (y2 feat_0))


double checked parents, and they seem to be fine...

-> emailed Erik, in case he knows why this is happening


Tuesday, August 1, 2006

- renamed 3 to 1 and 1 to 3 and rerun RR, see what happens...

[aria@avenue IOFiles]$ cd 2006-5-15-13-08-25-11387/
[aria@avenue 2006-5-15-13-08-25-11387]$ mv 1 temp
[aria@avenue 2006-5-15-13-08-25-11387]$ mv 3 1
[aria@avenue 2006-5-15-13-08-25-11387]$ mv 1 3 (should have been mv temp 3) 
[aria@avenue 2006-5-15-13-08-25-11387]$ 


it worked!!!
Debugging here
Deleting all loaded rules.
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-4.trf with 35 rules added
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-4.trf with 15 rules added
Checking refined lattice for the presence of CTL and TL sentences
XferEngine::CheckRefinedLattice: Checking if CTL and TL sentences are in the Refined Lattice...
TL sentence is [MARÍA JUEGA LA GUITARRA]
CTL sentence is [MARÍA TOCA LA GUITARRA]
these are the alterative translations for Mary plays the guitar :
tl-0: MARÍA JUEGA LA GUITARRA
tree-0: ((S,91 (NP,2 (N,3:1 'MARÍA') ) (VP,47 (VP,1 (V,5:2 'JUEGA') ) (NP,3 (DET,2:3 'LA') (N,5:4 'GUITARRA') ) ) ) )
tl-1: MARÍA TOCA LA GUITARRA
tree-1: ((S,91 (NP,2 (N,3:1 'MARÍA') ) (VP,47 (VP,1 (V,11:2 'TOCA') ) (NP,3 (DET,2:3 'LA') (N,5:4 'GUITARRA') ) ) ) )

****************************************************************************
***The refined grammar and lexicon produced the user corrected translation***
		The correct translation is: MARÍA TOCA LA GUITARRA
****************************************************************************

****************************************************************************
***However it is still producing the incorrect translation, previously corrected***
		by the user: MARÍA JUEGA LA GUITARRA
****************************************************************************


*********************************************************
Refinement was successfull, but lexical ambiguity increased
*************************************************************

And it's working for:

0. I see the red car
2. I see the red unicorn
3. Mary plays the guitar
4. John and Mary fell

oops, deleted 1 by mistake (she read - ella lei -> leyo)
 -> need to implement percolate first anyway

but not for:

5. you saw the woman

"fell" still seems to be instantiated to the SLWord, and so it created
a lex entry fell->a!!!

**************************************************
After all actions in CI:
SLWords: John and Mary fell 
TempCTLWords (1st time = TLWords): juan y marÍa se cayeron 
Alignments: ((1,1),(2,2),(3,3),(4,5),(4,4))
**************************************************

((S,0 
(VP,46 
(V,4:2 'VISTE')
(NP,3 
(DET,2:3 'LA')
(N,4:4 'MUJER'))))

MAIN::CI's CTL sentence is instantiated with [viste a la mujer ]
XferEngine::TLInLattice: Checking if TL sentence is in the Lattice...
TL sentence is [VISTE A LA MUJER]
MAIN::pXfer->TLInLattice: no, the CTL sentence is NOT in the lattice

Affected Rule are: 2
{VP,46} ['VISTE']
{NP,3} ['LA']


Before action: 
SLWords: you saw the woman 
TempCTLWords (1st time = TLWords): viste la mujer 
Alignments: ((2,1),(3,2),(4,3))

Action type = add
Wi': [a]
i'...: 2
Looking ahead to find relevant alignments...
Other action type: 5
Discarding for now.
End of look ahead

Applying correction action to TLWord
SLWordPos is 4
 If pLexEntry != NULL
{V,13}
V::V |: ["fell"] -> ["a"]
(
;(P:{V,6})
  (X1::Y1)
  ((x0 form) = fall)
  ((x0 actform) = fell)
  ((x0 tense) = past)
  ((y0 agr pers) = 3)
  ((y0 agr num) = pl)
  ((y0 type) = refl)
)
*********************************
...
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-6.trf with 1 rules added
Segmentation fault

(in the previous run it seg faulted after having loaded both grammar and lexicon,
and after printing the tree for "me gustaria que ir")


-> there is a bug in the code that adds a word at the grammar level following
another add case at the lexical level...
+ se: fell -> se cayeron
+ a -> no alignments -> no SLword!!!

debugging alignment code (look ahead, etc.):

Added to l. 572: SLWordPos = -1;
Changed l. 626 from: if ( SLWordPos > 0 ) 
	       to: if ( SLWordPos >= 0 ) 

something worked, but need to further debug
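
Sketch of the idea behind this fix (not the actual RR code): reset the position
to a sentinel before every look-up, and treat only -1 as "no alignment".
FindAlignedSLWordPos and HasSLAlignment are hypothetical helpers, and the
(SL position, TL position) pair order is an assumption inferred from the
Alignments printouts.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not in the RR code base).
// alignments holds (SL position, TL position) pairs, as printed in the log.
// Returns the SL position aligned to tlPos, or -1 if the TL word is unaligned.
int FindAlignedSLWordPos(const std::vector<std::pair<int, int> >& alignments,
                         int tlPos)
{
    int slWordPos = -1;  // reset on every call so no stale value survives
    for (std::size_t k = 0; k < alignments.size(); ++k) {
        if (alignments[k].second == tlPos) {
            slWordPos = alignments[k].first;
            break;
        }
    }
    return slWordPos;
}

// True when the correction can be handled at the lexical level;
// false means there is no SL word, so the grammar has to be refined.
bool HasSLAlignment(int slWordPos)
{
    return slWordPos >= 0;
}
```

With the alignments ((2,1),(3,3),(4,4)) of "viste a la mujer", looking up TL
position 2 ("a") gives -1, so the add case should go to grammar refinement.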
...

Looking ahead to find relevant alignments...
pAct2->GetType() is 5 (CLEAR_ALIGNMENT)
Other action type: 5
Discarding for now.
End of look ahead

%%%%%%%%%%%%%%%%%%
From CI.hpp:
enum ACTION_TYPE
{
	ADD = 0,
	EDIT,//1
	DELETE, //2
	CHANGE_WORD_ORDER, //3  
	ADD_ALIGNMENT, //4
	CLEAR_ALIGNMENT //5
};
%%%%%%%%%%%%%%%%%%%

Applying correction action to TLWord
SLWordPos is -1
There are no alignments from Wi' to SLWords -> Grammar Refinement
There are  2 Affected Rules
{VP,46} ['VISTE']
{NP,3} ['LA']
Original Rule is:
{NP,3}
NP::NP : [DET N] -> [DET N]
(
  (X1::Y1)  (X2::Y2)
  (x0 = x2)
  ((y1 def) = (x1 def))
  ((y2 agr) = (x2 agr))
  ((y1 agr) = (y2 agr))
)
Bifurcated {NP,3}
the position where "a" needs to be added to is -1
{NP,10}
NP::NP : [DET N] -> [a DET N]
(
;(P:{NP,3})
  (X1::Y2)  (X2::Y3)
  (x0 = x2)
  ((y1 def) = (x1 def))
  ((y2 agr) = (x2 agr))
  ((y1 agr) = (y2 agr))
)

Added new rule to grammar...
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-6.trf with 1 rules added


but now it is seg faulting after having printed the tree for "viste la mujer" :(

((S,0 
(VP,46 
(V,4:2 'VISTE')
(NP,3 
(DET,2:3 'LA')
(N,4:4 'MUJER'))))

Tried processing log file 5 before 4:

[aria@avenue 2006-5-15-13-08-25-11387]$ mv 4 temp
[aria@avenue 2006-5-15-13-08-25-11387]$ mv 5 4
[aria@avenue 2006-5-15-13-08-25-11387]$ mv temp 5
[aria@avenue 2006-5-15-13-08-25-11387]$ 

the RR does not process the log files in alphanumerical order!!!
0,4,2,3,5,6,7,9
not sure why  -> Bill?
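
Possible workaround, whatever the cause turns out to be: if the dir traverser
relies on readdir-style enumeration, the order of the entries is unspecified,
so sorting the names explicitly would make the processing order predictable.
A sketch, assuming the log files are named with plain non-negative integers
without leading zeros; SortLogFileNames and its helpers are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// True if s is a non-empty string of decimal digits.
static bool IsAllDigits(const std::string& s)
{
    if (s.empty()) return false;
    for (char c : s)
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    return true;
}

// Numeric names are ordered by value (a shorter digit string is a smaller
// number, assuming no leading zeros); any other name sorts after them,
// alphabetically.
static bool LogFileNameLess(const std::string& a, const std::string& b)
{
    const bool na = IsAllDigits(a);
    const bool nb = IsAllDigits(b);
    if (na && nb) {
        if (a.size() != b.size()) return a.size() < b.size();
        return a < b;
    }
    if (na != nb) return na;
    return a < b;
}

// Hypothetical helper: call on the collected names before loading the files.
void SortLogFileNames(std::vector<std::string>& names)
{
    std::sort(names.begin(), names.end(), LogFileNameLess);
}
```

Applied to the order seen above (0,4,2,3,5,6,7,9) this gives 0,2,3,4,5,6,7,9,
and a file named 10 would come after 9 rather than after 1.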


There seems to be a bug in RuleInstantiation...

	    int iPosAdded;
	    // last argument indicates that added word is not already in tree
	    pNewGraRule->RuleInstantiation(TLWordPos, &tree, iPosAdded, false);
	    cout << "the position where \"" << TLWord 
		 << "\" needs to be added to is " << iPosAdded << endl;
...
Applying correction action to TLWord
If there is at least one alignment to Wi (SLWordPos >= 0)
SLWordPos is -1
There are no alignments from Wi' to SLWords -> Grammar Refinement
There are  2 Affected Rules
{VP,46} ['VISTE']
{NP,3} ['LA']
Original Rule is:
{NP,3}
NP::NP : [DET N] -> [DET N]
(
  (X1::Y1)  (X2::Y2)
  (x0 = x2)
  ((y1 def) = (x1 def))
  ((y2 agr) = (x2 agr))
  ((y1 agr) = (y2 agr))
)
Bifurcated {NP,3}
the position where "a" needs to be added to is -1
{NP,10}
NP::NP : [DET N] -> [a DET N]
(
;(P:{NP,3})
  (X1::Y2)  (X2::Y3)
  (x0 = x2)
  ((y1 def) = (x1 def))
  ((y2 agr) = (x2 agr))
  ((y1 agr) = (y2 agr))
)

Added new rule to grammar...
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-2.trf with 1 rules added
Checking refined lattice for the presence of CTL and TL sentences
XferEngine::CheckRefinedLattice: Checking if CTL and TL sentences are in the Refined Lattice...
TL sentence is [VISTE LA MUJER]
CTL sentence is [VISTE A LA MUJER]
these are the alterative translations for you saw the woman :
tl-0: VISTE LA MUJER
tree-0: ((S,0 (VP,46 (V,4:2 'VISTE') (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) )
tl-1: TÚ VISTE LA MUJER
tree-1: ((S,1 (NP,1 (PRON,2:1 'TÚ') ) (VP,46 (V,4:2 'VISTE') (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) )
tl-2: VISTE LA MUJER
tree-2: ((S,0 (VP,2 (VP,1 (V,4:2 'VISTE') ) (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) )
tl-3: TÚ VISTE LA MUJER
tree-3: ((S,1 (NP,1 (PRON,2:1 'TÚ') ) (VP,2 (VP,1 (V,4:2 'VISTE') ) (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) )
tl-4: TÚ VISTE LA MUJER
tree-4: ((S,90 (NP,1 (PRON,2:1 'TÚ') ) (VP,46 (V,4:2 'VISTE') (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) )
tl-5: TÚ VISTE LA MUJER
tree-5: ((S,90 (NP,1 (PRON,2:1 'TÚ') ) (VP,2 (VP,1 (V,4:2 'VISTE') ) (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) )

***********************************************************
Refinement did not work, need to revise manually
*************************************************************

But now it seg faults on a different log file (when loading G and L for Mary
plays the guitar). 
It crashes after having processed the 4th refinement


Wednesday, August 2, 2006

- there seems to be a problem with the max number of log files that the RR can
process before it seg faults... could this be a memory problem?
	-> try running on barrow

Needed to change permissions on avenue to make partition /usr0 writable as 
well as readable (Ralf).

[aria@barrow ~]$ /avenue/usr0/aria/RuleRefinement/V0.07/RuleRefinement > ! /avenue/usr0/aria/RuleRefinement/V0.07/RR.out

However, it doesn't seem to finish by itself and it's only outputting the beginning:
Compiled on Aug  1 2006 15:28:57 with g++ version: 3.2.2 20030222 (Red Hat Linux 3.2.2-5) in debug mode
Parameters are:
Debug Level     = 2
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar.trf

weird, since it's supposed to be faster, not slower...

Ralf thinks that because barrow has a 64-bit processor instead of a 32-bit one,
that might be causing some problems, and it is hard to predict why it is not
running well on barrow, but there could be a number of things.

- tried adding getchar() in my code so that the program stops before crashing
and doesn't continue until I type in some character.

*******************************************************************
- Debugging session with Ralf:

	Makefile: edited Makefile and added the -ggdb3 to the DEBUG line for
	********  full debugging
	DEBUG =  -g -ggdb3 # outputs just serious errors


	GDB: started gdb (run, break function_name | file_name: line-num, up, 
	**** down, print, ...). 
	Open emacs, 
	M-x gdb [enter] 
	executable_name (RuleRefinement)[enter]
	run
	up
	print CTLS

	VALGRIND:
	******** 
	downloaded, compiled and installed valgrind, which is a 
	program to check memory leaks (/usr0/aria/RuleRefinement/bin)
	added it to the V0.07 path 	
	[aria@avenue V0.07]$ ../bin/valgrind-3.2.0/coregrind/valgrind --leak-check=full --show-reachable=yes ./RuleRefinement > & ! OutputValgrind
	
	TAGS
	****
	M-x visit-tags-table
	M-x find-tag: FileName|FunctionName
*******************************************************************


- Ralf thinks that GetCI is not properly instantiated and it tries 
to access a NULL pointer, not sure why... need to look into it.


CorrectionInstance *CIVector::GetCI(int i) const
{
	if (i >= 0 && i < size())
		return (*this)[i];
	else
		return NULL;
}

gdb:
...
 break 'CorrectionInstance::GetCTLSentence()'
Function "CorrectionInstance::GetCTLSentence()" not defined.
(gdb) up
#1  0x400acf4c in std::string::string(std::string const&) ()
   from /usr/lib/libstdc++.so.5
(gdb) up
#2  0x080622dc in CorrectionInstance::GetCTLSentence() const (this=0x826c130)
    at CorrectionInstance.cpp:1664
(gdb) 
#3  0x0804eccb in main (argc=1, argv=0xbffff974) at RuleRefinement.cpp:338
warning: Source file is more recent than executable.

(gdb) down
#2  0x080622dc in CorrectionInstance::GetCTLSentence() const (this=0x826c130)
    at CorrectionInstance.cpp:1664
(gdb) print *this
$1 = {<RefCountedObject> = {_vptr.RefCountedObject = 0x8270b18, m_cRef = 0}, 
  Parse = {static npos = 4294967295, 
    _M_dataplus = {<allocator<char>> = {<No data fields>}, 
      _M_p = 0x826c130 "\030\v'\b"}, static _S_empty_rep_storage = {0, 0, 0, 
      0}}, m_tree = {Tokenize = {<No data fields>}, Root = 0x8280e64, 
    m_CStructure = {static npos = 4294967295, 
      _M_dataplus = {<allocator<char>> = {<No data fields>}, 
        _M_p = 0x826c8c0 "(\204$\b"}, static _S_empty_rep_storage = {0, 0, 0, 
        0}}, 
    m_vLeaves = {<_Vector_base<TreeNode*,std::allocator<TreeNode*> >> = {<_Vector_alloc_base<TreeNode*,std::allocator<TreeNode*>,true>> = {_M_start = 0x0, 
          _M_finish = 0x52, 
          _M_end_of_storage = 0x12}, <No data fields>}, <No data fields>}}, 
  SLS = {static npos = 4294967295, 
    _M_dataplus = {<allocator<char>> = {<No data fields>}, 
      _M_p = 0x12 <Address 0x12 out of bounds>}, 
    static _S_empty_rep_storage = {0, 0, 0, 0}}, 
  SLWords = {<_Vector_base<Word,std::allocator<Word> >> = {<_Vector_alloc_base<Word,std::allocator<Word>,true>> = {_M_start = 0x2, _M_finish = 0x20756f79, 
        _M_end_of_storage = 0x20776173}, <No data fields>}, <No data fields>}, 
  TLS = {static npos = 4294967295, 
    _M_dataplus = {<allocator<char>> = {<No data fields>}, 
      _M_p = 0x20656874 <Address 0x20656874 out of bounds>}, 
    static _S_empty_rep_storage = {0, 0, 0, 0}}, 
  TLWords = {<_Vector_base<Word,std::allocator<Word> >> = {<_Vector_alloc_base<Word,std::allocator<Word>,true>> = {_M_start = 0x616d6f77, _M_finish = 0x206e, 
        _M_end_of_storage = 0x826fd10}, <No data fields>}, <No data fields>}, 
  CTLS = {static npos = 4294967295, 
    _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p = 0x0}, 
    static _S_empty_rep_storage = {0, 0, 0, 0}}, 
  CTLWords = {<_Vector_base<Word,std::allocator<Word> >> = {<_Vector_alloc_base<Word,std::allocator<Word>,true>> = {_M_start = 0x826c170, 
        _M_finish = 0x826c170, 
        _M_end_of_storage = 0x827d7fc}, <No data fields>}, <No data fields>}, 
  Actions = {<_Vector_base<Action*,std::allocator<Action*> >> = {<_Vector_alloc_base<Action*,std::allocator<Action*>,true>> = {_M_start = 0x827c400, 
        _M_finish = 0x0, 
        _M_end_of_storage = 0x0}, <No data fields>}, <No data fields>}, 
  m_IDs = {<_Vector_base<std::basic_string<char, std::char_traits<char>, std::allocator<char> >,std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {<_Vector_alloc_base<std::basic_string<char, std::char_traits<char>, std::allocator<char> >,std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >,true>> = {
        _M_start = 0x826b670, _M_finish = 0x0, 
        _M_end_of_storage = 0x826c190}, <No data fields>}, <No data fields>}, 
  m_fLeadToRefinement = 144, m_fIncreasedMTAccuracy = 193, 
  m_fContainsDependentErrors = 38, m_fCDEDirty = 8, 
  m_cNumNonAlignActions = 136775572}
(gdb)     _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p = 0x0}, 
Undefined command: "".  Try "help".


Thursday, August 3, 2006

- created a file with this info/Debugging.txt

- under gdb, the program crashes when returning CTLS (CorrectionInstance::GetCTLSentence())

string CorrectionInstance::GetCTLSentence() const {
  return CTLS;
}

the interesting thing, is that before that it seems to have entered a loop for 
TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR:

(gdb): run
...
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-6.trf with 1 rules added
Checking refined lattice for the presence of CTL and TL sentences
XferEngine::CheckRefinedLattice: Checking if CTL and TL sentences are in the Refined Lattice...
TL sentence is [JUAN Y MARÍA CAYERON]
CTL sentence is [JUAN Y MARÍA SE CAYERON]
these are the alterative translations for John and Mary fell :
tl-0: JUAN Y MARÍA CAYERON
tree-0: ((S,1 (NP,6 (NP,2 (N,2:1 'JUAN') ) (CONJ,1:2 'Y') (NP,2 (N,3:3 'MARÍA') ) ) (VP,1 (V,6:4 'CAYERON') ) ) )
tl-1: JUAN Y MARÍA SE CAYERON
tree-1: ((S,1 (NP,6 (NP,2 (N,2:1 'JUAN') ) (CONJ,1:2 'Y') (NP,2 (N,3:3 'MARÍA') ) ) (VP,1 (V,12:4 'SE CAYERON') ) ) )
tl-2: JUAN Y MARÍA CAYERON
tree-2: ((S,90 (NP,6 (NP,2 (N,2:1 'JUAN') ) (CONJ,1:2 'Y') (NP,2 (N,3:3 'MARÍA') ) ) (VP,1 (V,6:4 'CAYERON') ) ) )
tl-3: JUAN Y MARÍA SE CAYERON
tree-3: ((S,90 (NP,6 (NP,2 (N,2:1 'JUAN') ) (CONJ,1:2 'Y') (NP,2 (N,3:3 'MARÍA') ) ) (VP,1 (V,12:4 'SE CAYERON') ) ) )

****************************************************************************
***The refined grammar and lexicon produced the user corrected translation***
		The correct translation is: JUAN Y MARÍA SE CAYERON
****************************************************************************

****************************************************************************
***However it is still producing the incorrect translation, previously corrected***
		by the user: JUAN Y MARÍA CAYERON
****************************************************************************


*********************************************************
Refinement was successfull, but lexical ambiguity increased
*************************************************************


Affected Rule are: 0


Before action: 
SLWords: John and Mary fell 
TempCTLWords (1st time = TLWords): juan y marÍa se cayeron 
Alignments: ((1,1),(2,2),(3,3),(4,5))


**************************************************
After all actions in CI:
SLWords: John and Mary fell 
TempCTLWords (1st time = TLWords): juan y marÍa se cayeron 
Alignments: ((1,1),(2,2),(3,3),(4,5),(4,4))
**************************************************
MAIN::CI's CTL sentence is instantiated with [ellos ven agua ]
XferEngine::TLInLattice: Checking if TL sentence is in the Lattice...
TL sentence is [ELLOS VEN AGUA]

****************************************************************************
***This translation: ELLOS VEN AGUA is being generated by the current system.
**************************************************************************
MAIN::pXfer->TLInLattice: yes the CTL sentence is in the lattice

MAIN::However, let's see if the RR module can make the grammar tighter, by not generating
 the incorrect translation (TL) moving on to refining it...
****************************************************************************


**************************************************
After all actions in CI:
SLWords: they see water 
TempCTLWords (1st time = TLWords): ellos ven agua 
Alignments: ((1,1),(2,2),(3,3))
**************************************************
MAIN::CI's CTL sentence is instantiated with [ME GUSTARÍA IR ]
XferEngine::TLInLattice: Checking if TL sentence is in the Lattice...
TL sentence is [ME GUSTARÍA IR]
MAIN::pXfer->TLInLattice: no, the CTL sentence is NOT in the lattice

Affected Rule are: 0


Before action: 
SLWords: I would like to go 
TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR 
Alignments: ((2,1),(4,2),(5,3))

Affected Rule are: 0


Before action: 
SLWords: I would like to go 
TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR 
Alignments: ((1,1),(2,1),(4,2),(5,3))

Affected Rule are: 1
{PP,2} ['QUE']


Before action: 
SLWords: I would like to go 
TempCTLWords (1st time = TLWords): ME GUSTARÍA QUE IR 
Alignments: ((1,1),(2,1),(4,2),(5,3))

Action type = delete
Wi: [QUE]
i is 3 

- double checked that there is no repeated log file for "John and Mary fell"

- made all Affected Rule comments more specific, so that I know which one
is being printed when

Order of storing log files: 0,4,2,3,5,6,7,9
(not sure why the dir traverser does it in this order, check again with a new dir)
0:[I see the red car] -- [veo el auto roja]
4:[you saw the woman] -- [viste la mujer]
2:[I see the red unicorn] -- [veo el unicorn rojo]
3:[Mary plays the guitar] -- [marÍa juega la guitarra]
5:[John and Mary fell] -- [juan y marÍa cayeron]
6:[they see water] -- [ellos ven agua]
7:[I would like to go] -- [ME GUSTARÍA QUE IR]
9:[Gaudi was a great artist] -- [gaudÍ era un artista gran]


Maybe problem is caused by the fact that TLS is in CAPS...
[I would like to go] -- [ME GUSTARÍA QUE IR]


aside: why do some of the log files seem to be processed twice?
------------------------------------------------
Final Sentences: 
* Source Language Sentence: "you saw the 
woman" 

* Target Language Sentence: "viste a la mujer"

* Alignments: 
* "you" to ""
* "saw" to "viste"
* "the" to "la"
* "woman" to "mujer"

------------------------------------------------
Final Sentences: 
* Source Language Sentence: "you saw the 
woman" 

* Target Language Sentence: "viste a la mujer"

* Alignments: 
* "you" to ""
* "saw" to "viste"
* "the" to "la"
* "woman" to "mujer"

------------------------------------------------

-> maybe it treats them like new actions... need to double check:

MAIN::CI's CTL sentence is instantiated with [viste a la mujer ]
XferEngine::TLInLattice: Checking if TL sentence is in the Lattice...
TL sentence is [VISTE A LA MUJER]
MAIN::pXfer->TLInLattice: no, the CTL sentence is NOT in the lattice

Starting new loop through the Actions in CI::Affected Rule are: 2
{VP,46} ['VISTE']
{NP,3} ['LA']
...
tl-5: TÚ VISTE LA MUJER
tree-5: ((S,90 (NP,1 (PRON,2:1 'TÚ') ) (VP,2 (VP,1 (V,4:2 'VISTE') ) (NP,3 (DET,2:3 'LA') (N,4:4 'MUJER') ) ) ) )

***********************************************************
Refinement did not work, need to revise manually
*************************************************************

DEBUG: Add case done
Starting new loop through the Actions in CI::Affected Rule are: 0


Before action: 
SLWords: you saw the woman 
TempCTLWords (1st time = TLWords): viste a la mujer 
Alignments: ((2,1),(3,3),(4,4))


**************************************************
After all actions in CI:
SLWords: you saw the woman 
TempCTLWords (1st time = TLWords): viste a la mujer 
Alignments: ((2,1),(3,3),(4,4))
**************************************************

But log file 5 is only processed once, and it is also output again at the end:


		by the user: JUAN Y MARÍA CAYERON
****************************************************************************


*********************************************************
Refinement was successfull, but lexical ambiguity increased
*************************************************************


Starting new loop through the Actions in CI::Affected Rule are: 0


Before action: 
SLWords: John and Mary fell 
TempCTLWords (1st time = TLWords): juan y marÍa se cayeron 
Alignments: ((1,1),(2,2),(3,3),(4,5))


**************************************************
After all actions in CI:
SLWords: John and Mary fell 
TempCTLWords (1st time = TLWords): juan y marÍa se cayeron 
Alignments: ((1,1),(2,2),(3,3),(4,5),(4,4))
**************************************************

Looking at log file 7 ("I would like to go" -- "ME GUSTARÍA QUE IR"),
the repeated "Before action" / "After all actions" printouts are due to
alignment changes.

-> add a print statement saying that an alignment has been added, so that
I know it's not a bug!

- program now runs until it seg faults on the last log file (9 Gaudi), which
is expected since cwo is still being implemented, so it's not that bad!!!


August 14, 2006

- backed up to temuco and Avenue afs
from temuco (temuco:/usr4/aria/RuleRefinement) and
/afs/cs.cmu.edu/project/avenue-1/Avenue/RuleRefinement:
cp -uR /usr0/aria/RuleRefinement/* .

- started expanding the G and the L


August 21, 2006

- done expanding G and L, now refining expanded G and L...

In /avenue/usr0/aria/RuleRefinement/V0.07
RuleRefinement > ! RR.out.InitialResults

RR.out.InitialResults
Compiled on Aug  3 2006 17:56:50 with g++ version: 3.2.2 20030222 (Red Hat Linux 3.2.2-5) in debug mode
Parameters are:
Debug Level     = 2
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
***************************************************************
StartXfer::initfile is /usr0/aria/eng2spa/auto-init.txt
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID.trf with 19 rules added
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID.trf with 227 rules added
MAIN::Adding from directory : /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387
...


August 30, 2006

- Get results for running RR one log file at a time
RR + following log files
Order of storing log files: 0,4,2,3,5,6,7,9
(not sure why the dir traverser does it in this order, check again with a new dir)
0:[I see the red car] -- [veo el auto roja]
simulation-lexicon-REFINED-1.trf (same as simulation-lexicon.trf)
simulation-grammar-REFINED-1.trf (NP,8, N-ADJ agreement constraint added)

4:[you saw the woman] -- [viste la mujer]
simulation-lexicon-REFINED-1.trf
simulation-grammar-REFINED-2.trf (NP,10: "a" added; for now added "" manually) 

2:[I see the red unicorn] -- [veo el unicorn rojo]
simulation-lexicon-REFINED-4.trf [+unicornio; added default features manually]
simulation-grammar-REFINED-2.trf

3:[Mary plays the guitar] -- [marÍa juega la guitarra]
simulation-lexicon-REFINED-5.trf [add feature value constraints to lex entries juega + toca + guitarra]
simulation-grammar-REFINED-5.trf [VP,2 -> VP,47: added VP-NP feat constraint]

5:[John and Mary fell] -- [juan y marÍa cayeron]
simulation-lexicon-REFINED-6.trf [+se cayeron]
simulation-grammar-REFINED-5.trf

7:[I would like to go] -- [ME GUSTARÍA QUE IR]
not implemented yet

9:[Gaudi was a great artist] -- [gaudÍ era un artista gran]
crashing (need to finish implementing)


September 7, 2006

- met with Bill. He's fixed some bugs and will be working on the remaining 
tasks this weekend. 

- moved to version V0.08 and copied RRRule.cpp and CorrectionInstance.cpp into
new dir.

- since temuco was upgraded, the cross-mounting wasn't preserved and so I got
a permission denied error message when trying to compile RR (accessing 
libraries in temuco). I also needed to update the path for antlr.

- I am getting a compiler error that I was NOT getting before, so it's unrelated
to Bill's upgrade... checked on V0.07:

[aria@avenue V0.07]$ make RR
/usr/bin/g++ -g -ggdb3  -ftemplate-depth-24 -Wno-non-virtual-dtor -DNDEBUG -o RuleRefinement.o -c RuleRefinement.cpp -I/temuco/usr5/shared/code/antlr/antlr-2.7.1/lib/cpp -I/afs/cs.cmu.edu/project/avenue-1/Avenue/Transfer/stable-linux2 -I/shared/Genkit/UKernel -I/shared/Genkit/Toolbox
RuleRefinement.cpp: In function `int main(int, char**)':
RuleRefinement.cpp:327: parse error before `*' token
RuleRefinement.cpp:330: `pCI' undeclared (first use this function)
RuleRefinement.cpp:330: (Each undeclared identifier is reported only once for 
   each function it appears in.)
make: *** [RuleRefinement.o] Error 1

fixed a punctuation error, and got some real compiler errors from Bill's code:

CorrectionInstance.cpp
#include "Lexicon.hpp" -> #include "CICollection/Lexicon.hpp"

emailed Bill to make sure Lexicon.hpp has not changed...

Debugged Bill's code:
- pt --> m_tree; 
- added GetTree() back, not sure why you took it out, and 
- PlaceWordsInVector: changed the call to use the right function with the right 
parameter: StringUtils::StringToLower(word.value); 

but now the compiler is complaining about the new implementation of RRRule.cpp:

[aria@avenue V0.08]$ make RR
...
RuleRefinement.o(.text+0x320b): In function `main':
/usr0/aria/RuleRefinement/V0.08/RuleRefinement.cpp:867: undefined reference to `RRRule::RuleInstantiation(int, ParseTree*, int&, bool)'
RuleRefinement.o(.text+0x576c):/usr0/aria/RuleRefinement/V0.08/RuleRefinement.cpp:1708: undefined reference to `RRRule::RuleInstantiation(int, ParseTree*, int&, bool)'
RRRule.o(.gnu.linkonce.d._ZTV18LiteralConstituent+0x10): undefined reference to `LiteralConstituent::GetHashIndex()'
RRRule.o(.gnu.linkonce.d._ZTV14POSConstituent+0x10): undefined reference to `POSConstituent::GetHashIndex()'
RRRuleCollection.o(.text+0xd54): In function `RRRuleCollection::AddRule(RRRule*)':
: undefined reference to `RRRule::GetRHSConstituentHash()'
RRRuleCollection.o(.text+0x1836): In function `RRRuleCollection::GetAllGrammarRulesWithRHSConstituents(std::vector<RRConstituent*, std::allocator<RRConstituent*> >&, std::vector<RRGrammarRule*, std::allocator<RRGrammarRule*> >&)':
: undefined reference to `RRRule::AreRHSConstituentsSame(std::vector<RRConstituent*, std::allocator<RRConstituent*> >&)'
collect2: ld returned 1 exit status
make: *** [RR] Error 1

- emailed Bill again with list of default values for each POS and this.


Friday, September 8, 2006

- met with Bill. He had modified the wrong RRRule.cpp file, so he made his
changes again. But now the ToLower change in CI.cpp (PlaceWordsInVector,
SetTLWords, SetSLWords, SetCTLWords) has the effect that the TL sentence no
longer matches the output of the Xfer engine... so I reverted the change.
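
One way to keep the comparison independent of case without touching the stored
words would be to normalize both strings only at comparison time. A sketch,
assuming the sentences are Latin-1 bytes and that only ASCII letters need
folding (accented bytes such as Í are left alone on both sides);
NormalizeForCompare and SameSentence are hypothetical names, not functions of
the Xfer engine or of StringUtils.

```cpp
#include <cctype>
#include <string>

// Trims surrounding spaces and upper-cases ASCII letters only.
// Bytes >= 128 (e.g. Latin-1 accented letters) are copied unchanged.
std::string NormalizeForCompare(const std::string& s)
{
    const std::string::size_type first = s.find_first_not_of(' ');
    if (first == std::string::npos) return std::string();
    const std::string::size_type last = s.find_last_not_of(' ');
    std::string out = s.substr(first, last - first + 1);
    for (char& c : out) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u < 128) c = static_cast<char>(std::toupper(u));
    }
    return out;
}

// Compares two sentences after normalizing both of them the same way.
bool SameSentence(const std::string& a, const std::string& b)
{
    return NormalizeForCompare(a) == NormalizeForCompare(b);
}
```

This way "viste a la mujer " (with its trailing space) and "VISTE A LA MUJER"
compare equal, while the words stored in the CI keep their original form.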

I finally figured out what was wrong, the lexicon now contains two sets of 
quotes: (lexicons/simulation-lexicon-ID.trf)

;F:0
{N,1}
N::N |: [""car""] -> [""auto""]
(
  (X1::Y1)
  ((x0 form) = car)
  ((x0 agr pers) = 3)
  ((x0 agr num) = sg)
  ((y0 agr gen) = masc)
  ((x0 semtype) = object)
)

- emailed Bill about this.


Thursday, September 14, 2006

- met with Bill:
testing his code (lexicon bug fixed)

it's working now!! "will" preserves quotes in the grammar rules, and lexical
rules have just one set of quotes

- method that adds the default values for a new lexical entry given a POS
is implemented.
Since this is language dependent, Bill doesn't want to add it to the general 
method that creates a new lexical entry.

- move constit method is implemented, need to test

- Bill managed to track down a bug in the GetTree method I wrote. Since the
= operator is not overloaded, returning the tree makes a bitwise (shallow)
copy: the node pointers get copied, not the nodes themselves. This is really
bad!!! If I delete a node in instance 1, the deletion is not reflected in
instance 2, even though it is supposed to be the exact same tree, so instance 2
can end up with pointers to deleted nodes. Bill is going to implement this
method and fix the ParseTree class, since it has memory leaks and he thinks
there are many bad things about it...
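
To make the problem concrete, here is a sketch of what a deep copy could look
like. This is not the ParseTree class and not Bill's fix: NodeSketch and
TreeSketch are made-up types, and std::make_unique needs C++14, which the
g++ 3.2.2 used for RR does not have. The point is only that the copy
constructor and operator= clone the nodes instead of copying the pointers.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node that owns its children.
struct NodeSketch {
    std::string label;
    std::vector<std::unique_ptr<NodeSketch>> children;

    explicit NodeSketch(std::string l) : label(std::move(l)) {}

    // Recursively copies this node and everything below it.
    std::unique_ptr<NodeSketch> Clone() const
    {
        auto copy = std::make_unique<NodeSketch>(label);
        for (const auto& child : children)
            copy->children.push_back(child->Clone());
        return copy;
    }
};

// Hypothetical tree whose copies never share nodes.
class TreeSketch {
public:
    TreeSketch() = default;
    explicit TreeSketch(std::unique_ptr<NodeSketch> root)
        : root_(std::move(root)) {}

    // Deep copy: the new tree gets its own nodes.
    TreeSketch(const TreeSketch& other)
        : root_(other.root_ ? other.root_->Clone() : nullptr) {}

    TreeSketch& operator=(const TreeSketch& other)
    {
        if (this != &other)
            root_ = other.root_ ? other.root_->Clone() : nullptr;
        return *this;
    }

    NodeSketch* Root() { return root_.get(); }
    const NodeSketch* Root() const { return root_.get(); }

private:
    std::unique_ptr<NodeSketch> root_;
};
```

With this, removing a node from a copy leaves the original untouched, and
neither instance can be left holding a pointer to a deleted node.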


Monday, September 18, 2006

- ParseTree is not compiling since Bill included Lexicon.hpp, but there is no
separate class for the lexicon...

- he didn't submit CorrectionInstance update, so GetTree is probably still
my unsafe implementation

- emailed him with these questions, met with him at 3pm

- 7: I would like to go (me gustaria que ir) does not have a full parse, so 
the RR crashes. Need to think of a delete example that can be parsed in the 
first place

[aria@avenue 2006-5-15-13-08-25-11387]$ mv 7 ../7-of-2006-5-15-13-08-25-11387

- we weren't setting m_tree to anything in LoadTctool method in 
void CorrectionInstance::LoadTCToolLogFile(const char *szLogFileName, ParseTree *pTree)

- testing previous bug fixes:

- GetTree is working now...

- Add a word (constit) to a Grule:
  - quotes added: yes (see simulation-grammar-REFINED-5.trf)

-> change the affected-rules heuristic to first try the rule that contains
more context
ex: "a"  -> VP "a" NP instead of "a" NP -> subject!!!

  -  constraint indices updated? yes!

{NP,3}
NP::NP : [DET N] -> [DET N]
(
  (X1::Y1)  (X2::Y2)
  (x0 = x2)
  ((y1 def) = (x1 def))
  ((y2 agr) = (x2 agr))
  ((y1 agr gen) = (y2 agr gen))
  ((y1 agr num) = (y2 agr num))
)

{NP,10}
NP::NP : [DET N] -> ["a" DET N]
(
;(P:{NP,3})
  (X1::Y2)  (X2::Y3)
  (x0 = x2)
  ((y2 def) = (x1 def))
  ((y3 agr) = (x2 agr))
  ((y2 agr gen) = (y3 agr gen))
  ((y2 agr num) = (y3 agr num))
)
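
The index update seen in {NP,10} can be summarized as: every TL index at or
after the insertion point goes up by one, and y0 (the rule itself) stays put.
A sketch of that bookkeeping, not the RRRule implementation; RuleSketch,
ShiftTLIndex and InsertTLConstituent are hypothetical names, and positions are
assumed to be 1-based as in the (X1::Y1) notation.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// New value of a TL index after a constituent is inserted at insertPos (>= 1).
// Index 0 refers to the rule's own node and is never shifted.
int ShiftTLIndex(int yIndex, int insertPos)
{
    return (yIndex >= insertPos) ? yIndex + 1 : yIndex;
}

// Hypothetical, stripped-down view of a rule.
struct RuleSketch {
    std::vector<std::string> tlSide;                 // e.g. DET N
    std::vector<std::pair<int, int> > xyAlignments;  // (X index, Y index)
};

// Inserts a constituent on the TL side and updates the X::Y alignments.
void InsertTLConstituent(RuleSketch& rule, int insertPos,
                         const std::string& constituent)
{
    const int n = static_cast<int>(rule.tlSide.size());
    if (insertPos < 1 || insertPos > n + 1)
        throw std::out_of_range("insert position outside the TL side");
    for (auto& a : rule.xyAlignments)
        a.second = ShiftTLIndex(a.second, insertPos);
    rule.tlSide.insert(rule.tlSide.begin() + (insertPos - 1), constituent);
}
```

Inserting "a" at position 1 of [DET N] gives ["a" DET N] with (X1::Y2) and
(X2::Y3); the same shift applies to the y indices inside the constraints.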


- Add new lexical entry (unicorn):
  - default values added? yes (in RRRule: void RRLexiconRule::SetConstraintsFromPOS(POS_TYPE POS))

added code to my main:
      pNewLexEntry = new RRLexiconRule(POS, RC.USeRuleIDManager(), vsSLside, vsCTLside);
   // adds default constraints for given POS, implemented separately
   //from Creating a new LexEntry, since it is language dependent
       pNewLexEntry->SetConstraintsFromPOS(POS);

-> the unicorn example works very well now; it no longer adds the ambiguity
that was due to the lack of constraints.

- Mary plays the guitar:
even though new feature val is added to both clue word (guitarra) and 
correction word (toca), and to the grammar rule that subsumes both words,
the feature did not get percolated to intermediate levels:

	      //**** need to percolate feat up to phrase ****//

WiPOSPos is: [1] and CluePOSPos is: [2]
Bifurcated {VP,2}
Need to create agreement constraint with feat_0 and add constraint to rule
New value constraint created is (y1 feat_0) = (y2 feat_0)
Added to the rule...
{VP,5}
VP::VP : [VP NP] -> [VP NP]
(
;(P:{VP,2})
  (X1::Y1)  (X2::Y2)
  ((x2 case) = acc)
  ((x0 obj) = x2)
  ((x0 agr) = (x1 agr))
  (x0 = x1)
  ((y0 tense) = (x0 tense))
  ((y0 agr) = (y1 agr))
  ((y1 feat_0) = (y2 feat_0))
)

also need to refine VP and NP (look for the right instances of NP and VP in the tree)

This is the example that is illustrated in detail in the instructions I sent Bill

Tuesday, September 19, 2006

- gaudi was a great artist:
is extracting a commented rule!!! -> need to restrict that


Wednesday, September 20, 2006

this is a problem that arises when processing more than one logfile at once. 
Since rules might have been modified already, but the translation tree
is stored at the beginning when all the logfiles get stored...

Think about changing the logic of the program so that 
it actually processes one file and then makes a correction, 
Then stores the next file (will get the right trace, translation tree, 
with the new, refined rules, and not the old ones)
 
Alternatively, the pAct->GetAffectedRuleAndLexID(j, ID, lex);
Can doublecheck if that ID is not active, and if so, look in the rule 
hierarchy to see which one is, and extract that one instead.
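
A sketch of how that second option could work, using the ;(P:{...}) parent
links to walk from a stale rule ID down to an active descendant. Everything
here is hypothetical (RuleNodeSketch, RuleHierarchySketch, FindActiveRule, the
"active" flag); it assumes the hierarchy is acyclic and returns the first
active rule found breadth-first, or an empty string if there is none.

```cpp
#include <map>
#include <queue>
#include <string>

// Hypothetical record for one rule in the hierarchy.
struct RuleNodeSketch {
    std::string parentId;  // empty for an original rule
    bool active;
};

typedef std::map<std::string, RuleNodeSketch> RuleHierarchySketch;

// Returns id itself if that rule is active, otherwise the closest active
// descendant (breadth-first), otherwise "". Assumes no cycles in parent links.
std::string FindActiveRule(const RuleHierarchySketch& h, const std::string& id)
{
    std::queue<std::string> pending;
    if (h.count(id) != 0) pending.push(id);
    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        if (h.find(current)->second.active) return current;
        // Queue every rule that was refined from the current one.
        for (RuleHierarchySketch::const_iterator it = h.begin();
             it != h.end(); ++it) {
            if (it->second.parentId == current) pending.push(it->first);
        }
    }
    return std::string();
}
```

If several active descendants exist, this returns whichever is reached first,
so picking the most specific rule would still need to be handled separately.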

Need to ask bill to implement a new method that will pick the grammar rule
with the most context (most specific) to avoid over-generalization.

Before sending him another email with this, test move constit and everything
else.

- move constit in RHS (oldpos, newpos) is implemented, and tested
in RRRule:
void MoveConstituentInRHS(int iOldPos, int iNewPos);

Both constraint indices are correctly updated :-)
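
For reference, the index bookkeeping that a move implies can be sketched as
follows. This is not the RRRule code: MoveConstituentSketch and
RemapIndexAfterMove are hypothetical names, positions are assumed to be
1-based, and index 0 (the rule itself) is assumed never to move.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Returns the new position of the constituent that used to be at idx once
// the constituent at `from` has been moved to `to`.
int RemapIndexAfterMove(int idx, int from, int to)
{
    if (idx == from) return to;
    if (from < to && idx > from && idx <= to) return idx - 1;  // shifted left
    if (to < from && idx >= to && idx < from) return idx + 1;  // shifted right
    return idx;  // index 0 and positions outside the moved range
}

// Moves the constituent at position `from` to position `to` in place.
void MoveConstituentSketch(std::vector<std::string>& rhs, int from, int to)
{
    const int n = static_cast<int>(rhs.size());
    if (from < 1 || from > n || to < 1 || to > n)
        throw std::out_of_range("position outside the RHS");
    const std::string moved = rhs[from - 1];
    rhs.erase(rhs.begin() + (from - 1));
    rhs.insert(rhs.begin() + (to - 1), moved);
}
```

Moving position 3 to position 2 in [DET N ADJ] gives [DET ADJ N]; the old
index 3 becomes 2 and the old index 2 becomes 3, in the alignments as well as
in the constraints.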

- move to next version and change the logic in which I store and process logfiles
***************
**** V0.09 ****
***************

backed up RR dir to temuco and Avenue-afs (needed to remove old directories, 
since otherwise it exceeded disk quota!)
So from now on, when doing backups, I'll need to copy over only the modified 
directories; otherwise, all the old dirs will also get backed up on Avenue-afs

****************************************************************************

- Actually, I can't process files as I go, since I want to be able to 
compare them, eliminate duplicates and rank them so that I process log files
that are simpler first.
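
A sketch of what "eliminate duplicates and rank" could look like once all the
log files are stored. The measure of simplicity used here (number of
correction actions) is only an assumption for illustration, and
CISummarySketch and DedupAndRank are hypothetical names, not part of the CI
collection code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical summary of one stored log file.
struct CISummarySketch {
    std::string sl;   // source language sentence
    std::string ctl;  // corrected target language sentence
    int numActions;   // assumed measure of how simple the correction is
};

static bool SamePair(const CISummarySketch& a, const CISummarySketch& b)
{
    return a.sl == b.sl && a.ctl == b.ctl;
}

static bool ByPairThenActions(const CISummarySketch& a,
                              const CISummarySketch& b)
{
    if (a.sl != b.sl) return a.sl < b.sl;
    if (a.ctl != b.ctl) return a.ctl < b.ctl;
    return a.numActions < b.numActions;
}

static bool ByActions(const CISummarySketch& a, const CISummarySketch& b)
{
    return a.numActions < b.numActions;
}

// Drops repeated (SL, CTL) pairs, then orders what is left from the fewest
// actions to the most; ties keep their (SL, CTL) alphabetical order.
void DedupAndRank(std::vector<CISummarySketch>& cis)
{
    std::sort(cis.begin(), cis.end(), ByPairThenActions);
    cis.erase(std::unique(cis.begin(), cis.end(), SamePair), cis.end());
    std::stable_sort(cis.begin(), cis.end(), ByActions);
}
```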

-> need to use rule hierarchy in order to retrieve only active rules

- emailed Bill: code fixes (affected rules need to be active + pick most specific rule) 


*** delete case:
...
ME GUSTARÍA QUE IR
tree: <((S,0 (VP,1 (VB,1 (V,10:2 'ME GUSTARÍA') ) ) ) )> <(PREP,1:4 'QUE')> <(V,11:5 'IR')>


!!!Attention: Tree is empty!!!!

MAIN::Printing Tree extracted from Xfer engine for SL-TL pair in the logfile:
NULL subroot

-> seg fault

Need to think of an example that will parse (the unicorn example parses 
since the Xfer engine's robust feature is on and it can skip OOV nouns and verbs).


Thursday, September 21, 2006

- left for Colombia


Thursday, September 28, 2006

- met with Bill to discuss the percolate method, he understands it now.
He implemented a method to access the active rule, as opposed to the old
affected rule, which might have changed.

- fixed code so that it doesn't seg fault any more!!!
the while loop over the logfiles kept going after there were no more log files 
to process -> changed with C->NumCIs and it's working now :)
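
The shape of the fixed loop, as a sketch with stand-in types (CIVectorSketch
and CorrectionInstanceSketch are made up; only the names GetCI and NumCIs come
from the notes above). The loop is bounded by NumCIs(), and the NULL check
stays in as a safety net since GetCI returns NULL for an out-of-range index.

```cpp
#include <string>
#include <vector>

// Stand-in for CorrectionInstance.
struct CorrectionInstanceSketch {
    std::string ctls;
};

// Stand-in for the CI collection; it does not own the pointers it stores.
class CIVectorSketch {
public:
    void Add(CorrectionInstanceSketch* p) { items_.push_back(p); }

    int NumCIs() const { return static_cast<int>(items_.size()); }

    // Same contract as the GetCI shown earlier: NULL when out of range.
    CorrectionInstanceSketch* GetCI(int i) const
    {
        if (i >= 0 && i < NumCIs()) return items_[i];
        return nullptr;
    }

private:
    std::vector<CorrectionInstanceSketch*> items_;
};

// Visits every stored CI exactly once and returns how many were processed.
int ProcessAllCIs(const CIVectorSketch& c)
{
    int processed = 0;
    for (int i = 0; i < c.NumCIs(); ++i) {
        const CorrectionInstanceSketch* pCI = c.GetCI(i);
        if (pCI == nullptr) continue;  // never dereference a missing CI
        ++processed;                   // the refinement work would go here
    }
    return processed;
}
```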

- added Xfer code at the end of CWO case:
Action type = cwo
Wi: [gran]
i: 5
i': 4
Wmoved: [artista]
i-i'= 1
Affected Rule are: 1
{NP,8} [gran]
CWO: Making sure it's an active rule in the R Hierarchy
got active rule
{NP,9}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
;(P:{NP,8})
  (X1::Y1)  (X2::Y3)  (X3::Y2)
  ((x0 det) = x1)
  ((x0 mod) = x2)
  (x0 = x3)
  (y0 = x0)
  (y1 == (y0 det))
  (y3 == (y0 mod))
  (y2 = y0)
  ((y1 agr num) = (y2 agr num))
  ((y1 agr gen) = (y2 agr gen))
  ((y3 agr gen) = (y2 agr gen))
)
Bifrucated {NP,8}
New rule is :
{NP,11}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
;(P:{NP,9})
  (X1::Y1)  (X2::Y3)  (X3::Y2)
  ((x0 det) = x1)
  ((x0 mod) = x2)
  (x0 = x3)
  (y0 = x0)
  (y1 == (y0 det))
  (y3 == (y0 mod))
  (y2 = y0)
  ((y1 agr num) = (y2 agr num))
  ((y1 agr gen) = (y2 agr gen))
  ((y3 agr gen) = (y2 agr gen))
)
iPOSPos (MoveFromPos) is: 3
MoveToPos is: 2
Constituent in 3 has been moved to 2 in the refined rule:
{NP,11}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
;(P:{NP,9})
  (X1::Y1)  (X2::Y2)  (X3::Y3)
  ((x0 det) = x1)
  ((x0 mod) = x2)
  (x0 = x3)
  (y0 = x0)
  (y1 == (y0 det))
  (y2 == (y0 mod))
  (y3 = y0)
  ((y1 agr num) = (y3 agr num))
  ((y1 agr gen) = (y3 agr gen))
  ((y2 agr gen) = (y3 agr gen))
)
Deleting all loaded rules.
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-8.trf with 233 rules added
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-8.trf with 19 rules added
Checking refined lattice for the presence of CTL and TL sentences
XferEngine::CheckRefinedLattice: Checking if CTL and TL sentences are in the Refined Lattice...
TL sentence is [GAUDÍ ERA UN ARTISTA GRAN]
CTL sentence is [GAUDÍ ERA UN GRAN ARTISTA]
these are the alterative translations for Gaudi was a great artist :
tl-0: GAUDÍ ERA UN ARTISTA GRANDE
tree-0: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,5 (VP,1 (VB,2 (AUX,1:2 'ERA') ) ) (NP,9 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) )
tl-1: GAUDÍ ERA UNA ARTISTA GRANDE
tree-1: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,5 (VP,1 (VB,2 (AUX,1:2 'ERA') ) ) (NP,9 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) )
tl-2: GAUDÍ ERA UN ARTISTA GRAN
tree-2: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,5 (VP,1 (VB,2 (AUX,1:2 'ERA') ) ) (NP,9 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,4:4 'GRAN') ) ) ) )
tl-3: GAUDÍ ERA UNA ARTISTA GRAN
tree-3: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,5 (VP,1 (VB,2 (AUX,1:2 'ERA') ) ) (NP,9 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,4:4 'GRAN') ) ) ) )
tl-4: GAUDÍ ESTABA UN ARTISTA GRANDE
tree-4: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,5 (VP,1 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,9 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) )
tl-5: GAUDÍ ESTABA UNA ARTISTA GRANDE
tree-5: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,5 (VP,1 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,9 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) )
tl-6: GAUDÍ ESTABA UN ARTISTA GRAN
tree-6: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,5 (VP,1 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,9 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,4:4 'GRAN') ) ) ) )
tl-7: GAUDÍ ESTABA UNA ARTISTA GRAN
tree-7: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,5 (VP,1 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,9 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,4:4 'GRAN') ) ) ) )

***********************************************************
Refinement did not work, need to revise manually
*************************************************************

for some reason NP,9 is the only one firing, NP,11 never does...
-> need to look into it

I think I need to add the new rule to the grammar!!

Friday, September 29, 2006

- added new rule to RuleCollection (RC), run again
last example (Gaudi era un gran artista) is working!!! :-))))

- worked with bill to debug and test percolate.
It finally seems to be working, however the wrong translation is still being
produced by the xfer engine... :(
looking into it...
all the rules in the translation tree seem to be correctly labelled with feat_0
and the lexical entries also, so I have no idea why this is not ruling out

tl-2: MARÍA JUEGA LA GUITARRA
tree-2: ((S,1 (NP,2 (N,3:1 'MARÍA') ) 

->	      (VP,5 (VP,6 (VB,5 (V,8:2 'JUEGA') ) ) 
		    (NP,11 (DET,2:3 'LA') (N,5:4 'GUITARRA') ) ) ) )

V,8 has:  ((y0 feat_0) = -)
VB,5 has: ((y0 feat_0) = (y1 feat_0))
VP,6 has: ((y0 feat_0) = (y1 feat_0))

N,5 has:   ((y0 feat_0) = +)
NP,11 has: ((y0 feat_0) = (y2 feat_0))

and VP,5 has ((y1 feat_0) = (y2 feat_0))

weird... look at it when I am less tired

-> isolate rules and run Xfer engine (outside RR) on just that sentence
will do this after METIS paper deadline
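
A small sketch to sanity-check the expectation above by hand, assuming plain unification semantics: values percolate unchanged through `(y0 feat_0) = (yk feat_0)`, and an unset value unifies with anything. Whether the Xfer engine behaves this way is an assumption; `Feat`, `percolate`, `unify` and `vp5Licensed` are hypothetical names.

```cpp
// Possible values of feat_0 at a node.
enum class Feat { Unset, Plus, Minus };

// (y0 feat_0) = (yk feat_0): the mother takes the daughter's value.
Feat percolate(Feat daughter) { return daughter; }

// (y1 feat_0) = (y2 feat_0): succeeds if either side is unset or both agree.
bool unify(Feat a, Feat b) {
    return a == Feat::Unset || b == Feat::Unset || a == b;
}

// Follows the chain V,8 -> VB,5 -> VP,6 and N,5 -> NP,11, then applies
// the VP,5 agreement constraint between the two percolated values.
bool vp5Licensed(Feat verbLex, Feat nounLex) {
    Feat vb5 = percolate(verbLex);
    Feat vp6 = percolate(vb5);
    Feat np11 = percolate(nounLex);
    return unify(vp6, np11);
}
```

Under these assumptions `vp5Licensed(Feat::Minus, Feat::Plus)` is false, so JUEGA LA GUITARRA should be ruled out. It would only get through if one side arrived at VP,5 unset, which is one thing to check when isolating the rules.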


Saturday, September 30, 2006

Evaluation on Diagnostic Test:
ran both initial and final grammar and saw that since I am not constraining 
the pre-nominal NP rule, ambiguity is increased by a lot
-> need to extract SL-TL lexical entry, add value constraint to it and 
add the matching value constraint to the grammar rule

do this for NAACL paper, for now, just get preliminary results 


**************
Oct. 5-7: GHC
**************
Monday, October 9, 2006

- rerun RR, but now I get a segfault, and I have no idea why:
...
tree-2: ((S,1 (NP,1 (PRON,6:1 'ELLAS') ) (VP,2 (VP,1 (VB,1 (V,3:2 'VE') ) ) (NP,2 (N,6:3 'AGUA') ) ) ) )
No alternative matches the TL sentence: ELLOS VEN AGUA


!!!Attention: Tree is empty!!!!

MAIN::Printing Tree extracted from Xfer engine for SL-TL pair in the logfile:
NULL subroot

I thought it was because I moved the Gaudi log file back into the dir, 
but did not change its name to just 9:

[aria@avenue 2006-5-15-13-08-25-11387]$ mv ../9-of-2006-5-15-13-08-25-11387 .
[aria@avenue 2006-5-15-13-08-25-11387]$ ls
0  2  3  4  5  6  9-of-2006-5-15-13-08-25-11387
[aria@avenue 2006-5-15-13-08-25-11387]$ mv 9-of-2006-5-15-13-08-25-11387 9

but that didn't fix it, only when I moved 6 out of the dir, did it work:
[aria@avenue 2006-5-15-13-08-25-11387]$ mv 6 ../6-of-2006-5-15-13-08-25-11387
[aria@avenue 2006-5-15-13-08-25-11387]$ ls
0  2  3  4  5  9

weird...


- Added "would like to go" log file (7) back to the dir the RR traverses:
[aria@avenue 2006-5-15-13-08-25-11387]$ mv ../7-of-2006-5-15-13-08-25-11387 .
[aria@avenue 2006-5-15-13-08-25-11387]$ ls
0  2  3  4  5  7-of-2006-5-15-13-08-25-11387  9
[aria@avenue 2006-5-15-13-08-25-11387]$ mv 7-of-2006-5-15-13-08-25-11387 7

Added rule so that the delete example parses for now, later think of a good 
example whose correction will generalize


Tuesday, October 10, 2006

- debugging delete case:

// 1 /////////////////////////////////////////////////////////////////////////////
	  // extract SLWordPos and Word from alignment info	  
	  // extract (multiple) alignment(s) from TLword to SLWords
	  for (int p = 0; p < TLWords[TCTOOLPOS_TO_VECTPOS(TLWordPos)].alignments.size(); p++) {      
	    SLWordPos = TLWords[TCTOOLPOS_TO_VECTPOS(TLWordPos)].alignments[p];
	    // need to debug! This gives me a position higher than the one that should be giving me...
	    // 5 - go, instead of 4 - to
	    SLWordPos--;
	    SLWord = SLWords[TCTOOLPOS_TO_VECTPOS(SLWordPos)].value;
	    vsSLside.push_back(SLWord);
	    cout << "SLWordPos is "  << SLWordPos << " and SLWord is " << SLWord << endl;

RR.out.10-10-06:
...
Action type = delete
Wi: [que]
i is 3
SLWordPos is 4 and SLWord is to

---
	      // empty TL side: the SL word translates to nothing
	      vector<string> vsEmptyTLside;
	      vsEmptyTLside.push_back("");
	      pNewLexEntry->SetTLLexicon(vsEmptyTLside);

Got Lexical entry for "to" and "que"
{CONJ,2}
CONJ::CONJ |: ["to"] -> ["que"]
(
  (X1::Y1)
  ((x0 form) = to)
)
Since pLexEntry exists in the Lexicon... 
{CONJ,3}
CONJ::CONJ |: ["to"] -> [""]
(
;(P:{CONJ,2})
  (X1::Y1)
  ((x0 form) = to)
)

Simplest DELETE case implemented:

****************************************************************************
***The refined grammar and lexicon produced the user corrected translation***
		The correct translation is: ME GUSTARÍA IR
****************************************************************************

****************************************************************************
***However it is still producing the incorrect translation, previously corrected***
		by the user: ME GUSTARÍA QUE IR
****************************************************************************


******************************
******************************
Moving to next version: v0.10
******************************
******************************
so that all the changes to reduce ambiguity don't interfere with the evals
being done with V0.09 for NAACL 06

- backed up the relevant files to afs and temuco
temuco: cd /usr4/aria/RuleRefinement
[aria@temuco RuleRefinement]$ cp /avenue/usr0/aria/RuleRefinement/info/ChangeLog.txt info
cp: overwrite `info/ChangeLog.txt'? y
[aria@temuco RuleRefinement]$ cp -Ru /avenue/usr0/aria/RuleRefinement/V0.09/* V0.09

- for some reason the RR is now processing 9 before all the other files...
and this is problematic, since the refinement necessary to correct 9 adds
a lot of ambiguity...
If problem persists, need to move 9 out and process the rest 1st, then process
9 on the refined grammars.
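
The cause of the changed order isn't established here. One possibility is that the directory listing comes back in no guaranteed order, in which case sorting the log file names numerically before processing would pin it down. A minimal sketch under that assumption (`orderLogFiles` is a hypothetical helper, not part of the RR code):

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>
#include <vector>

// Sort log file names by their leading number, so "9" comes after "5"
// and "10" after "9". Names such as "9-of-2006-..." sort by the leading 9;
// names with no leading digits count as 0.
std::vector<std::string> orderLogFiles(std::vector<std::string> names) {
    std::stable_sort(names.begin(), names.end(),
                     [](const std::string& a, const std::string& b) {
                         return std::atoi(a.c_str()) < std::atoi(b.c_str());
                     });
    return names;
}
```

With the files in this directory that puts 9 last, which is the order wanted here.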

- Doing step-wise eval:
See /usr0/aria/eng2spa/corpus/DiagnosticTests/00-Eval-StepWise4NAACL-10-10-06


Thursday, October 12, 2006

- Adding constraints to the New Rule with the MoveConstit  

Doing this on V0.10...

Action type = cwo
Wi: [gran]
i: 5
i': 4
Wmoved: [artista]
Word has been moved this many postitions (i-i')= 1
SLWordPos is 4 and SLWord is great
Got Lexical entry for "great" and "gran"
{ADJ,4}
ADJ::ADJ |: ["great"] -> ["gran"]
(
  (X1::Y1)
  ((x0 form) = great)
  ((y0 agr num) = sg)
)
Refining lexical entry extracted

Postulating New Feature: feat_1
New value constraint created is (y0 feat_1) = +
Added to refined lexical entry...
{ADJ,49}
ADJ::ADJ |: ["great"] -> ["gran"]
(
;(P:{ADJ,4})
  (X1::Y1)
  ((x0 form) = great)
  ((y0 agr num) = sg)
  ((y0 feat_1) = +)
)
Affected Rule are: 1
{NP,8} [gran]
CWO: Making sure it's an active rule in the R Hierarchy
got active rule
{NP,9}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
;(P:{NP,8})
  (X1::Y1)  (X2::Y3)  (X3::Y2)
  ((x0 det) = x1)
  ((x0 mod) = x2)
  (x0 = x3)
  (y0 = x0)
  (y1 == (y0 det))
  (y3 == (y0 mod))
  (y2 = y0)
  ((y1 agr) = (x1 agr))
  ((y1 agr num) = (y2 agr num))
  ((y1 agr gen) = (y2 agr gen))
  ((y3 agr gen) = (y2 agr gen))
)
Bifrucated {NP,8}iPOSPos (MoveFromPos) is: 3
MoveToPos is: 2
Constituent in 3 has been moved to 2 in the refined rule:
{NP,12}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
;(P:{NP,9})
  (X1::Y1)  (X2::Y2)  (X3::Y3)
  ((x0 det) = x1)
  ((x0 mod) = x2)
  (x0 = x3)
  (y0 = x0)
  (y1 == (y0 det))
  (y2 == (y0 mod))
  (y3 = y0)
  ((y1 agr) = (x1 agr))
  ((y1 agr num) = (y3 agr num))
  ((y1 agr gen) = (y3 agr gen))
  ((y2 agr gen) = (y3 agr gen))
)
Adding value constraint (=c) to the bifurcated rule...
Bifurcated {NP,8}
Need to create Value Constraint with feat_1 and add constraint to rule
New value constraint created is (y2 feat_1) =c +
Added to the rule...
{NP,13}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
;(P:{NP,12})
  (X1::Y1)  (X2::Y2)  (X3::Y3)
  ((x0 det) = x1)
  ((x0 mod) = x2)
  (x0 = x3)
  (y0 = x0)
  (y1 == (y0 det))
  (y2 == (y0 mod))
  (y3 = y0)
  ((y1 agr) = (x1 agr))
  ((y1 agr num) = (y3 agr num))
  ((y1 agr gen) = (y3 agr gen))
  ((y2 agr gen) = (y3 agr gen))
  ((y2 feat_1) =c +)
)

Added new rule to grammar...
Adding a Blocking constraint (=-) to the original rule...
Bifurcated {NP,8}
Creating a blocking constraint with feat_1 and adding constraint to rule
New value constraint created is (y3 feat_1) = -
Added to the rule...
{NP,14}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
;(P:{NP,9})
  (X1::Y1)  (X2::Y3)  (X3::Y2)
  ((x0 det) = x1)
  ((x0 mod) = x2)
  (x0 = x3)
  (y0 = x0)
  (y1 == (y0 det))
  (y3 == (y0 mod))
  (y2 = y0)
  ((y1 agr) = (x1 agr))
  ((y1 agr num) = (y2 agr num))
  ((y1 agr gen) = (y2 agr gen))
  ((y3 agr gen) = (y2 agr gen))
  ((y3 feat_1) = -)
)

Added new rule to grammar...
Disabling original rule
Deleting all loaded rules.
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-REFINED-8.trf with 250 rules added
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-REFINED-8.trf with 21 rules added
Checking refined lattice for the presence of CTL and TL sentences
XferEngine::CheckRefinedLattice: Checking if CTL and TL sentences are in the Refined Lattice...
TL sentence is [GAUDÍ ERA UN ARTISTA GRAN]
CTL sentence is [GAUDÍ ERA UN GRAN ARTISTA]
these are the alterative translations for Gaudi was a great artist :
tl-0: GAUDÍ ERA UN GRAN ARTISTA
tree-0: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,6 (VP,7 (VB,2 (AUX,1:2 'ERA') ) ) (NP,13 (DET,3:3 'UN') (ADJ,49:4 'GRAN') (N,8:5 'ARTISTA') ) ) ) )
tl-1: GAUDÍ ERA UNA GRAN ARTISTA
tree-1: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,6 (VP,7 (VB,2 (AUX,1:2 'ERA') ) ) (NP,13 (DET,31:3 'UNA') (ADJ,49:4 'GRAN') (N,8:5 'ARTISTA') ) ) ) )
tl-2: GAUDÍ ERA UN ARTISTA GRANDE
tree-2: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,6 (VP,7 (VB,2 (AUX,1:2 'ERA') ) ) (NP,14 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) )
tl-3: GAUDÍ ERA UNA ARTISTA GRANDE
tree-3: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,6 (VP,7 (VB,2 (AUX,1:2 'ERA') ) ) (NP,14 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) )
tl-4: GAUDÍ ESTABA UN GRAN ARTISTA
tree-4: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,6 (VP,7 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,13 (DET,3:3 'UN') (ADJ,49:4 'GRAN') (N,8:5 'ARTISTA') ) ) ) )
tl-5: GAUDÍ ESTABA UNA GRAN ARTISTA
tree-5: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,6 (VP,7 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,13 (DET,31:3 'UNA') (ADJ,49:4 'GRAN') (N,8:5 'ARTISTA') ) ) ) )
tl-6: GAUDÍ ESTABA UN ARTISTA GRANDE
tree-6: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,6 (VP,7 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,14 (DET,3:3 'UN') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) )
tl-7: GAUDÍ ESTABA UNA ARTISTA GRANDE
tree-7: ((S,1 (NP,2 (N,7:1 'GAUDÍ') ) (VP,6 (VP,7 (VB,2 (AUX,2:2 'ESTABA') ) ) (NP,14 (DET,31:3 'UNA') (N,8:5 'ARTISTA') (ADJ,3:4 'GRANDE') ) ) ) )

****************************************************************************
***The refined grammar and lexicon produced the user corrected translation***
		The correct translation is: GAUDÍ ERA UN GRAN ARTISTA
****************************************************************************

****************************************************************************
***And what is more, the refined MT system did NOT produce the incorrect translation***
		detected by the user previosuly: GAUDÍ ERA UN ARTISTA GRAN
****************************************************************************

done ;-)
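
A minimal sketch of why the bifurcation above separates GRAN from GRANDE, assuming the usual reading of the two operators: `=c` only checks that the feature is already present with that value, while `=` unifies, so a missing feature is compatible and gets set. The engine's exact semantics are an assumption here, and `Features`, `checkConstraint`, `unifyValue` and `selectRule` are hypothetical names.

```cpp
#include <map>
#include <string>

using Features = std::map<std::string, std::string>;

// "=c": the feature must already be present with exactly this value.
bool checkConstraint(const Features& f, const std::string& name,
                     const std::string& value) {
    Features::const_iterator it = f.find(name);
    return it != f.end() && it->second == value;
}

// "=": plain unification; an absent feature is compatible and gets set.
bool unifyValue(Features& f, const std::string& name, const std::string& value) {
    Features::iterator it = f.find(name);
    if (it == f.end()) {
        f[name] = value;
        return true;
    }
    return it->second == value;
}

// Which NP rule the adjective's features license in this sketch.
std::string selectRule(Features adj) {
    if (checkConstraint(adj, "feat_1", "+")) return "NP,13";  // [DET ADJ N], =c +
    if (unifyValue(adj, "feat_1", "-")) return "NP,14";       // [DET N ADJ], = -
    return "none";
}
```

This matches the trace: ADJ,49 (gran, feat_1 = +) only shows up under NP,13, and ADJ,3 (grande, no feat_1) only under NP,14.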

- running Xfer with new G for 9 and 7...


Monday, October 16, 2006

- trying to get the RR to refine a slightly modified version of grammar3.trf and
lexicon3.trf (init-test.trf)

updated links to L and G in RuleRefinement.cpp

[aria@avenue V0.10]$ ./RuleRefinement > ! RR.out.10-16-06

- first size and memory were growing to more than 550M (and aborted after 3 min), 
so changed the lexicon so that all the entries had "" as in the 
simulation-lexicon. Even changed the L and G file names, to make sure that wasn't
the problem.
That took care of the size problem, but it's still running after 30 min: 
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 9590 aria      16   0  1340 1340  1236 R    99.7  0.0  25:00   0 RuleRefinemen
 9590 aria      17   0  1340 1340  1236 R    99.9  0.0  29:16   0 RuleRefinemen
 9590 aria      17   0  1340 1340  1236 R    99.9  0.0  30:02   1 RuleRefinemen
 9590 aria      19   0  1340 1340  1236 R    99.7  0.0  32:42   1 RuleRefinemen
killed it...

- changed G and L file names, just in case:
   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
***Aborted just after 3 min and more than 500M***

9879 aria      19   0  436M 436M  1300 R    99.9 21.5   2:47   1 RuleRefinemen
9879 aria      14   0  327M 327M  1300 R    99.9 16.2   2:07   0 RuleRefinemen
9879 aria      14   0  301M 301M  1300 R    99.9 14.9   1:57   0 RuleRefinemen
9879 aria      19   0  225M 225M  1300 R    99.7 11.1   1:27   0 RuleRefinemen
9879 aria      14   0  173M 173M  1300 R    99.9  8.5   1:06   0 RuleRefinemen
9879 aria      19   0 83408  81M  1300 R    99.9  4.0   0:31   0 RuleRefinemen

RR.out.10-16-06:
Compiled on Oct 16 2006 16:10:10 with g++ version: 3.2.2 20030222 (Red Hat Linux
 3.2.2-5) in debug mode
Parameters are:
Debug Level     = 2
Lexicon File    = /usr0/aria/eng2spa/lexicons/lexicon-TestEC.trf
Grammar File    = /usr0/aria/eng2spa/grammars/grammar-TestEC.trf

Looking at an old RR.out, realized I also need to update the auto-init file
used by the Xfer class...
Compiled on Oct 12 2006 14:19:05 with g++ version: 3.2.2 20030222 (Red Hat Linux
 3.2.2-5) in debug mode
Parameters are:
Debug Level     = 2
Lexicon File    = /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
Grammar File    = /usr0/aria/eng2spa/grammars/simulation-grammar.trf
***************************************************************
StartXfer::initfile is /usr0/aria/eng2spa/auto-init.txt
Turning on Latin-1 mode
Setting normalizecase to UPPER-CASE
Setting find all translations to ON
Setting output source text to ON
Setting showtrace to full trace with src indices
Loading rule file /usr0/aria/eng2spa/grammars/simulation-grammar-ID.trf with 19 
rules added
Loading rule file /usr0/aria/eng2spa/lexicons/simulation-lexicon-ID.trf with 247
 rules added
MAIN::Adding from directory : /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-
25-11387
File : /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/0
 
But auto-init.txt contains no hard link to any G or L file, so there is no need to update that.

These files load just fine to the Xfer engine when doing it on the command line,
so it has to be a problem with the RR code... 

- deleted as many comments as possible from grammar and lexicon, just to make sure

- Aborted after 3:16 minutes with 511M 

commented out "'s" and "can't", just in case -> same behaviour

Xfer trace indicated that there are 458 lexical entries in 
/usr0/aria/eng2spa/lexicons/lexicon-TestEC.trf
and 41 rules in /usr0/aria/eng2spa/grammars/grammar-TestEC.trf 

however when I grep for rule identifier symbols, I get several more:
grep "|:" | wc -l -> 468
     "\->"	 -> 470
     "x0 form"	 -> 468

so there must be 10 lex entries commented out, I could only find 9, but I am 
sure I missed one...
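
A minimal sketch for finding the commented-out entries directly instead of hunting by eye, assuming a lexical entry is any line containing `|:` and that a comment line starts with `;` (as in the `;(P:{...})` lines). `EntryCount`, `countEntries` and `countEntriesInText` are hypothetical helpers.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

struct EntryCount {
    int active;
    int commented;
};

// Count lines containing "|:" and split them into active and commented out.
EntryCount countEntries(std::istream& in) {
    EntryCount n = {0, 0};
    std::string line;
    while (std::getline(in, line)) {
        if (line.find("|:") == std::string::npos) continue;
        std::size_t first = line.find_first_not_of(" \t");
        if (first != std::string::npos && line[first] == ';')
            ++n.commented;  // entry header sits on a comment line
        else
            ++n.active;
    }
    return n;
}

// Convenience wrapper for text already held in memory.
EntryCount countEntriesInText(const std::string& text) {
    std::istringstream in(text);
    return countEntries(in);
}
```

Passing a `std::ifstream` opened on the lexicon to `countEntries` should give an active count that matches what the Xfer trace reports, if the assumptions hold.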


Tuesday, October 17, 2006

- Bill stopped by and figured out that the problem is that his rule-reading
function expects either the end of file or another rule, and since there were
some blank lines at the end of the new G and L, it would not exit the loop, 
and so it would run out of memory and abort.
Now it's complaining about something having an empty tree, good sign!
-> bug in the lexicon
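
A minimal sketch of a reading loop that tolerates trailing blank lines, assuming for illustration that rules are blocks of non-blank lines. This is not Bill's function and not the real .trf parser; `readRules` and `readRulesFromText` are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect blocks of consecutive non-blank lines; blank lines only separate.
std::vector<std::string> readRules(std::istream& in) {
    std::vector<std::string> rules;
    std::string line, current;
    while (std::getline(in, line)) {  // getline fails at EOF, which ends the loop
        bool blank = line.find_first_not_of(" \t\r") == std::string::npos;
        if (blank) {
            if (!current.empty()) {
                rules.push_back(current);
                current.clear();
            }
            continue;  // skip blank lines instead of waiting for a rule
        }
        current += line;
        current += '\n';
    }
    if (!current.empty()) rules.push_back(current);  // last rule, no trailing blank
    return rules;
}

// Convenience wrapper for text already held in memory.
std::vector<std::string> readRulesFromText(const std::string& text) {
    std::istringstream in(text);
    return readRules(in);
}
```

The point is only that the loop condition is the read itself, so blank lines at the end of the file cannot keep it alive.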
- had to update path for -ID and -REFINED both in RuleRefinement.cpp and 
XferEngine.cpp


Monday, October 23, 2006

- Debugging G and L for Diagnostic Test set examples:
there was a bug in the grammar (VP,5), where instead of tense = inf, I had
type = inf, which was conflicting with type = refl for marcharte and convertirme. 
Erik pointed it out to me.
Fixed grammar.

-re-run RR so that refined grammar reflects change, but I have the expanded
G and L loaded in in order to get results for the EC test set, so...

cp RuleRefinement RuleRefinement-TestEC
cp RuleRefinement.cpp RuleRefinement-TestEC.cpp

editing the RuleRefinement.cpp file so that it has the Diagnostic Test set
G and Ls

Since Erik updated the Xfer engine, when I recompile it, it gives me tons
of error messages, emailed Erik

Updated Makefile -> Using local copy for now :)


Friday, November 3, 2006

- met with Bill: most specific rule + METIS workshop
Picking the most specific rule is implemented now (Parse Tree contains range
info, so that it knows where it can add a word and where it can't; look for
the highest node where the word can still be inserted between daughter constits).
limitations: when two words are added next to each other, the current 
implementation won't work (since parse tree doesn't get updated after each 
correction right now)

When a word is added at the beginning or at the end of a sentence, it gets 
added to the mother node.
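
A minimal sketch of the "highest node" search, assuming each parse-tree node carries the inclusive range of word positions it covers and that the insertion point is given as the number of words to its left. "Mother node" is taken to mean the root. `Node`, `highestHost` and `hostFor` are hypothetical names, not Bill's implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Node {
    std::string label;
    int begin;               // first word position covered (1-based)
    int end;                 // last word position covered (inclusive)
    std::vector<Node> kids;  // daughter constituents, left to right
};

// Top-down search: return the first (highest) node in which the gap falls
// exactly between two adjacent daughters, or a null pointer if none does.
const Node* highestHost(const Node& n, int gap) {
    for (std::size_t i = 0; i + 1 < n.kids.size(); ++i)
        if (n.kids[i].end == gap && n.kids[i + 1].begin == gap + 1) return &n;
    for (std::size_t i = 0; i < n.kids.size(); ++i) {
        const Node& k = n.kids[i];
        if (k.begin <= gap && gap < k.end) {  // gap lies strictly inside k
            const Node* r = highestHost(k, gap);
            if (r) return r;
        }
    }
    return 0;
}

// Words added at the start or end of the sentence go to the root.
const Node* hostFor(const Node& root, int gap) {
    if (gap < root.begin || gap >= root.end) return &root;
    const Node* r = highestHost(root, gap);
    return r ? r : &root;
}
```

For a tree shaped like the MARÍA JUEGA LA GUITARRA parse, a word inserted between JUEGA and LA would land on VP,5 and one between LA and GUITARRA on NP,11. Since the ranges are not updated after an insertion, the sketch has the same two-adjacent-words limitation noted above.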

- send reply to METIS person (Final paper deadline: Dec. 1)
Trying to see if Bill can also come.


Tuesday, November 13, 2006

*****************************************************************************
 V0.10 is working for 7 refinements and VP is now being refined with "a"
 so that spurious subjects and obliques are not being generated.

 Haven't done any formal evaluation with this final implementation in place.
*****************************************************************************

- moved to V0.11, since I want to add new refinements for subj-verb agreement

Wednesday, November 15, 2006

- tried running the RR on a different dir (dirs.txt)
 /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-35-31-23763

[aria@avenue V0.11]$ ./RuleRefinement >! RR.out.11-15-06
Segmentation fault

[aria@avenue 2006-3-31-17-35-31-23763]$ rm user-info all
[aria@avenue 2006-3-31-17-35-31-23763]$ rm 6

7:
TL sentence is [WOULD LIKE QUE IR]
these are the alterative translations and their parses for I would like to go:
tl-0: ME GUSTARÍA QUE IR

for some reason the Xfer doesn't output "me gustaria que ir"!
Even though lexicon-test.trf has both "ir" and "me gustaria"

[aria@avenue 2006-3-31-17-35-31-23763]$ mv 7 ../7-of-2006-3-31-17-35-31-23763

Now, when processing 8:
...
TL is one of the alternatives: YO VISTE TÚ

MAIN::Printing Tree extracted from Xfer engine for SL-TL pair in the logfile:
((S,1 
(NP,1 
(PRON,1:1 'YO'))
(VP,3 
(VB,1 
(V,8:2 'VISTE'))
(NP,1 
(PRON,4:3 'TÚ'))))

MAIN::Instantiating correction instance from TCTool Log File

Segmentation fault

RuleRefinement.cpp:

	  /// Instantiating CI from TCTool Log File
	  if (DebugLevel >= 1) cout << "\n\nMAIN::Instantiating correction instance from TCTool Log File\n";

	  // When loading single file 
	  //	  pCI->LoadTCToolLogFile(pLogFile, &tree);
	  pCI->LoadTCToolLogFile(file.c_str(), &tree);

	  cout << "MAIN::CI instantiated\nCTLS is: " << pCI->GetCTLSentence() << endl << endl;

leaving aside for now, since I want to focus on getting new refinements working:

-> look at current sentences see if any has subj-verb agreement problems
and try using that first to test and debug the code.

-> run on more Log Files (add as higher numbers at the end)
subj-verb agreement (might involve percolate, check)
Adj-N number agreement
Det-Adj agreement (gender)
Subj Compl. agreement (cop verb) - gender and number
I gave the boy a book (will need to refine V NP "a" NP rule)

transfer -if init.txt

init.txt:
loadrules /usr0/aria/eng2spa/lexicons/simulation-lexicon.trf
loadrules /usr0/aria/eng2spa/grammars/simulation-grammar.trf

transfile /usr0/aria/eng2spa/corpus/more-examples-tct
; Subj-v agreement (both gender and number)
I sleep 
; a in front of VP -> V NP "a" NP (!= a, != refinement, indirect object)
I gave the boy a book 
; adj-n number agreement
I meet some tall girls
; Det-Adj agreement (gender), for cases when the noun is underspecified
I love a secret agent
; Subj-Compl agreemnt (gender and number) - copulative verbs
the girl is tall
the boys are tall
; a in front of VP -> V NP "a" NP (!= a, != refinement, indirect object)
I gave the boy a book 

../bin/postprocess-xfer.out.debug.pl < more-examples-tct.out.debug > input-tct-more

/usr1/depot/apache/httpd/htdocs/aria/spanish:
[aria@avenue spanish]$ mv input-tct input-tct-testing
[aria@avenue spanish]$ mv input-tct-more input-tct

Corrected the sentences with the TCTool (and took snapshots -> saved in HLT 07
folder)

cp -r out-test/2006-11-15-17-32-40-6618 /usr0/aria/RuleRefinement/IOFiles/

edited dirs.txt and ran it...

[aria@avenue V0.11]$ ./RuleRefinement >! RR.out.MoreExamples.11-15-06
No parse found.
LEX 0 GRA 0 UNK 0 MORPH 0 COMP 0
Segmentation fault

-> need to do it with the basic G and L first, change RR.cpp again

-> look at how to end the loop (of processing Log Files) sooner. 
Right now there are 4 "before action" print outs after the last log file (7) is
processed 


November 19, 2006

- realized V0.10 is empty!!!
must have moved to 11 instead of copying it over :(

- cp files in V0.11 back to V0.10 and changed dirs...
[aria@avenue V0.10]$ mv dirs-working.txt dirs.txt

seems to be working, phew!
backed it up to temuco and Avenue!!!


NEED TO TEST (involves TCTool work to generate new log files)
- test dir traverse, is it robust?

!!  - clue word info stored properly? ("se" -> cayeron) [need to generate log file]

also, have the gaudi sentence with two errors, so that I can test sentences
with more than just one error :-)


Need to:

-> change affected rules heuristic to first trying out the rule which contains
more context
ex: "a"  -> VP "a" NP instead of "a" NP -> subject!!!


TO DO--------------------

-> add print statement saying that an alignment has been added, so that 
I know it's not a bug!

-> need to make sure that at the end of the lexical refinements for
add and edit, I also output a "done comment"

-> need to detect when user did NOT refine anything, and not store that log 
file in the correction, right?
Look at the instructions I had given to Bill about CICollection

MAIN::CI's CTL sentence is instantiated with [ellos ven agua ]
XferEngine::TLInLattice: Checking if TL sentence is in the Lattice...
TL sentence is [ELLOS VEN AGUA]

****************************************************************************
***This translation: ELLOS VEN AGUA is being generated by the current system.
**************************************************************************
MAIN::pXfer->TLInLattice: yes the CTL sentence is in the lattice

MAIN::However, let's see if the RR module can make the grammar tighter, by not generating
 the incorrect translation (TL) moving on to refining it...
****************************************************************************


**************************************************
After all actions in CI:
SLWords: they see water 
TempCTLWords (1st time = TLWords): ellos ven agua 
Alignments: ((1,1),(2,2),(3,3))
**************************************************


-> try with different TCTool directory
 2006-3-31-17-35-31-23763

see if it also crashes after the 4th refinement


-> I will need to test for all possible order combinations to find all the bugs...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Once it's working, look at ways of modularizing it more. 
Should not have to load the grammar and lexicon and check the Xfer lattice
for each switch case, but just once after all of them.


l. 1236
	////////////////////////////////////////////
	// Printing the lexicon to a file (even though it might not have been changed)
	// Add a flag so that if the lexicon has not been refined, the old file is used instead
	// this would work as a natural bookkeeping, but could also get confusing, knowing which 
	// log files caused refinements and which didn't... need to implement a better bookkeeping
	// strategy
	/////////////////////////////////


-> proceed working from here...

- implement precision in lattice score

testing cwo case (sentences 9, 8, although too complicated for now):
./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-3-31-17-35-31-23763/9

but CI instantiation is seg faulting... :(

- finish debugging and implementing: RuleInstantiation method for
  1. add a constit (literal, and then POS) (sentence 5)
  (when adding a word to a GraRule, the POS of the following word should be skipped (fLookatLeafPOS=false), since the method needs to retrieve the parent node
of the next word (Leaf).)

- implement delete case
check if I can replace the SLside (add "to" to  "would like" for example, for the delete case)

- percolate 

l. 1302
- still need to test: replace featname in a lexical entry
./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/0 >! 0RR.out
- for now just test it with a made up example
- bool LexicalEntry::ReplaceFeatName(sFeatName1, sFeatName2);
l. 1250

- test CI Collection

- think about ReverseRefinements

---------------------------------------------------------
need to debug and finish perl script to massage MM lexicon
- Word2Lexicon l.298 
weird character problem (emailed Erik in May, remind him about this)

--------------------------------------------------------------

- move pertinent methods to Refiner/Utils and RRRule
delta function
constraint addition

- get lines of code (estimate) to have an idea (Bill's classes + my code)

-> add Bill's classes to Makefile, once everything is working, but back up
working Makefile first!

- ask bill about the tr pairs annotation, and rule origin (CI/user) annotation


I need to look into it, and then figure out what exactly needs to be implemented,
and will email Bill
- Reverse Refinement(s)
I haven't had time to look into this, but it would be great if we could
look at the time stamp management before you leave, so that if a rule does
not result in an improvement on a test set (T2), there is a good way to
revert to the previous version of the grammar (T1). As I told you before
(and maybe it's already implemented), it would be useful to have a
variable that expresses whether a rule led to improvement or not (bool
Rule.ImprovedAccuracy()).
Maybe this is too complex to do before you leave, but maybe you have
already implemented most of what would be needed, and it would be fairly
simple. In any case, I'd like to know.
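
A minimal sketch of what the reverse-refinement bookkeeping could look like, assuming each grammar version is stored with its time stamp and its score on the test set. `GrammarVersion` and `GrammarHistory` are hypothetical; only the idea of an ImprovedAccuracy check comes from the message above, and here it compares whole grammar versions rather than single rules.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One snapshot of the grammar (hypothetical layout).
struct GrammarVersion {
    int timestamp;                     // T1, T2, ...
    std::vector<std::string> ruleIds;  // rules active in this version
    double testScore;                  // score on the test set
};

class GrammarHistory {
public:
    void commit(const GrammarVersion& v) { versions_.push_back(v); }

    // True if the latest version scores higher than the one before it.
    // With fewer than two versions there is nothing to compare against.
    bool improvedAccuracy() const {
        std::size_t n = versions_.size();
        if (n < 2) return true;
        return versions_[n - 1].testScore > versions_[n - 2].testScore;
    }

    // Drop the latest version if it did not improve; returns true on revert.
    bool revertIfNotImproved() {
        if (improvedAccuracy()) return false;
        versions_.pop_back();
        return true;
    }

    // Precondition: at least one version has been committed.
    const GrammarVersion& current() const { return versions_.back(); }

private:
    std::vector<GrammarVersion> versions_;
};
```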


pending:
8 ./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-2-13-17-13-55-8336/8 >! 8RR.out

Complex: 2 actions: edit + cwo
  te is created as a copy of tu -> feat_0 is postulated
  (should really be case, but RR cannot know that)

->   but feat_0 doesn't get added to the lexical entries


- Figure out why the program seg faults after it finishes...

./RuleRefinement.exe -a /usr0/aria/RuleRefinement/IOFiles/2006-5-15-13-08-25-11387/1 >! 11387-1RR.out

New agreement constraint created is (y1 agr pers) = (y2 agr pers)
Added to the rule...
{S,91}
S::S : [NP VP] -> [NP VP]
(
;(P:{S,1})
  (X1::Y1)  (X2::Y2)
  (x0 = x2)
  ((y1 case) = nom)
  ((y1 agr) = (x1 agr))
  ((y2 tense) = (x2 tense))
  ((y1 agr pers) = (y2 agr pers))
)

****************************************************************************
***The refined grammar and lexicon produced the user corrected translation***
		The correct translation is: ELLA LEYÓ
****************************************************************************

****************************************************************************
***However it is still producing the incorrect translation, previously corrected***
		by the user: ELLA LEÍ
****************************************************************************

it seg faults!!! -> debug

-> need to figure out why the constraint does not prevent "ella lei" from generating


Bill to do:
*****************************************************************************
Bill will be working on Spurious Loop and Error complexity and finish at least
one of the two tasks a couple of weeks from now.

- finish detect spurious loop
- look at Error Complexity implementation -> paper
- fix any remaining bugs in Add Constituent to RHS, CICollection, etc.
- add "" to literal constits
- update indices in constraints for AddConstitToRHS
- getPOS should actually output a POS and not a position...
- percolate method (new)
- enhance delta function: if there is no other difference (no different value for the same attribute name), but there is a differing attribute, output that.

Ultimately: error complexity score implementation (polynomial sort, reverse lexicographic (descending) order, see paper); for now, since I'll just deal with a couple of examples, have independent errors rank higher, and dependent errors
lower.
*****************************************************************************
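
A minimal sketch of the delta-function enhancement from the list above, assuming feature structures can be flattened to name/value pairs: first report an attribute with different values on the two sides, and only if there is none, report an attribute present on just one side. `Features` and `delta` are hypothetical names, not the existing delta function.

```cpp
#include <map>
#include <string>

using Features = std::map<std::string, std::string>;

// Returns the name of an attribute whose value differs between a and b;
// if there is none, an attribute that only one of them has; else "".
std::string delta(const Features& a, const Features& b) {
    for (Features::const_iterator it = a.begin(); it != a.end(); ++it) {
        Features::const_iterator other = b.find(it->first);
        if (other != b.end() && other->second != it->second) return it->first;
    }
    for (Features::const_iterator it = a.begin(); it != a.end(); ++it)
        if (b.find(it->first) == b.end()) return it->first;
    for (Features::const_iterator it = b.begin(); it != b.end(); ++it)
        if (a.find(it->first) == a.end()) return it->first;
    return "";
}
```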

4Bill:

keep track of refinement status: proposed, confirmed1 (by exact match),
confirmed2 (by increasing automatic MT metrics over a regression test)
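
A minimal sketch of the status tracking, assuming the three states are ordered and a refinement never moves back to a weaker one. `RefinementStatus` and `updateStatus` are hypothetical names.

```cpp
// Ordered from weakest to strongest evidence.
enum class RefinementStatus { Proposed, Confirmed1, Confirmed2 };

// exactMatch: the refined system produced the corrected translation.
// metricsImproved: automatic MT metrics went up on the regression test.
RefinementStatus updateStatus(RefinementStatus current, bool exactMatch,
                              bool metricsImproved) {
    RefinementStatus observed = RefinementStatus::Proposed;
    if (exactMatch) observed = RefinementStatus::Confirmed1;
    if (metricsImproved) observed = RefinementStatus::Confirmed2;
    // Keep whichever of the old and newly observed status is stronger.
    return observed > current ? observed : current;
}
```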

When trying to run example sentences 8 and 9 I get a seg fault:
MAIN::Instantiating correction instance from TCTool Log File
Segmentation fault

Test: CICollection, Ranking and error complexity

Ari CI testing pending:

- CON of current TCTool implementation:
it doesn't reflect what word was dragged when there is a switch between
contiguous words, it just says a word has been moved, and shows final order, 
so there is no way to deduce what was the word the user actually moved.
Does it matter? It wouldn't matter, unless the user also edited one of these
two words. Currently, my frame assumes that some words need to be edited and
moved as being part of the same error (Wi is both the word that was edited and
the word that was moved)... since there is a causal relationship between 
those two cases, often it needs to be moved because it has a different form.

- test log file 9 with no header and have my code pass the load method 
the parse trace and test to make sure it doesn't break.
-> carefully test to see if alignments are correctly parsed and extracted!!!

- when there is a clear alignment action followed by a delete word -> 
just take into consideration the delete word.

Make a note somewhere where I'll remember to look at...
- ignore alignments added from English subjects to Spanish verbs, no action needed

-> find complex examples of alignments that are produced by the system, so 
that I can test Bill's code more thoroughly

-> test CI on more complicated instances from user studies 


PENDING:

-> need to test lexical case 2 with examples 4 and 7
4: John and Mary fell -> * Juan y María cayeron -> Juan y María se cayeron
7: I would like to go -> * me gustaría que ir -> me gustaría ir

Jaime: in parallel to coding, start thinking of other examples that are
structurally identical to the ones I have implemented, but so that I can
say my methods are general. For each case, have about 10 examples that are
supposed to exercise it. In the case of a rule refinement, have other examples
that only have the problem the refinement is addressing and test it


!!!!!!!!!!!!!!!!
-> always back up all the code and data that cannot be regenerated to the
Avenue afs directory