MER versions

This page tracks versions of the MER training script (improved thanks in part to helpful suggestions from users), with some details on the relevant differences.

optimizeV5IBMBLEU.m, optimizeV5NIST.m

  • Version 4 fixed a bug that limited the number of distinct error regions considered to 10
  • Version 5 uses more sensible values for NumberOfRandomTests and ConvergedLimit to avoid overfitting

optimizeV3IBMBLEU.m
  • Based on V2
  • Goes after each lambda based on how much gain is available on that parameter
  • Lambdas with most score gain are changed first
  • Once it converges (it's more greedy now), jump up to JUMP_PERC of the param range and try again; do this up to ConvergedLimit times
  • I have found this script to give more stable param values over multiple iterations, and equal or higher final values
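
The V3 search order can be illustrated with a minimal Python sketch. This is not the MATLAB script itself: the `score` callback, the grid line search (a stand-in for the exact search over error regions), the shared `[lo, hi]` range for every lambda, and all function names are assumptions made for illustration. The jump step is one plausible reading of the description above: perturb each lambda by at most JUMP_PERC of the range, rerun the greedy pass, and keep the better result.

```python
import random

def best_value_for(score, lambdas, i, lo, hi, steps=20):
    """Grid line search on lambda i (assumed stand-in for the exact
    error-region search). Returns (best_value, best_score)."""
    best_v, best_s = lambdas[i], score(lambdas)
    for k in range(steps + 1):
        v = lo + (hi - lo) * k / steps
        trial = lambdas[:i] + [v] + lambdas[i + 1:]
        s = score(trial)
        if s > best_s:
            best_v, best_s = v, s
    return best_v, best_s

def greedy_pass(score, lambdas, lo, hi, eps=1e-9):
    """Repeatedly change the single lambda that offers the largest gain,
    until no lambda improves the score by more than eps."""
    lambdas = list(lambdas)
    current = score(lambdas)
    while True:
        candidates = [best_value_for(score, lambdas, i, lo, hi)
                      for i in range(len(lambdas))]
        i, (v, s) = max(enumerate(candidates), key=lambda c: c[1][1])
        if s - current <= eps:
            return lambdas, current
        lambdas[i], current = v, s

def optimize_with_jumps(score, init, lo, hi,
                        jump_perc=0.1, converged_limit=3, seed=0):
    """Greedy pass, then up to converged_limit jump-and-retry rounds."""
    rng = random.Random(seed)
    best, best_score = greedy_pass(score, init, lo, hi)
    span = jump_perc * (hi - lo)  # largest allowed jump per lambda
    for _ in range(converged_limit):
        # Assumption: jump from the best point found so far, clipped to range.
        jumped = [min(hi, max(lo, x + rng.uniform(-span, span))) for x in best]
        cand, cand_score = greedy_pass(score, jumped, lo, hi)
        if cand_score > best_score:
            best, best_score = cand, cand_score
    return best, best_score
```

For example, `optimize_with_jumps(my_score, [0.0, 0.0], -1.0, 1.0)` would search two lambdas in [-1, 1], where `my_score` is whatever function maps a lambda vector to the metric being maximized.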

optimizeV3NIST.m
  • NIST version of above
  • Takes in info gains instead of correct counts. Make sure the length of the reference is set correctly

optimizeV2IBMBLEU.m
  • Allows you to specify the best working parameters up to this point in the INIT param
  • NumRandomTests is the number of times random seeds are initialized for the search (I set this to 3 using 1K best lists)
  • PermutationsEpsilon indicates how close two scores can be before they are considered the same
  • ConvergedLimit: if the score difference is less than PermutationsEpsilon for ConvergedLimit steps through the lambdas, end the iteration for this random seed run
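
How the V2 parameters interact can be sketched roughly as follows, in Python rather than MATLAB. The `score` callback, the grid line search, the shared `[lo, hi]` range, and the function names are hypothetical. The sketch also assumes that INIT is tried as one starting point alongside the random ones, and that the ConvergedLimit steps must be consecutive; the description above does not state either.

```python
import random

def improve_one(score, lambdas, i, lo, hi, steps=20):
    """Grid line search on lambda i (assumed stand-in for the real search).
    Never returns a score lower than the current one."""
    best, best_s = list(lambdas), score(lambdas)
    for k in range(steps + 1):
        trial = list(lambdas)
        trial[i] = lo + (hi - lo) * k / steps
        s = score(trial)
        if s > best_s:
            best, best_s = trial, s
    return best, best_s

def run_from(score, start, lo, hi, epsilon, converged_limit, max_steps=1000):
    """Cycle through the lambdas from one starting point. Stop once the
    score has changed by less than epsilon for converged_limit steps in a row."""
    lambdas, current = list(start), score(start)
    flat_steps, i = 0, 0
    for _ in range(max_steps):
        lambdas, new = improve_one(score, lambdas, i, lo, hi)
        # Scores closer than epsilon are treated as the same score.
        flat_steps = flat_steps + 1 if new - current < epsilon else 0
        current = new
        if flat_steps >= converged_limit:
            break
        i = (i + 1) % len(lambdas)
    return lambdas, current

def optimize(score, init, lo, hi, num_random_tests=3,
             epsilon=1e-4, converged_limit=3, seed=0):
    """Run the search from INIT and from num_random_tests random starts."""
    rng = random.Random(seed)
    starts = [list(init)]  # assumption: INIT is one of the starting points
    starts += [[rng.uniform(lo, hi) for _ in init]
               for _ in range(num_random_tests)]
    results = [run_from(score, s, lo, hi, epsilon, converged_limit)
               for s in starts]
    return max(results, key=lambda r: r[1])
```

A larger `num_random_tests` explores more of the parameter space at the cost of more runs, while a small `converged_limit` ends each run sooner.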

optimizeV2NIST.m
  • NIST version of the V2 script. Instead of correct/suggest counts, pass in the info gains for the correct field