* February 25, 2003

Speed up ideas
- Check to see if equation block is empty before preparing to run and running it
- Do all of syntactic transfer/generation, then try all underlying lexical combinations, including inserted and chosen by FS

Include
- attempt filters (succeed once and remove)
- fail always
- rule watches/step through
- adding in word based on unification constraints
- translate just a single word with no syntactic part
  - now needs to consider possible multiple translations
- (add) ability to parse with string (e.g. "word") in source rhs
- move nice command line into TransferEngine and combine initfile and commandline argument processors
- have clear (clear just temporary variables for a run) and reset (clear rules, etc.)

Group all rules (syntactic and lexical) into parse groups. Each parsegroup contains
- source production rule lhs and rhs
- parse equation block (? or just use one of the rules' parse eb to avoid duplication)
- rule type (PHRASE, LEX, COMPOUND)
- vector of rules in block

- What to do with unknown words?
  - have a special parsegroup for those or create it on the fly?
  - allow user to specify a set of "open class" POS (e.g. N, ADJ, V); an unknown could be assigned these for parsing
  - pre-process text for unknown words, numbers, etc. before parsing (?)
- Have separate method to process words and add in AGENDA
  - known words
  - unknown words that can be processed (dates, numbers)
  - unknown words findable after morphological processing
  - unknown words, add as open class POS

Parsing proceeds much as before, except using parsegroups and not individual rules. (?)

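A minimal sketch of the parsegroup idea above, in Python for illustration only (the engine itself is not written this way). The names `Rule`, `ParseGroup`, `RuleType` and `group_rules` are hypothetical, and comparing parse equation blocks by exact equality is an assumption made to keep the sketch short.

```python
from dataclasses import dataclass, field
from enum import Enum


class RuleType(Enum):
    PHRASE = "PHRASE"
    LEX = "LEX"
    COMPOUND = "COMPOUND"


@dataclass
class Rule:
    # Hypothetical stand-in for a transfer rule; only the parse-side
    # fields needed for grouping are modelled here.
    rule_id: str
    src_lhs: str
    src_rhs: tuple
    rule_type: RuleType
    parse_equations: tuple = ()


@dataclass
class ParseGroup:
    # Fields follow the list in the notes: source lhs/rhs, one shared
    # parse equation block, the rule type, and the rules in the group.
    src_lhs: str
    src_rhs: tuple
    rule_type: RuleType
    parse_equations: tuple
    rules: list = field(default_factory=list)


def group_rules(rules):
    """Bucket rules that share the same source production, type and
    parse equations (exact equality is an assumption of this sketch)."""
    groups = {}
    for rule in rules:
        key = (rule.src_lhs, rule.src_rhs, rule.rule_type, rule.parse_equations)
        if key not in groups:
            groups[key] = ParseGroup(*key)
        groups[key].rules.append(rule)
    return list(groups.values())
```

With grouping like this the parser would try each distinct source production once and only afterwards fan out to the individual rules in the group.
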
Back-off
- Keep track of branch points
  - keep back up FS copies that can be restored for next try
- Still use depth first search and a stack to guide search but instead of popping from stack when done with a constituent, keep it around, keep track of current active transfer constituent and only pop when back-tracking

Lexical Transfer/Generation
- run all possibilities first (before combinatory attempts), removing those that fail unification; don't use these in later combinatory attempts
- Keep the resulting FSs from the above possibility checks, now just need to run (modified) FillTargetFS with different lexical combinations, copying over needed FSs
- for words found by FS unification, find all matches first before running combinations, keep a list of matches and just use this during combinations
- for inserted target words (ones included as a string in production) there is only one choice and possible FS
- only create wordfss after successful fill/constraint Fill and Yn-Yn Constraint pass
- Cache lower tree fs results?
  - probably, but only on one transfer (excluding lex) tree, then clear
  - also need to have same direct ancestors to be considered equal cache-wise? probably
  - recursive method should also pass ancestors
- make tconstituents, tarcs class variables?

Adding new rule
- need to add to srclexicon and rulefinder after processing whole rule so we know what parse group it belongs to

* May 2, 2003
- Fixed bugs in load from init file
- Fixed bugs in parsing and reading in transfer grammar
  - able to read in Kathrin's grammar/lexicon and parse a sentence
- TODO: need more intelligent comparison of parse equation blocks for parsegroups
- TODO: back-tracking! Done!

Chinese -> English Test Set (10 sentences)
  old : 1.62 seconds
  new : 0.54 seconds

Tests for accuracy with
1. compounds
2. transfer
3. words inserted based on feature struct
4. words inserted as a string
5. features passed up target tree
6. Y-side agreement constraints
7. compositionality
8. Have source literals in parse rules
All passed! Actually better than old xfer since some logic bugs fixed.

English to German set (9 sentences)
  old : 321.92 seconds
  new : 9.59 seconds

Yet to test: Helper functions
1. Rule fails
2. Rule watches
3. Attempt filters
4. Partial translations
5. Deleting rules

Still to do: Parsing (left hooks for all these to add in later)
1. Put morphology back in (thought we'd be switching)
2. Deal with unknown words (add as one of open classes)

Wish-list
1. Add server functionality to let it work with graphical interface

June 26:
- Put in ordering for constituents before lex entries in partial translations
- Took out dupe checking

June 27:
- Took out constituent before lex order for partial translations

June 28:
- Added simple Unicode lowercase (only handles ASCII range)
- Priority example (V unifies with N first)

July 23, 2003:
How to get better translations out sooner?
How to improve best partial output?

Aug. 26-27, 2003:
Added trace output, so final target tree can be returned
Fixed a bug where lexrule was getting set wrong, added separate tlexrule to fix problem
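
The Lexical Transfer/Generation idea from the February 25 notes (run every lexical possibility once, drop the ones that fail unification, then combine only the survivors) can be sketched roughly as follows. This is an illustration in Python, not the engine's code: `unify`, `prefilter` and `lexical_combinations` are hypothetical names, and feature structures are reduced to flat dicts, which is a strong simplification.

```python
from itertools import product


def unify(fs_a, fs_b):
    """Toy flat unification: merge two feature dicts, or None on a clash."""
    result = dict(fs_a)
    for feature, value in fs_b.items():
        if feature in result and result[feature] != value:
            return None
        result[feature] = value
    return result


def prefilter(slots):
    """slots: list of (constraint_fs, [(word, word_fs), ...]).

    Each candidate is unified against its slot constraint exactly once.
    Failures are dropped; survivors keep their unified FS so it can be
    reused later instead of being recomputed per combination."""
    survivors = []
    for constraint, candidates in slots:
        kept = []
        for word, word_fs in candidates:
            fs = unify(constraint, word_fs)
            if fs is not None:
                kept.append((word, fs))
        survivors.append(kept)
    return survivors


def lexical_combinations(survivors):
    """Enumerate word combinations over the surviving candidates only."""
    return [tuple(word for word, _ in combo) for combo in product(*survivors)]
```

The saving comes from shrinking each slot's candidate list before the cross product is taken; a slot with no survivors yields no combinations at all.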