SAMT - A CKY+ synchronous parser for SMT

We present an open-source CKY+ synchronous parser for statistical machine translation (SMT), as described in "Syntax Augmented Machine Translation via Chart Parsing" (Zollmann and Venugopal, 2006). Our goal is to further research in SMT using annotated, generalized phrases, as described in Chiang (2005) and in our own work. You can find the software here and the paper here. Some key points about the parser:

  • Open source C++, licensed under the GPL
  • Fast - translates the 2000 (realtest) sentences of the Europarl Fr-English data in approx. 40 min, i.e. 46 sentences per minute, while achieving state-of-the-art scores
  • Implements CKY+ for internal binarization during parsing
  • Efficiently handles thousands of non-terminal categories
  • Performs LM intersection with the grammar at run-time, or optionally uses future cost estimates for LM cost, producing state-of-the-art scores
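
For readers new to chart parsing, the following is a minimal sketch of plain CKY recognition over a grammar whose rules are already binary. It is not taken from the SAMT code: the types `LexicalRule`, `BinaryRule`, `Grammar` and the function `cky_recognize` are hypothetical names chosen for illustration, and the sketch is monolingual, unweighted, and has no LM intersection. The parser described above uses CKY+, which performs the binarization internally during parsing, whereas this sketch assumes the grammar was binarized beforehand.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy grammar types for illustration (not SAMT's data structures).
struct LexicalRule { std::string lhs; std::string word; };         // A -> w
struct BinaryRule  { std::string lhs; std::string left, right; };  // A -> B C

struct Grammar {
    std::vector<LexicalRule> lexical;
    std::vector<BinaryRule> binary;
};

// Returns true if the non-terminal `goal` derives the whole sentence under `g`.
bool cky_recognize(const Grammar& g, const std::vector<std::string>& words,
                   const std::string& goal) {
    assert(!goal.empty());  // precondition: a goal symbol must be given
    const std::size_t n = words.size();
    if (n == 0) return false;  // this sketch has no rules deriving the empty string

    // chart[i][j] holds the non-terminals that span words[i] .. words[j-1].
    std::vector<std::vector<std::set<std::string>>> chart(
        n + 1, std::vector<std::set<std::string>>(n + 1));

    // Width-1 spans: apply lexical rules.
    for (std::size_t i = 0; i < n; ++i)
        for (const auto& r : g.lexical)
            if (r.word == words[i]) chart[i][i + 1].insert(r.lhs);

    // Wider spans, shortest first: combine two adjacent sub-spans at split point k.
    for (std::size_t len = 2; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len;
            for (std::size_t k = i + 1; k < j; ++k)
                for (const auto& r : g.binary)
                    if (chart[i][k].count(r.left) && chart[k][j].count(r.right))
                        chart[i][j].insert(r.lhs);
        }
    }
    return chart[0][n].count(goal) > 0;
}
```

With a toy grammar containing S -> NP VP and VP -> V NP plus lexical rules for "she", "reads" and "books", calling `cky_recognize(g, {"she", "reads", "books"}, "S")` accepts the sentence. The inner loop scans every binary rule for every split point, which is fine for a toy grammar but would not scale to thousands of non-terminal categories; a real parser would index rules by their right-hand side.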