Syntax Augmented Machine Translation via Chart Parsing

Latest version: HERE
This is a Hadoop-based, MapReduce-parallelized version (see our IWSLT'08 paper). See the file setup-commands.txt for installation instructions. The readme file still refers to the old non-Hadoop SAMT version; for usage instructions, read the "Grammar based statistical MT on Hadoop" paper below instead.


Our open-source SAMT system consists of three parts:
  1. Extraction of statistical translation rules from a training corpus: either plain hierarchical rules à la Chiang (2005) or syntax-augmented rules à la Zollmann and Venugopal (2006).
  2. A CKY+-style chart parser (Chappelier and Rajman, 1998) that applies the statistical translation rules to translate test sentences.
  3. A minimum-error-rate optimization and scoring tool (integrated into the chart parser) that tunes the parameters of the underlying log-linear model on a held-out development corpus.
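The third component tunes the weights of a log-linear model, in which each candidate translation is scored by a weighted sum of its feature values and the highest-scoring candidate is output. The Python sketch below illustrates only that decision rule, under stated assumptions: the feature names (log_p_tm, log_p_lm, word_penalty), their values, and the helper functions are hypothetical and are not SAMT's actual features or code. It does not implement minimum-error-rate training itself, which searches for the weight vector whose selected translations score best on the development corpus.

```python
from typing import Dict, List, Tuple

# Feature vector h(e, f) of one candidate: feature name -> value.
# Feature names used below are hypothetical, for illustration only.
Features = Dict[str, float]


def loglinear_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of feature values: sum_i lambda_i * h_i(e, f)."""
    # Features that have no weight contribute nothing to the score.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_candidate(candidates: List[Tuple[str, Features]],
                   weights: Dict[str, float]) -> str:
    """Decision rule: return the candidate with the highest model score."""
    return max(candidates, key=lambda cand: loglinear_score(cand[1], weights))[0]


# Hypothetical candidates with made-up log-probability features.
candidates = [
    ("the house is small", {"log_p_tm": -2.0, "log_p_lm": -4.0, "word_penalty": -4.0}),
    ("small the house is", {"log_p_tm": -1.5, "log_p_lm": -9.0, "word_penalty": -4.0}),
]

# Different weight vectors can select different outputs; tuning chooses the
# weights whose selections score best on a held-out development corpus.
print(best_candidate(candidates, {"log_p_tm": 1.0, "log_p_lm": 1.0, "word_penalty": 0.5}))
# prints: the house is small
print(best_candidate(candidates, {"log_p_tm": 1.0, "log_p_lm": 0.0, "word_penalty": 0.5}))
# prints: small the house is
```

With the language-model weight set to zero, the disfluent candidate wins because of its slightly better translation-model score; this sensitivity to the weights is exactly what the tuning step exploits.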
The system is available open-source under the GNU General Public License. Click here to download it. (A library version under the LGPL, needed if you use the system for commercial purposes and provided without support, is available here.)

Documentation for the SAMT system is available from the following sources.

We will be updating the SAMT system regularly. We have created the following Google Groups to manage announcements and host technical discussions about the system.

Of course, you can also email us directly: {zollmann or ashishv} (at)

InterACT homepage

Andreas's homepage

Ashish's homepage