Language Technologies Institute
Student Research Symposium 2006

Delayed LM Intersection and Left-to-Right N-Best Extraction for Syntax-Based MT

Ashish Venugopal with Andreas Zollmann

We begin by describing a set of pruning constraints that are applied in the literature to effectively restrict the search space of synchronous PCFGs intersected with target language model contexts. We apply these constraints to non-binarized grammars with a large number of non-terminals and demonstrate effective parsing within the framework of Wu, 97.

We then present a novel parsing approach that avoids language model context intersection during parsing in favor of language-model-driven N-Best list extraction. The parsing step produces a sentence-spanning parse forest, which the N-Best extraction method then explores in left-to-right target order.

This method avoids lossy pruning during the parsing process, searching a much larger effective parse space than is practically possible in the full intersection scenario. It also has the important benefit of allowing integration of a high-order language model within the N-Best search process, rather than only in parse re-scoring.
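As a rough illustration of left-to-right extraction from a parse forest, the sketch below runs a best-first search that expands the leftmost pending symbol and charges a language model cost each time a target word is emitted. It is not the algorithm presented in the talk: the forest encoding, the name nbest_left_to_right, the toy grammar, and the bigram cost table are all hypothetical, and a bigram model stands in for a high-order one. Costs are negative log probabilities, so lower is better.

```python
import heapq
import itertools


def nbest_left_to_right(forest, root, lm_cost, n):
    """Return up to n (cost, words) pairs in order of increasing total cost.

    forest maps a non-terminal to a list of (rule_cost, rhs) alternatives,
    where rhs is a tuple of target-side symbols. Any symbol that is not a
    key of forest is treated as a target word. All costs must be >= 0.
    """
    tiebreak = itertools.count()  # keeps heap entries comparable on ties
    # Each hypothesis: (cost so far, tiebreak, emitted words, pending symbols).
    agenda = [(0.0, next(tiebreak), ("<s>",), (root,))]
    results = []
    while agenda and len(results) < n:
        cost, _, words, pending = heapq.heappop(agenda)
        if not pending:
            # Complete hypothesis; drop the sentence-start marker.
            results.append((cost, list(words[1:])))
            continue
        head, rest = pending[0], pending[1:]
        if head in forest:
            # Non-terminal: try every alternative stored in the forest.
            for rule_cost, rhs in forest[head]:
                heapq.heappush(
                    agenda,
                    (cost + rule_cost, next(tiebreak), words, tuple(rhs) + rest),
                )
        else:
            # Target word: the left context is fully known, so score it now.
            step = lm_cost(words[-1], head)
            heapq.heappush(
                agenda, (cost + step, next(tiebreak), words + (head,), rest)
            )
    return results


# Hypothetical toy forest with two reorderings and two verb choices.
TOY_FOREST = {
    "S": [(0.5, ("NP", "VP")), (1.0, ("VP", "NP"))],
    "NP": [(0.2, ("the", "cat"))],
    "VP": [(0.3, ("sleeps",)), (0.9, ("naps",))],
}

# Hypothetical bigram costs; unseen bigrams get a flat penalty of 2.0.
TOY_BIGRAMS = {
    ("<s>", "the"): 0.1,
    ("the", "cat"): 0.1,
    ("cat", "sleeps"): 0.2,
    ("cat", "naps"): 0.4,
}


def toy_lm_cost(prev_word, word):
    return TOY_BIGRAMS.get((prev_word, word), 2.0)


if __name__ == "__main__":
    for cost, words in nbest_left_to_right(TOY_FOREST, "S", toy_lm_cost, 3):
        print(round(cost, 3), " ".join(words))
```

Because target words are produced strictly left to right, every word is scored with its complete left context, which is what lets the language model take part in the search itself rather than only in re-scoring. The sketch does no hypothesis recombination or pruning, so it is only practical for very small forests.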

We demonstrate the impact of this parsing approach using the SPCFG approach described in Zollmann, Venugopal, Vogel 06, which is similar to Galley et al., 04, and compare its performance against full intersection.