11-734: Advanced Machine Translation Seminar

Spring 2008


Course Description

The Advanced Machine Translation Seminar is a graduate-level seminar on current research topics in Machine Translation. This year the seminar will focus on state-of-the-art data-driven approaches to Machine Translation, including various "flavors" of Statistical MT, Example-based MT, Syntax-based approaches and more. Related problems common to many of these approaches will also be discussed, including the acquisition and construction of language resources for MT (translation lexicons, language models, etc.) and methods for automatic word, phrase and structure alignment of sentence-parallel data. The material will be drawn mostly from recent conference and journal publications on the topics of interest and will vary from year to year. The course will be run in a seminar format, in which students prepare presentations of selected research papers and lead in-class discussion of the presented papers.

Prerequisites & corequisites:


General Information

Class Meeting Time and Location:
Wednesday, 1:30PM - 2:50PM, Location: NSH 4513

 
Primary Instructor:
Alon Lavie, alavie@cs.cmu.edu, NSH 4615, 268-5655, Office Hours: By Appointment

 
Co-Instructor:
Stephan Vogel, vogel+@cs.cmu.edu, InterACT Lab, 268-4526, Office Hours: By Appointment


Schedule and Readings

Date | Topic | Presenter | Readings | Comments
Jan 16
Course Information
Alon Lavie and Stephan Vogel
Presentation Slides
Jan 23
Overview: Word Alignment Models and Phrase Extraction Methods
Stephan Vogel
Jan 30
Overview: Statistical MT Decoding Methods
Stephan Vogel
Feb 6
Morphology and its Integration within MT
Eric Davis
(1) Mathias Creutz and Krista Lagus (2006). Morfessor in the Morpho Challenge. In Proceedings of the PASCAL Challenge Workshop on Unsupervised Segmentation of Words into Morphemes, Venice, Italy, April 2006.
(2) Andreas Zollmann, Ashish Venugopal and Stephan Vogel (2006). Bridging the Inflection Morphology Gap for Arabic Statistical Machine Translation. In Proceedings of HLT-NAACL 2006, Short Paper, New York City, NY.
Presentation Slides
Feb 13
Segmentation Issues in MT
Linh Nguyen
(1) J. Xu, R. Zens and H. Ney (2004). Do We Need Chinese Word Segmentation for Statistical Machine Translation? In Proceedings of the Third SIGHAN Workshop on Chinese Language Learning, pp. 122-128, Barcelona, Spain, July 2004.
(2) J. Xu, E. Matusov, R. Zens and H. Ney (2005). Integrated Chinese Word Segmentation in Statistical Machine Translation. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), pp. 141-147, Pittsburgh, PA, October 2005.
Presentation Slides
Feb 20
NO CLASS
Feb 27
Incorporating Context in MT
Aaron Phillips
(1) Marine Carpuat and Dekai Wu (2007). Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation. In Proceedings of Machine Translation Summit XI, Copenhagen, September 2007.
(2) Jesús Giménez and Lluís Márquez (2007). Context-aware Discriminative Phrase Selection for Statistical Machine Translation. In Proceedings of the WMT 2007 Workshop at ACL-07, Prague, June 2007.
(3) S. Hildebrand, M. Eck, S. Vogel and A. Waibel (2005). Adaptation of the Translation Model for Statistical Machine Translation based on Information Retrieval. In Proceedings of the Meeting of the European Association for Machine Translation (EAMT).
Presentation Slides
Mar 5
Incorporating Linguistic Information in Machine Translation Evaluation
Jason Adams
(1) Karolina Owczarzak, Josef van Genabith, and Andy Way (2007). Labelled Dependencies in Machine Translation Evaluation. In Proceedings of the Second Workshop on Statistical Machine Translation (WMT-07) at ACL-07. Prague, Czech Republic, June 2007.
(2) Joshua S. Albrecht and Rebecca Hwa (2007). A Re-examination of Machine Learning Approaches for Sentence-Level MT Evaluation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. Prague, Czech Republic, June 2007.
Presentation Slides
Mar 12
NO CLASS (Spring Break)
Mar 19
Reordering in Phrase-based SMT
Alok Parlikar
(1) T. P. Nguyen and A. Shimazu (2006). A Syntactic Transformation Model for Statistical Machine Translation. In Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. LNCS Volume 4285, Springer, pp. 63-74.
(2) C. H. Li, M. Li, D. Zhang, M. Li, M. Zhou and Y. Guan (2007). A Probabilistic Approach to Syntax-based Reordering for SMT. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. Prague, Czech Republic, June 2007.
Presentation Slides
Mar 26
Syntax-based Models for SMT
Amr Ahmed
(1) K. Yamada and K. Knight (2001). A Syntax-Based Statistical Translation Model. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. June 2001.
(2) D. Chiang (2005). A Hierarchical Phrase-based Model for Statistical Machine Translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics. Ann Arbor, MI, June 2005.
Presentation Slides
Apr 2
Dependency Structures in MT
Vamshi Ambati
(1) Yuan Ding and Martha Palmer (2005). Machine Translation Using Probabilistic Synchronous Dependency Insertion Grammars. In Proceedings of the 43rd Annual Meeting of the ACL. Ann Arbor, MI, June 2005. Pages 541-548.
(2) Chris Quirk, Arul Menezes and Colin Cherry (2005). Dependency Treelet Translation: Syntactically Informed Phrasal SMT. In Proceedings of the 43rd Annual Meeting of the ACL. Ann Arbor, MI, June 2005.
Presentation Slides
Apr 9
NO CLASS (GALE PI Meeting)
Apr 16
Factored and Syntax-based LMs
Rashmi Gangadharaiah
(1) Eugene Charniak, Kevin Knight and Kenji Yamada (2003). Syntax-based Language Models for Statistical Machine Translation. In Proceedings of MT Summit IX. New Orleans, 2003.
(2) Katrin Kirchhoff and Mei Yang (2005). Improved Language Modeling for Statistical Machine Translation. In Proceedings of the ACL Workshop on Building and Using Parallel Texts, at the 43rd Annual Meeting of the ACL. Ann Arbor, MI, June 2005. Pages 125-128.
Presentation Slides
Apr 23
Large-scale MT Architectures
Qin Gao
(1) Jeffrey Dean and Sanjay Ghemawat (2008). MapReduce: Simplified Data Processing on Large Clusters. In Communications of the ACM, vol. 51, no. 1 (2008), pp. 107-113.
(2) Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och and Jeffrey Dean (2007). Large Language Models in Machine Translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 858-867.
Presentation Slides
Apr 30
Syntax-constrained Word Alignment
Greg Hanneman
(1) Colin Cherry and Dekang Lin (2006). Soft Syntactic Constraints for Word Alignment through Discriminative Training. In Proceedings of the COLING/ACL 2006 Poster Session, pages 105-112.
(2) John DeNero and Dan Klein (2007). Tailoring Word Alignments to Syntactic Machine Translation. In Proceedings of ACL 2007, pages 17-24.
Presentation Slides
May 7
Discriminative Training Methods
Abhaya Agarwal
(1) Percy Liang, Alexandre Bouchard-Côté, Dan Klein and Ben Taskar (2006). An End-to-End Discriminative Approach to Machine Translation. In Proceedings of COLING/ACL 2006.
(2) Benjamin Wellington, Joseph Turian, Chris Pike, and I. Dan Melamed (2006). Scalable Purely-Discriminative Training for Word and Tree Transducers. In Proceedings of AMTA 2006.