Stephan Vogel

I am a Research Scientist in the Language Technologies Institute, School of Computer Science, Carnegie Mellon University.

 

Office:

407 South Craig Avenue

Pittsburgh, PA 15213

Tel: 412-268-4526

Email: vogel at cs dot cmu another dot and finally edu ;-)

 

Before coming to CMU I worked from 1995 to 2000 as a research assistant at Lehrstuhl IV (Prof. Hermann Ney), Technical University (RWTH) of Aachen.  I also spent a couple of months at ISL (Prof. Alex Waibel) at the University of Karlsruhe.

In a previous academic life, I studied physics at the Philipps University Marburg, Germany, and received an MPhil in History and Philosophy of Science from the University of Cambridge, England.


Research Interests:

My research interests focus on machine translation, i.e. developing techniques to automatically translate between different languages.  This includes all aspects of alignment models, word reordering models, language models, decoding algorithms, and system combination.  Applications include both text-to-text and speech-to-speech translation, and range from MT on hand-held devices to very large-scale data settings, where we use parallel processing under the Hadoop framework.


Research Projects:

Current:

  • INCA  (NSF): An Integrated Cluster Computing Architecture for Machine Translation
  • TransTac (DARPA):  Iraqi-English speech translation.
  • GALE (DARPA):  Large-scale text and speech translation, Arabic-English and Chinese-English.
  • MT4Refugees:  MT tool to support communication between NGOs and Refugees.
  • P2P (DARPA):  Peer production of data for NLP applications.

Old:

  • STR-Dust (NSF): Speech Translation for Domain Unlimited Spontaneous Communication Tasks.
  • STEEM: Summarization and Named Entities in Speech Translation.
  • TIDES (DARPA): RADD-MT Robust, Adaptable, Data-Driven Machine Translation.
  • Digital Olympics: Speech translation for the Olympic Games in Beijing 2008.
  • C-STAR: Consortium for Speech Translation Advanced Research.
  • Babylon (DARPA): Two-way, natural language speech translation interface.
  • NESPOLE! (EU and NSF): Speech-based E-commerce/service over Telephone in 4 languages.
  • VERBMOBIL (BMBF): Speech-to-Speech Translation in face-to-face situations.

Publications:

Recent Publications:

 

Qin Gao, Stephan Vogel. Training phrase-based machine translation models on the cloud: Open source machine translation toolkit Chaski.  The Prague Bulletin of Mathematical Linguistics No. 93, 2010, pp.37–46.

Nguyen Bach, Roger Hsiao, Matthias Eck, Paisarn Charoenpornsawat, Stephan Vogel, Tanja Schultz, Ian Lane, Alex Waibel, and Alan W. Black.  Incremental Adaptation of Speech-to-Speech Translation.  NAACL HLT 2009. Human Language Technologies: the 2009 Annual Conference of the North American Chapter of the ACL, Short Papers, Boulder, Colorado, May 31 - June 5, 2009.

Nguyen Bach, Qin Gao, and Stephan Vogel.  Source-Side Dependency Tree Reordering Models and Subtree Movements and Constraints.  MT Summit XII: proceedings of the twelfth Machine Translation Summit, Ottawa, Ontario, Canada, August 26-30, 2009.

Nguyen Bach, Stephan Vogel and Colin Cherry.  Cohesive Constraints in A Beam Search Phrase-based Decoder.  NAACL HLT 2009. Human Language Technologies: the 2009 annual conference of the North American Chapter of the ACL, Boulder, Colorado, May 31 - June 5, 2009.

Francisco Guzman, Qin Gao and Stephan Vogel.  Reassessment of the Role of Phrase Extraction in PBSMT.  MT Summit XII: proceedings of the twelfth Machine Translation Summit, Ottawa, Ontario, Canada, August 26-30, 2009.

Almut Silja Hildebrand and Stephan Vogel.  CMU system combination for WMT’09.  Proceedings of the Fourth Workshop on Statistical Machine Translation,  Athens, Greece, 30 March – 31 March 2009.

Narges Sharif Razavian and Stephan Vogel.  The Web as a Platform to Build Machine Translation Resources, International Workshop on Intercultural Collaboration IWIC2008, Stanford, USA February 2009.

Ashish Venugopal, Andreas Zollmann, Noah A. Smith, and Stephan Vogel.  Preference Grammars: Softening Syntactic Constraints to Improve Statistical Machine Translation.  NAACL HLT 2009. Human Language Technologies: the 2009 annual conference of the North American Chapter of the ACL, Boulder, Colorado, May 31 - June 5, 2009.

Nguyen Bach, Qin Gao, & Stephan Vogel: Improving word alignment with language model based confidence scores. ACL-08: HLT. Third Workshop on Statistical Machine Translation, Proceedings, June 19, 2008, The Ohio State University, Columbus, Ohio, USA (ACL WMT-08); pp.151-154.

 

Matthias Eck, Stephan Vogel, & Alex Waibel: Communicating unknown words in machine translation.  LREC 2008: 6th Language Resources and Evaluation Conference, Marrakech, Morocco, 26-30 May 2008; 6pp.

 

Sanjika Hewavitharana and Stephan Vogel:  Enhancing a Statistical Machine Translation System by using an Automatically Extracted Parallel Corpus from Comparable Sources, LREC 2008 Workshop on Comparable Corpora, Marrakech, Morocco, May 2008.

 

Almut Silja Hildebrand & Stephan Vogel: Combination of machine translation systems via hypothesis selection from combined n-best lists. AMTA-2008. MT at work: Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas, Waikiki, Hawai’i, 21-25 October 2008; pp.254-261.

 

Almut Silja Hildebrand, Kay Rottmann, Mohamed Noamany, Qin Gao, Sanjika Hewavitharana, Nguyen Bach, & Stephan Vogel: Recent improvements in the CMU large scale Chinese-English SMT system. ACL-08: HLT. 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Short papers, June 16-17, 2008, The Ohio State University, Columbus, Ohio, USA; pp. 77-80.

 

Jan Niehues & Stephan Vogel: Discriminative word alignment via alignment matrix modelling. ACL-08: HLT. Third Workshop on Statistical Machine Translation, Proceedings, June 19, 2008, The Ohio State University, Columbus, Ohio, USA (ACL WMT-08); pp.18-25.

 

Muntsin Kolss, Stephan Vogel, Alex Waibel.  Stream Decoding for Simultaneous Spoken Language Translation.  Proc. of Interspeech 2008, Brisbane, Australia, 22-26 Sept 2008.

 

Matthias Paulik, Sharath Rao, Ian Lane, Stephan Vogel and Tanja Schultz.  Sentence Segmentation and Punctuation Recovery for SLT.  ICASSP 2008, Las Vegas, April 2008.

 

Tim Schlippe, ThuyLinh Nguyen, & Stephan Vogel: Diacritization as a machine translation problem and as a sequence labeling problem. AMTA-2008. MT at work: Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas, Waikiki, Hawai’i, 21-25 October 2008; pp.270-278.

 

Ashish Venugopal, Andreas Zollmann, Noah A. Smith, & Stephan Vogel: Wider pipelines: n-best alignments and parses in MT training. AMTA-2008. MT at work: Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas, Waikiki, Hawai’i, 21-25 October 2008; pp.192-201.

 

Andreas Zollmann, Ashish Venugopal, & Stephan Vogel: The CMU syntax-augmented machine translation system: SAMT on Hadoop with n-best alignments.  IWSLT 2008: Proceedings of the International Workshop on Spoken Language Translation, 20-21 October 2008, Hawaii, USA; pp. 18-25.

 

 

See also Publications page.


Teaching:

  • 11-731 Machine Translation (with Alon Lavie)
  • 11-732 MT-Lab (with other LTI faculty)
  • 11-733 Multilingual Speech Translation (with Alan Black and Tanja Schultz)
  • 11-734 Advanced Translation Seminar (with Alon Lavie)

Students:

Current PhD and Master Students at LTI, CMU:

 

Former LTI PhD and Master Students I advised or co-advised:

 

  • Qin Gao (MLT, now PhD)
  • Narges Sharif Razavian (MLT, now PhD in computational biology)
  • Alok Parlikar (MLT, now PhD in LTI)
  • Nguyen Bach (MLT, now PhD in LTI)
  • Nimish Gautam (MLT, then Yahoo!)
  • Sanjika Hewavitharana (MLT, now PhD in LTI)
  • Ashish Venugopal (MLT, then PhD in LTI)

 

I regularly advise and co-advise students from the University of Karlsruhe.  Some of them have come to CMU as InterACT exchange students to work on a Studienarbeit (3-month research project) or Diplomarbeit (Master's thesis):

  • Tim Schlippe (Masters Thesis 2008; now PhD student at University of Karlsruhe)
  • Rim Helaoui (Research Project 2007)
  • Matthias Bracht (Research Project 2007)
  • Martin Raab (Masters Thesis 2006; now Harman Becker Automotive Systems and University of Erlangen)
  • Jan Niehues (Research Project 2006 and Masters Thesis 2007; now PhD student at University of Karlsruhe)
  • Kay Rottmann (Research Project 2006 and Masters Thesis 2007; now Mobile Inc)
  • Silja Hildebrand (Masters Thesis 2005; now PhD student University of Karlsruhe and research staff at CMU)
  • Dietmar Bernreuther (Masters Thesis 2005)
  • Manuel Kauers (Masters Thesis 2002, now RISC-Linz)

 


Links: