Nguyen Bach (Bạch Hưng Nguyên)
Graduate Research Assistant
- Machine Translation
- Natural Language Processing
- Speech Recognition and Synthesis
- Machine Learning
- Information Retrieval
I am a PhD student working under the supervision of Prof. Alex Waibel and Prof. Stephan Vogel in the Statistical Machine Translation group at CMU. My current research exploits dependency structures from text and speech for statistical machine translation. I am working on the GALE and TransTac projects.
2009
- Source-side Dependency Tree Reordering Models with Subtree Movements and Constraints
Nguyen Bach, Qin Gao, Stephan Vogel
In Proceedings of the 12th Machine Translation Summit (MT Summit XII), August 2009, Ottawa, Ontario, Canada.
[Slides], [Bib], to appear.
- Cohesive Constraints in A Beam Search Phrase-based Decoder
Nguyen Bach, Stephan Vogel and Colin Cherry
In Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies Conference (NAACL-HLT 2009), May/June 2009, Boulder, CO, USA.
[Slides], [Bib].
- Incremental Adaptation of Speech-to-Speech Translation
Nguyen Bach, Roger Hsiao, Matthias Eck, Paisarn Charoenpornsawat, Stephan Vogel, Tanja Schultz, Ian Lane, Alex Waibel and Alan W. Black
In Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies Conference (NAACL-HLT 2009), May/June 2009, Boulder, CO, USA.
[Slides], [Bib].
2008
- Improving Word Alignment with Language Model Based Confidence Scores
Nguyen Bach, Qin Gao, Stephan Vogel
In Proceedings of the ACL 2008 Third Workshop on Statistical Machine Translation (ACL-08:HLT, WSMT), June 2008, Columbus, Ohio, USA.
[Slides], [Bib], [MGIZA++].
- Recent Improvements in the CMU Large Scale Chinese-English SMT System
Almut Silja Hildebrand, Kay Rottmann, Mohamed Noamany, Qin Gao, Sanjika Hewavitharana, Nguyen Bach and Stephan Vogel
In Proceedings of the Annual Meeting of the Association for Computational Linguistics with the Human Language Technology Conference (ACL-08:HLT), June 2008, Columbus, Ohio, USA.
[Slides], [Bib]
2007
- A Log-linear Block Transliteration Model based on Bi-Stream HMMs
Bing Zhao, Nguyen Bach, Ian Lane and Stephan Vogel
In Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies Conference (NAACL-HLT 2007), pp 364-371, April 2007, Rochester, NY, USA.
[Slides], [Bib], [Test set].
- The CMU TransTac 2007 Eyes-free and Hands-free Two-way Speech-to-Speech Translation System
Nguyen Bach, Matthias Eck, Paisarn Charoenpornsawat, Thilo Kohler, Sebastian Stuker, ThuyLinh Nguyen, Roger Hsiao, Alex Waibel, Stephan Vogel, Tanja Schultz and Alan W. Black
In Proceedings of the International Workshop on Spoken Language Translation (IWSLT-2007), October 2007, Trento, Italy.
[Slides], [Bib].
- Handling OOV Words in Arabic ASR Via Flexible Morphological Constraints
Nguyen Bach, Mohamed Noamany, Ian Lane and Tanja Schultz
In Proceedings of the INTERSPEECH (Interspeech-2007), August 2007, Antwerp, Belgium.
[Slides], [Bib].
- The CMU-UKA Statistical Machine Translation Systems for IWSLT 2007
Ian Lane, Andreas Zollmann, ThuyLinh Nguyen, Nguyen Bach, Ashish Venugopal, Stephan Vogel, Kay Rottmann, Ying Zhang and Alex Waibel
In Proceedings of the International Workshop on Spoken Language Translation (IWSLT-2007), October 2007, Trento, Italy.
[Slides], [Bib].
2006
- A Log-linear Block Transliteration Model based on Bi-Stream HMMs
Bing Zhao, Nguyen Bach, Ian Lane and Stephan Vogel
T.R. CMU-LTI-06-007, Carnegie Mellon University, Pittsburgh, PA, Fall 2006.
- The UKA/CMU Statistical Machine Translation System for IWSLT 2006
Matthias Eck, Ian Lane, Nguyen Bach, Sanjika Hewavitharana, Muntsin Kolss, Bing Zhao, Almut Silja Hildebrand, Stephan Vogel and Alex Waibel
In Proceedings of the International Workshop on Spoken Language Translation (IWSLT-2006), pp 130-137, November 2006, Kyoto, Japan.
[Slides], [Bib]
Before 2005
- Quantitative Analysis and Synthesis of Syllabic Tones in Vietnamese
Hansjoerg Mixdorff, Nguyen Hung Bach, Hiroya Fujisaki and Mai Chi Luong
In Proceedings of the EUROSPEECH (Eurospeech-2003), pp 177-180, September 2003, Geneva, Switzerland.
- Analysis F0 Contours Using the Fujisaki model for Vietnamese Tones
Bach Hung Nguyen and Nguyen Tien Dung
In Proceedings of the National Informatics Conference, Thai Nguyen, Vietnam, 2003.
- Application of Dynamic Time Warping Algorithm for the Recognition of Vietnamese Isolated Words
Bach Hung Nguyen and Luong Chi Mai
In Journal of Science and Technology, N.5, 2002, Vietnam.
- Qin Gao, Alok Parlikar, Nguyen Bach, and Stephan Vogel, 'Statistical Machine Translation: Parallel Processing for Large Data Situations,' Intel Research Pittsburgh Open House 2008, October 2008, Pittsburgh, PA, USA. [PDF]
- Simulating Sentence Pairs Sampling Process via Source and Target Language Models, MT Lunch, April 2008, Carnegie Mellon University
- Translating Words You've Never Seen, Student Research Symposium 2006, Language Technologies Institute, Carnegie Mellon University
IMPLEMENTATIONS – UNPUBLISHED REPORTS
- Translate Arabic OOV words by Transformation Transliteration Rules, March 2006, Carnegie Mellon University [available inside CMU]
- N. Bach, 'MetaShopper - a preliminary study and implementation', May 2004, Johns Hopkins University. You can try the implementation here: VeryNaiveBookCrawler
- N. Bach, S. Reddy, 'A preliminary quantitative study on the characteristics of Vietnamese vowels and English vowels', May 2004, Johns Hopkins University
- A random sentence generator. Each time you run the generator, it reads a context-free grammar from a file and prints one or more random sentences. This small program was written in September 2003 and updated in June 2004. You can try it here: 10 English sentences or 10 Vietnamese sentences in Nguyen_Binh's style
- A text classifier. The program uses two training corpora, for example spam and not-spam, or English and Spanish. Given an email, the program assigns it to one of the two training classes. For spam detection, it decides whether the email is spam or not-spam. For language identification, it decides whether the email is written in English or Spanish. Smoothing sharply decreases the error rate. I tried uniform, add-lambda, add-lambda backoff, and Witten-Bell backoff.
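The random sentence generator described above can be illustrated with a minimal sketch. This is not the original program: the grammar format ("LHS -> symbols", one rule per line), the toy grammar, and the function names `parse_grammar` and `generate` are all assumptions made for illustration, and the grammar is held in a string rather than read from a file so the example is self-contained.

```python
import random
from collections import defaultdict

# Hypothetical grammar format (an assumption): one rule per line, "LHS -> sym sym ...".
TOY_GRAMMAR = """
S -> NP VP
NP -> Det N
VP -> V NP
Det -> the
Det -> a
N -> translator
N -> sentence
V -> reads
V -> prints
"""

def parse_grammar(text):
    """Map each left-hand side to the list of its right-hand sides."""
    rules = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        lhs, rhs = line.split("->")
        rules[lhs.strip()].append(rhs.split())
    return dict(rules)

def generate(rules, symbol="S", rng=random):
    """Expand `symbol` recursively; symbols without a rule are terminals."""
    if symbol not in rules:
        return [symbol]
    words = []
    for child in rng.choice(rules[symbol]):  # pick one expansion uniformly
        words.extend(generate(rules, child, rng))
    return words

rules = parse_grammar(TOY_GRAMMAR)
rng = random.Random(0)  # fixed seed so runs are repeatable
sentences = [" ".join(generate(rules, "S", rng)) for _ in range(10)]
for s in sentences:
    print(s)
```

The toy grammar is non-recursive, so every expansion terminates. A recursive grammar would still work with this sketch, but could produce very long sentences unless expansions are weighted or depth-limited.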
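The text classifier described above can be sketched in the same spirit. The page does not say which model the original program used, so the following assumes one unigram language model per class with add-lambda smoothing, one of the techniques listed. The class name `AddLambdaUnigram`, the function `classify`, the value of lambda, and the tiny training corpora are all hypothetical.

```python
import math
from collections import Counter

class AddLambdaUnigram:
    """Unigram language model with add-lambda smoothing (illustrative only)."""

    def __init__(self, tokens, vocab, lam=0.1):
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab_size = len(vocab) + 1  # +1 for a shared out-of-vocabulary type
        self.lam = lam

    def log_prob(self, token):
        # p(w) = (c(w) + lambda) / (N + lambda * V); unseen words have c(w) = 0.
        num = self.counts.get(token, 0) + self.lam
        den = self.total + self.lam * self.vocab_size
        return math.log(num / den)

    def score(self, tokens):
        return sum(self.log_prob(t) for t in tokens)

def classify(text, models, priors):
    """Return the label whose model gives the highest log posterior."""
    tokens = text.lower().split()
    return max(
        models,
        key=lambda label: math.log(priors[label]) + models[label].score(tokens),
    )

# Toy training corpora (assumptions, far too small for real use).
english = "the cat sat on the mat and the dog ran in the park".split()
spanish = "el gato se sentó en la alfombra y el perro corrió en el parque".split()
vocab = set(english) | set(spanish)

models = {
    "english": AddLambdaUnigram(english, vocab),
    "spanish": AddLambdaUnigram(spanish, vocab),
}
priors = {"english": 0.5, "spanish": 0.5}

label_en = classify("the dog sat in the park", models, priors)
label_es = classify("el perro en el parque", models, priors)
print(label_en, label_es)
```

Without smoothing (lambda = 0), any word missing from a class's training corpus would get probability zero and `math.log` would fail, which is one reason smoothing matters for this kind of classifier.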
Nguyen Bach
Last modified: April 1, 2009