Nguyen Bach (Bạch Hưng Nguyên)
Graduate Research Assistant
- Machine Translation
- Natural language processing
- Speech recognition and synthesis
- Machine learning
- Information Retrieval
I am a PhD student working under the supervision of Prof. Alex Waibel and Prof. Stephan Vogel in the Statistical Machine Translation group at CMU. My current research is on preserving relations across languages via translation from text and speech. I am working on the GALE and TransTac projects.
- Nguyen Bach, Qin Gao, and Stephan Vogel, 'Improving Word Alignment with Language Model Based Confidence Scores,' Proceedings of ACL-08:HLT, WSMT, June 2008, Columbus, Ohio, USA. [PDF], [Slides], [Bib], [MGIZA++]
- Almut Silja Hildebrand, Kay Rottmann, Mohamed Noamany, Qin Gao, Sanjika Hewavitharana, Nguyen Bach, and Stephan Vogel, 'Recent Improvements in the CMU Large Scale Chinese-English SMT System,' Proceedings of ACL-08:HLT, June 2008, Columbus, Ohio, USA. [PDF], [Slides], [Bib]
- Ian Lane, Andreas Zollmann, ThuyLinh Nguyen, Nguyen Bach, Ashish Venugopal, Stephan Vogel, Kay Rottmann, Ying Zhang, and Alex Waibel, 'The CMU-UKA Statistical Machine Translation Systems for IWSLT 2007,' Proceedings of IWSLT 2007, October 2007, Trento, Italy. [PDF], [Slides], [Bib]
- Nguyen Bach, Matthias Eck, Paisarn Charoenpornsawat, Thilo Kohler, Sebastian Stuker, ThuyLinh Nguyen, Roger Hsiao, Alex Waibel, Stephan Vogel, Tanja Schultz, and Alan Black, 'The CMU TransTac 2007 Eyes-free and Hands-free Two-way Speech-to-Speech Translation System,' Proceedings of IWSLT 2007, October 2007, Trento, Italy. [PDF], [Slides], [Bib]
- Nguyen Bach, Mohamed Noamany, Ian Lane, and Tanja Schultz, 'Handling OOV Words in Arabic ASR Via Flexible Morphological Constraints,' Proceedings of INTERSPEECH 2007, August 2007, Antwerp, Belgium. [PDF], [Slides], [Bib]
- Bing Zhao, Nguyen Bach, Ian Lane, and Stephan Vogel, 'A Log-linear Block Transliteration Model based on Bi-Stream HMMs,' Proceedings of HLT-NAACL 2007, pp 364-371, April 2007, Rochester, NY, USA. [PDF], [Slides], [Bib], [Test set]
+ Earlier version: CMU-LTI Technical Report, CMU-LTI-06-007, Fall 2006.
- Matthias Eck, Ian Lane, Nguyen Bach, Sanjika Hewavitharana, Muntsin Kolss, Bing Zhao, Almut Silja Hildebrand, Stephan Vogel, and Alex Waibel, 'The UKA/CMU Statistical Machine Translation System for IWSLT 2006,' Proceedings of IWSLT 2006, pp 130-137, November 2006, Kyoto, Japan. [PDF], [Slides], [Bib]
- Hansjoerg Mixdorff, Nguyen Hung Bach, Hiroya Fujisaki, and Mai Chi Luong, 'Quantitative Analysis and Synthesis of Syllabic Tones in Vietnamese,' Proceedings of EUROSPEECH 2003, pp 177-180, September 2003, Geneva, Switzerland. [PDF]
- Bach Hung Nguyen, and Nguyen Tien Dung, 'Analysis F0 Contours Using the Fujisaki model for Vietnamese Tones,' National Informatics Conference, Thai Nguyen, Vietnam, 2003. [PDF (Vietnamese)]
- Bach Hung Nguyen, and Luong Chi Mai, 'Application of Dynamic Time Warping Algorithm for the Recognition of Vietnamese Isolated Words,' National Informatics Conference, Hanoi, Vietnam, December 2001, pp 465-473. [PDF (Vietnamese)]
IMPLEMENTATIONS – UNPUBLISHED REPORTS
- Translate Arabic OOV words by Transformation Transliteration Rules, March 2006, Carnegie Mellon University [available inside CMU]
- N. Bach, 'MetaShopper - a preliminary study and implementation', May 2004, Johns Hopkins University. You can try the implementation here: VeryNaiveBookCrawler
- N. Bach, S. Reddy, 'A preliminary quantitative study on the characteristics of Vietnamese vowels and English vowels', May 2004, Johns Hopkins University
- A random sentence generator. Each time you run the generator, it reads a context-free grammar from a file and prints one or more random sentences. I wrote this small program in September 2003 and updated it in June 2004. You can try it here: 10 English sentences or 10 Vietnamese sentences in Nguyen_Binh's style
- A text classifier. The program is trained on two corpora, for example spam and not-spam, or English and Spanish. Given an email, it assigns the email to one of the two training classes. As a spam detector, it decides whether the email is spam or not-spam; for language identification, it decides whether the email is written in English or Spanish. Smoothing sharply decreases the error rate. I tried uniform, add-lambda, add-lambda backoff, and Witten-Bell backoff.
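The random sentence generator described above can be illustrated with a minimal Python sketch. The original program's language and grammar file format are not given here, so everything below is an assumption: the `LHS -> sym1 sym2 ...` rule syntax, the start symbol `S`, and the function names `parse_grammar`, `generate`, and `random_sentences` are all hypothetical.

```python
import random
from collections import defaultdict


def parse_grammar(lines):
    """Parse rules written as 'LHS -> sym1 sym2 ...' (an assumed format)."""
    rules = defaultdict(list)
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        lhs, rhs = line.split("->", 1)
        rules[lhs.strip()].append(rhs.split())
    return rules


def generate(rules, symbol, rng):
    """Expand a symbol recursively; symbols without rules are terminals."""
    if symbol not in rules:
        return [symbol]
    words = []
    # Pick one of the symbol's expansions uniformly at random.
    for child in rng.choice(rules[symbol]):
        words.extend(generate(rules, child, rng))
    return words


def random_sentences(lines, count=1, seed=None):
    """Return `count` random sentences derived from the start symbol S."""
    rules = parse_grammar(lines)
    rng = random.Random(seed)
    return [" ".join(generate(rules, "S", rng)) for _ in range(count)]


# Toy grammar for illustration only; a real grammar would be read from a
# file, e.g. random_sentences(open("grammar.txt"), count=10).
TOY_GRAMMAR = """
S -> NP VP
NP -> the N
N -> dog
N -> cat
VP -> V NP
V -> sees
V -> likes
""".splitlines()

if __name__ == "__main__":
    for sentence in random_sentences(TOY_GRAMMAR, count=3):
        print(sentence)
```

A recursive grammar (for example `NP -> NP PP`) can produce very long derivations with uniform rule choice, so a fuller version would likely weight the rules or cap the recursion depth.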
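For the text classifier described above, the following Python sketch shows one of the listed smoothing methods, add-lambda, applied to a word unigram model per training corpus. It is an illustration under assumptions, not the original implementation: the unigram order, whitespace tokenization, equal class priors, and the names `AddLambdaUnigram`, `train`, and `classify` are all hypothetical, and the two tiny corpora are made-up toy data.

```python
import math
from collections import Counter


class AddLambdaUnigram:
    """Unigram model with add-lambda smoothing over a shared vocabulary."""

    def __init__(self, tokens, vocab, lam=0.1):
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab_size = len(vocab) + 1  # +1 for one out-of-vocabulary type
        self.lam = lam  # must be > 0 so unseen words keep nonzero probability

    def log_prob(self, token):
        # (count + lambda) / (N + lambda * V); unseen tokens have count 0.
        return math.log(
            (self.counts[token] + self.lam)
            / (self.total + self.lam * self.vocab_size)
        )


def train(corpus_a, corpus_b, lam=0.1):
    """Build one smoothed model per training corpus."""
    tokens_a = corpus_a.lower().split()
    tokens_b = corpus_b.lower().split()
    vocab = set(tokens_a) | set(tokens_b)
    return (AddLambdaUnigram(tokens_a, vocab, lam),
            AddLambdaUnigram(tokens_b, vocab, lam))


def classify(text, model_a, model_b, labels=("A", "B"), prior_a=0.5):
    """Return the label whose model gives the text the higher log score."""
    tokens = text.lower().split()
    score_a = math.log(prior_a) + sum(model_a.log_prob(t) for t in tokens)
    score_b = math.log(1 - prior_a) + sum(model_b.log_prob(t) for t in tokens)
    return labels[0] if score_a >= score_b else labels[1]


# Toy corpora for illustration only.
ENGLISH = "the cat sat on the mat and the dog ate the food"
SPANISH = "el gato se sento en la alfombra y el perro comio la comida"

if __name__ == "__main__":
    en_model, es_model = train(ENGLISH, SPANISH)
    labels = ("english", "spanish")
    print(classify("the dog sat", en_model, es_model, labels))
    print(classify("el perro y el gato", en_model, es_model, labels))
```

Without smoothing, a single word unseen in one corpus would give that class zero probability. Add-lambda avoids this by reserving a small amount of probability mass for every vocabulary item. The backoff variants mentioned above would additionally fall back from higher-order n-grams to lower-order ones, which this unigram sketch does not attempt.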
Nguyen Bach
Last modified: June 9, 2008