Bing Zhao's C.V.

Bing Zhao

Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
(412) 320-0377 (Cell)
(412) 268-4546 (Office)

bzhao [at] CS [dot] CMU [dot] EDU
http://www.cs.cmu.edu/~bzhao

EDUCATION

Ph.D. in Language Technologies, School of Computer Science, Carnegie Mellon University, May 2007 (expected)
"Statistical Alignment Models for Translational Equivalence"
Advisors: Alex Waibel, Eric P. Xing and Stephan Vogel.
M.S. in Language Technologies, Carnegie Mellon University, May 2003
M.S. in Pattern Recognition and AI, Institute of Automation, Chinese Academy of Sciences, July 2001
"A Continuous Chinese Digit Speech Recognition System: Acoustic Modeling, Speaker Adaptation, and Decoding"
Advisors: Taiyi Huang and Bo Xu.
B.S. in Electronic Engineering, University of Science and Technology of China, July 1998
"A Wavelet Transformation based Compression Algorithm for Seismogram"
Advisor: Zhengkai Liu.

RESEARCH INTERESTS

Statistical Machine Translation, Speech Recognition, Information Extraction, Text Mining, Machine Learning and Natural Language Processing.

SKILLS SUMMARY

Strong background in statistical machine translation, machine learning, pattern recognition, numerical optimization, probabilistic inference and statistical analysis.
Proficient in numerical algorithm programming in C++, C, Matlab, and Perl.
Independent and creative thinker.

PROFESSIONAL EXPERIENCE

2001/8–present Research Assistant, Language Technologies Institute, School of Computer Science, CMU
Statistical Alignment Models for Translational Equivalence:

§ Robust statistical alignment models for machine translation;

§ Bilingual Topic AdMixture (BiTAM) models for machine translation;

§ Key member of the CMU team for projects including GALE and TIDES.
2004/9–2004/12 Research Intern, IBM T.J. Watson Research Center, USA
Intern at IBM, supervised by Dr. Kishore Papineni and Dr. Niyu Ge, working on:

§ Inner-outer bracket models for word alignment;

§ Detailed experiments comparing different alignment approaches.
1998/8–2001/7 Research Assistant, Chinese Academy of Sciences, China
Graduate research assistant at the National Laboratory of Pattern Recognition.

§ Mandarin continuous-digit recognition system, developed with Nokia;

§ Acoustic model adaptation: several MLLR-based algorithms; MRF-based MAP and SMAP;

§ Chinese trigram language modeling; Chinese pinyin-to-character conversion.
2001/4–2001/7 Visiting Student, Microsoft Research Asia, China
Visiting student at MSRA, supervised by Eric Chang, working on:
§ Discriminative training for large-scale continuous speech recognition;

§ Discriminative training for vowel recognition.
1993/9–1998/7 Research Assistant, Univ. of Science & Technology of China
Undergraduate research assistant at Information Processing Center of USTC.

TEACHING EXPERIENCE

Teaching Assistant:
§ 2005, CMU undergraduate course 15-381, Artificial Intelligence
Assisted with the design and grading of homework, exams, and projects. Held office hours.

§ 2005, CMU graduate course 11-751, Speech Recognition and Understanding
Assisted with the design and grading of homework, exams, and projects. Held office hours.

PUBLICATIONS

Refereed Papers:

        [1] Bing Zhao, Nguyen Bach, Ian Lane, and Stephan Vogel, “A Log-linear Block Transliteration Model based on Bi-Stream HMMs”, to appear in the proceedings of HLT/NAACL 2007.

        [2] Bing Zhao and Eric P. Xing, “BiTAM: Bilingual Topic AdMixture Models for Word Alignment”, in the proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL 2006), July 2006.

        [3] Muntsin Kolss, Bing Zhao, Stephan Vogel, Ashish Venugopal, and Ying Zhang, “The ISL Statistical Machine Translation System for the TC-STAR Spring 2006 Evaluation”, TC-STAR Workshop on Speech-to-Speech Translation (TC-STAR-WS 2006), Barcelona, Spain, 2006.

        [4] Matthias Eck, Ian Lane, Nguyen Bach, Sanjika Hewavitharana, Muntsin Kolss, Bing Zhao, Almut Silja Hildebrand, Stephan Vogel and Alex Waibel, “The UKA/CMU Statistical Machine Translation System for IWSLT 2006”, in the proceedings of IWSLT 2006.

        [5] Sanjika Hewavitharana, Bing Zhao, Almut Silja Hildebrand, Matthias Eck, Chiori Hori, Stephan Vogel and Alex Waibel, “The CMU Statistical Machine Translation System for IWSLT 2005”, in the proceedings of the International Workshop on Spoken Language Translation (IWSLT 2005), Sept. 2005.

        [6] Bing Zhao and Alex Waibel, “Learning a Log-Linear Model with Bilingual Phrase-Pair Features for Statistical Machine Translation”, in the proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing (SIGHAN 2005), October 2005.

        [7] Bing Zhao, Niyu Ge, and Kishore Papineni, “Inner-Outer Bracket Models for Word Alignment using Hidden Blocks”, in the proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005), Oct. 2005.

        [8] Bing Zhao and Stephan Vogel, “A Generalized Alignment-Free Phrase Extraction”, in the proceedings of the ACL 2005 Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond (ACL WPT-05), June 2005.

        [9] Bing Zhao, Eric P. Xing, and Alex Waibel, “Bilingual Word Spectral Clustering for Statistical Machine Translation”, in the proceedings of the ACL 2005 Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond (ACL WPT-05), June 2005.

      [10] Bing Zhao, Stephan Vogel, and Alex Waibel, “Phrase Pair Rescoring with Term Weightings for Statistical Machine Translation”, in the proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), July 2004.

      [11] Bing Zhao, Matthias Eck, and Stephan Vogel, “Language Model Adaptation for Statistical Machine Translation with Structured Query Models”, in the proceedings of the 20th International Conference on Computational Linguistics (Coling 2004), Aug. 2004.

      [12] Bing Zhao, Klaus Zechner, Stephan Vogel, and Alex Waibel, “Efficient Optimization for Bilingual Sentence Alignment based on Linear Regression”, in the proceedings of the HLT/NAACL 2003 Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, May 2003.

      [13] Bing Zhao and Stephan Vogel, “Word Alignment Based on Bilingual Bracketing”, in the proceedings of the HLT/NAACL 2003 Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond (HLT/NAACL WPT-03), May 2003.

      [14] Stephan Vogel, Ying Zhang, Alicia Tribble, Fei Huang, Ashish Venugopal, Bing Zhao, and Alex Waibel, “The CMU Statistical Translation System”, in the proceedings of MT Summit IX, New Orleans, LA, September 2003.

      [15] Ying Zhang, Bing Zhao, Jie Yang, and Alex Waibel, “Automatic SIGN Translation”, in the proceedings of the International Conference on Spoken Language Processing (ICSLP 2002), Aug. 2002.

      [16] Bing Zhao and Stephan Vogel, “Full-text Story Alignment Models for Chinese-English Bilingual News Corpora”, in the proceedings of the International Conference on Spoken Language Processing (ICSLP 2002), 2002.

      [17] Bing Zhao and Stephan Vogel, “Adaptive Parallel Sentences Mining From Web Bilingual News Collection”, in the proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), December 2002.

      [18] Yun Zhou, Chengqing Zong, and Bing Zhao, “The corpus oriented analysis of Chinese spoken dialog understanding”, in the proceedings of the International Symposium on Chinese Spoken Language Processing (ISCSLP 2000), July 2000.

      [19] Bing Zhao and Bo Xu, “MLLR Speaker Adaptation using Acoustic Correlation Information”, in the proceedings of the National Conference on Man-Machine Speech Communication (NCMMSC 2000), 2000.

      [20] Sheng Gao, Bo Xu, Hong Zhang, Bing Zhao, Chengrong Li and Taiyi Huang, “Updated Progress of SINOHEAR: Advanced Mandarin LVCSR System at NLPR”, in the proceedings of the International Conference on Spoken Language Processing (ICSLP 2000), 2000.

      [21] Bing Zhao and Bo Xu, “Incorporating HMM State Sequence Confusion for Rapid MLLR Adaptation to New Speakers”, in the proceedings of the International Conference on Spoken Language Processing (ICSLP 2000), 2000.

Technical Report:

      [22] Bing Zhao, Nguyen Bach, Ian Lane, and Stephan Vogel, “A Log-linear Block Transliteration Model based on Bi-Stream HMMs”, Technical Report CMU-LTI-06-007, Fall 2006 (conference version in HLT/NAACL 2007).

Thesis:

      [23] Bing Zhao, “Statistical Alignment Models for Translational Equivalence”, Ph.D. thesis, Carnegie Mellon University, expected May 2007.

SELECTED HONORS AND AWARDS

2001-present      Computer science graduate fellowship, Carnegie Mellon University.
1999                  Tung's Oriental Scholarship, Chinese Academy of Sciences.
1998                  Excellent Bachelor Thesis Award, University of Science and Technology of China (USTC)
1995-1997         “Yu-Cai”, “P&G”, and “Ding Xin” Scholarships at USTC.
1993-1994         Excellent Student Scholarships at USTC.

RESEARCH

Machine Translation
2002–	Statistical Alignment Models for Translational Equivalence	CMU
	In my current work, I focus on bilingual topic AdMixture (BiTAM) translation models, which leverage bilingual document-level context. The parallel sentence pairs within a document pair are assumed to constitute a mixture of hidden topics, and each observed word pair follows a topic-specific translation lexicon. With such topical information inferred from document-level context, the translation models are expected to be sharper and the word alignment process less ambiguous. Traditional IBM translation models are word-mixture models, which ignore parallel document boundaries in their generative processes. I proposed a novel formalism for BiTAM translation models into which the traditional IBM models can easily be integrated to form new models. Three such BiTAMs with embedded IBM Model-1 were tested extensively, and an extended hidden Markov bilingual topic AdMixture model was developed and evaluated on both word alignment accuracy and translation quality, showing improved modeling power. I am currently extending this work in three directions: 1) leveraging the topic-specific translation lexicons and hidden topical information learned by BiTAM during decoding; 2) developing efficient variational inference algorithms that scale to large training corpora; 3) extending BiTAM to other alignment models, such as a BiTAM IBM Model-3 in which a fertility table is estimated for better word alignment.
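	To make the BiTAM idea concrete, here is a toy Python sketch, not the thesis implementation. It assumes the simplest variant described above: a sentence pair draws one hidden topic from document-level weights θ, and its foreign words are generated by IBM Model-1 under that topic's lexicon. The lexicons, θ, the NULL token, and the smoothing floor are all hypothetical, and θ is fixed rather than inferred by variational inference.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon type: lex[(f, e)] = p(f | e) under one topic.
Lexicon = Dict[Tuple[str, str], float]

FLOOR = 1e-6  # assumed smoothing value for unseen word pairs


def model1_likelihood(e_words: List[str], f_words: List[str], lex: Lexicon) -> float:
    """IBM Model-1 likelihood of f given e, omitting the length term:
    prod_j (1 / (I + 1)) * sum_i p(f_j | e_i), with e_0 = NULL."""
    e_with_null = ["NULL"] + e_words
    prob = 1.0
    for f in f_words:
        prob *= sum(lex.get((f, e), FLOOR) for e in e_with_null) / len(e_with_null)
    return prob


def topic_posterior(e_words, f_words, theta, lexicons):
    """p(z = k | e, f), proportional to theta_k * p(f | e, lexicon_k)."""
    scores = [t * model1_likelihood(e_words, f_words, lex)
              for t, lex in zip(theta, lexicons)]
    total = sum(scores)
    return [s / total for s in scores]


def mixed_translation_prob(f, e, weights, lexicons):
    """Topic-weighted lexicon entry: sum_k weights_k * p(f | e, lexicon_k)."""
    return sum(w * lex.get((f, e), FLOOR) for w, lex in zip(weights, lexicons))


# Toy example: two hypothetical topics (finance vs. geography), pinyin tokens.
finance: Lexicon = {("yinhang", "bank"): 0.8, ("hean", "bank"): 0.1, ("qian", "money"): 0.9}
geography: Lexicon = {("yinhang", "bank"): 0.1, ("hean", "bank"): 0.8, ("qian", "money"): 0.9}
lexicons = [finance, geography]
theta = [0.5, 0.5]  # assumed document-level topic weights

post = topic_posterior(["bank", "money"], ["yinhang", "qian"], theta, lexicons)
p_topical = mixed_translation_prob("yinhang", "bank", post, lexicons)
p_flat = mixed_translation_prob("yinhang", "bank", theta, lexicons)
```

	With an even prior, the sentence pair's own words shift the topic posterior toward the finance topic (about 8/9), so the topic-weighted p(yinhang | bank) rises from 0.45 to roughly 0.72; this is one way topical context can sharpen a translation lexicon.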
2001–	Hands-on Experience in Statistical Machine Translation	CMU
	I have about five years of experience covering many aspects of statistical machine translation. I designed and implemented cross-lingual bag-of-words models to align parallel documents from comparable data, and a dynamic programming algorithm to extract parallel sentence pairs from the aligned document pairs. I applied these techniques to align the ten-year Xinhua News comparable corpus, producing the collection released by the LDC as LDC2002E18. I implemented an SMT beam-search decoder and designed models for transliterating unknown Arabic words. I participated directly in a number of international machine translation evaluations, including GALE, NIST, IWSLT, and TC-STAR. Part of my work at CMU involves building language models (LMs) for statistical machine translation. This includes extending a suffix-array LM implementation and the LM interface within a translation decoder, and running detailed experiments comparing cluster-based LMs, POS-based LMs, and strategies for handling unknown (UNK) words. To make the LM more effective, I proposed sentence-level LM adaptation using structured query models: sentence-specific LMs were interpolated with a background LM to re-decode each test sentence. Experiments showed reduced perplexity and improved translation quality as measured by BLEU and NIST scores.
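	The interpolation step above can be sketched as follows. This is a minimal Python illustration, assuming linear interpolation with a single weight λ (the exact scheme is not stated here); the toy unigram models, the vocabulary, and λ = 0.5 are hypothetical, and the structured-query retrieval that builds the sentence-specific LM is not shown.

```python
import math
from typing import Callable, List, Tuple

# An LM is modeled as a function p(word | history).
ProbFn = Callable[[str, Tuple[str, ...]], float]


def interpolate(p_sent: ProbFn, p_bg: ProbFn, lam: float) -> ProbFn:
    """Linear interpolation: lam * p_sent + (1 - lam) * p_bg (assumed scheme)."""
    def p(word: str, history: Tuple[str, ...]) -> float:
        return lam * p_sent(word, history) + (1.0 - lam) * p_bg(word, history)
    return p


def perplexity(p: ProbFn, words: List[str], order: int = 3) -> float:
    """Perplexity of a word sequence, passing up to order-1 words of history."""
    log_sum = 0.0
    for i, w in enumerate(words):
        history = tuple(words[max(0, i - order + 1):i])
        log_sum += math.log(p(w, history))
    return math.exp(-log_sum / len(words))


# Hypothetical toy models over a four-word vocabulary (history ignored).
def p_background(word: str, history: Tuple[str, ...]) -> float:
    return 0.25  # uniform background LM


SENT_PROBS = {"oil": 0.4, "price": 0.4, "rises": 0.1, "today": 0.1}


def p_sentence(word: str, history: Tuple[str, ...]) -> float:
    return SENT_PROBS[word]  # stands in for an LM built from retrieved text


adapted = interpolate(p_sentence, p_background, lam=0.5)
test_sentence = ["oil", "price", "rises"]
ppl_bg = perplexity(p_background, test_sentence)
ppl_adapted = perplexity(adapted, test_sentence)
```

	Here the background perplexity is 4 and the adapted model's is about 3.78. Interpolating, rather than using the sentence-specific LM alone, keeps every word's probability bounded away from zero, and λ would typically be tuned on held-out data.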

REFERENCES

Available upon request