Carnegie Mellon University
School of Computer Science
Language Technologies Institute
Waleed Ammar

I'm a PhD candidate at Carnegie Mellon University, and a Google PhD fellow. I develop methods for processing natural languages in low-resource scenarios. In particular, I'm interested in statistical methods for cross-lingual, semi-supervised, and unsupervised learning in the NLP domain. NLP problems I've worked on include morphological analysis, part-of-speech tagging, dependency parsing, machine translation, transliteration, word alignment, reordering, language identification in code-switched text, and identifying selectional preferences. I'm also a mature software engineer with five years of professional experience.

  • Google PhD Fellowship in Natural Language Processing
  • Two Technology Transfer awards at Microsoft Research
  • IBM PhD Fellowship (selected but not given the award due to a conflict with the Google fellowship)

Publications [gscholar]
Many Languages, One Parser [pdf]
Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. Smith.
TACL 2016.

Massively Multilingual Word Embeddings [pdf]
Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith.
(under submission).

Unsupervised POS Induction with Word Embeddings [pdf]
Chu-Cheng Lin, Waleed Ammar, Lori Levin, Chris Dyer.
NAACL 2015.

Model Selection for Type-Supervised Learning with Application to POS Tagging [pdf]
Kristina Toutanova, Waleed Ammar, Pallavi Choudhury, Hoifung Poon.
CoNLL 2015.

Constraint-Based Models of Lexical Borrowing [pdf]
Yulia Tsvetkov, Waleed Ammar, Chris Dyer.
NAACL 2015.

Conditional Random Field Autoencoders for Unsupervised Structured Prediction [pdf, talk]
Waleed Ammar, Chris Dyer, Noah Smith.
NIPS 2014.

Transliteration by Sequence Labeling with Lattice Encoding and Reranking [pdf]
Waleed Ammar, Chris Dyer, Noah Smith.
NEWS workshop at ACL 2012.

Improved Transliteration Mining Using Graph Reinforcement
Ali El Kahki, Kareem Darwish, Ahmed Saad El Din, Mohamed Abd El-Wahab, Ahmed Hefny and Waleed Ammar.
EMNLP 2011.

ICE-TEA: In-Context Expansion and Translation of English Abbreviations [pdf]
Waleed Ammar, Kareem Darwish, Ali ElKahki and Khaled Hafez.

Automatic scoring of online discussion posts [pdf]
Nayer Wanas, Motaz El Saban, Heba Ashour and Waleed Ammar.
WICOW workshop at CIKM 2008.


Syntax-based Augmentation of Statistical Machine Translation Phrase Tables
Achraf Chalabi, Waleed Ammar, Mostafa Ashour.
US Patent, Publication No. US 2012/0296633.

User evaluation in a collaborative online forum
Nayer Wanas, Heba Ashour, Moustafa El-Baradei, Ahmed Morsy, Motaz El Saban and Waleed Ammar.
US Patent, Publication No. US 2010/0162135 A1.

Non-refereed Papers

The CMU Submission for the Shared Task on Language Identification in Code Switched Data [pdf]
Chu-Cheng Lin, Waleed Ammar, Chris Dyer and Lori Levin.
Code Switching Workshop at EMNLP 2014.

The CMU Machine Translation Systems at WMT 2014 [pdf]
Austin Matthews, Waleed Ammar, Archna Bhatia, Weston Feely, Greg Hanneman, Eva Schlinger, Swabha Swayamdipta, Yulia Tsvetkov, Alon Lavie, Chris Dyer.
WMT workshop at ACL 2014.

The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References [pdf]
Waleed Ammar, Victor Chahuneau, Michael Denkowski, Greg Hanneman, Wang Ling, Austin Matthews, Kenton Murray, Nicola Segall, Yulia Tsvetkov, Alon Lavie, Chris Dyer.
WMT workshop at ACL 2013.

Automatic Categorization of Privacy Policies [pdf]
Waleed Ammar, Shomir Wilson, Norman Sadeh, Noah Smith.
Tech Report 2012.

Secure localization in wireless sensor networks: a survey [pdf]
Waleed Ammar, Ahmed ElDawy and Moustafa Youssef.
arXiv 2010.

Professional Experience

Google – Pittsburgh
Software Engineering Intern (Sep 2014 – Dec 2014)
Explored novel methods for large-scale online training of decision forests.

Microsoft Research – Redmond
Research Intern (May 2013 – Aug 2013)
Explored novel methods for optimization and model selection of unsupervised and semi-supervised learning with lexical constraints.

Microsoft Research – Redmond
Software Development Engineer II (Dec 2010 – Aug 2011)
Identified deficiencies in machine-translated text and worked with researchers in the NLP group to find solutions. I was also responsible for integrating these solutions into the production system.

Microsoft Research – Microsoft Innovation Laboratory in Cairo
Research Software Development Engineer (Nov 2007 – Nov 2010)
Collaborated with researchers in MSR to push the state of the art in the fields of Data Mining and Natural Language Processing by engineering prototype technologies, writing papers and formulating patents. I was also responsible for the transfer of research prototypes into Microsoft products.

Alexandria University
Teaching Assistant (Aug 2007 – Nov 2007)
Tutored students, held office hours, graded homework and mid-term exams, administered tests and exams, and assisted professors with laboratory sessions.
Courses: Probability and Statistics I, Technical Writing I, and Introduction to Computers.

eSpace Technologies
Part-Time Software Developer (Jul 2007 – Nov 2007)
My role encompassed design and development of features in web portals as well as identification and resolution of deficiencies in web applications. I also took part in collecting customer requirements.

IBM Egypt – Cairo Technology Development Center
Intern at Human Language Technologies Group (Jul 2006 – Aug 2006)
Participated in the TREC 2006 Genomics Track competition. We developed an information retrieval (IR) system capable of answering specific types of questions from within biological documents.

Procter & Gamble (P&G)
Intern on Project Management (Jun 2005 – Aug 2005)
Managed a real-world automation project at the P&G powder factory in Egypt. The project scope included automatic identification of objects, semi-automatic acquisition of product type information, and a rich web reporting system.

Academic Services
  • Serving on the program committee of ACL 2016.
  • Serving on the program committee of NAACL-HLT 2016.
  • Serving on the program committee of the NAACL-HLT 2016 workshop on multilingual and crosslingual methods in NLP.
  • Reviewed for Journal of Artificial Intelligence Research.
  • Served on the program committee of EMNLP 2015.
  • Served on the program committee of IJCAI 2015.
  • Served on the program committee of the NAACL 2015 workshop on vector space modeling for NLP.
  • Helped write a proposal for a multi-million-dollar, multi-university NSF project on making privacy policies more usable.
  • Was the PhD student body representative of LTI-CMU in 2013.


Activities Log

  • Noah A. Smith --University of Washington (PhD advisor)
  • Chris Dyer --Carnegie Mellon University (PhD advisor)
  • Tom Mitchell --Carnegie Mellon University (PhD thesis committee)
  • Kuzman Ganchev --Google Research (PhD thesis committee)
  • Miguel Ballesteros --Carnegie Mellon University (co-author, collaborator on cross-lingual parsing)
  • D. Sculley --Google Research (internship host)
  • Kristina Toutanova --Microsoft Research (co-author, internship host)
  • Kareem Darwish --Qatar Computing Research Institute (co-author, ex-manager)
  • Ayman Kaheel --Yahoo Inc. (ex-manager)
  • Tarek Elabbady --Microsoft Research (ex-manager)
  • Mei-Yuh Hwang --Microsoft Research (ex-manager)
  • Yulia Tsvetkov --Carnegie Mellon University (co-author, colleague)
  • Ahmed Hefny --Carnegie Mellon University (co-author, colleague)
  • Ali ElKahki --Google Research (ex-colleague at MSR Cairo)
  • Chu-Cheng Lin --Carnegie Mellon University (co-author, collaborator on modeling code switching with CRF autoencoders)
  • George Mulcaire --University of Washington (collaborator on estimating multilingual word embeddings)
  • Pradeep Dasigi --Carnegie Mellon University (collaborator on modeling selectional preferences with CRF autoencoders)
  • Moustafa Youssef --Egypt-Japan University of Science and Technology (co-author, M.Sc. ex-advisor)
  • Jeffrey Micher --US Army Research Lab (collaborator on the low-density MT project)
  • Norman Sadeh --Carnegie Mellon University (lead principal investigator of the usable privacy policy project)
  • George Foster --National Research Council Canada (Google fellowship research mentor)
  • Lori Levin --Carnegie Mellon University (co-author)
  • Jaime Carbonell --Carnegie Mellon University (department head, lead principal investigator of the low-density MT project)

Recent Projects
  • Language-universal dependency parsing* (code).
  • CRF autoencoder models for scalable and feature-rich unsupervised learning* (code).
  • Multilingual word embeddings (unification-based*).
  • A universal dependency treebanks analyzer* (code).
  • Large-scale online training of random forests*.
  • Bayesian models for record linkage* (code).
  • CRF model for transliteration* (code).
  • Dual decomposition of a CFG parser and a POS tagger* (code).
  • A bunch of handy C, C++ and Python utilities* (code).
  • Privacy policy crawler* (code).
  • C++ library for training recurrent neural networks (code).
  • A neural network model which generalizes CRF autoencoders, for modeling selectional preferences (code).
  • A computational model for linguistic borrowing (code).
  • Semi-supervised learning for token-level language identification (task, Twitter results, surprise genre results).
  • Improved training and model selection of unsupervised sequence-labeling models with lexical constraints.
  • Yet another implementation of dependency parsing with the DMV* (code).
  • Yet another implementation of logistic regression* (code).
  • Yet another implementation of word-alignment induced preordering for machine translation* (code).
Projects led by me are marked with *.

PhD Courses
  • Courses TA'ed:
    • Machine Learning --Roni Rosenfeld
    • Structured Prediction --Noah Smith and Chris Dyer
  • Courses taken:
    • Intermediate Statistics --Larry Wasserman
    • Machine Learning --Ziv Bar-Joseph
    • Convex Optimization --Ryan Tibshirani
    • Algorithms for NLP --Noah Smith and Alon Lavie
    • NLP lab --Alon Lavie
    • Machine Translation --Chris Dyer and Alon Lavie
    • Grammars and Lexicons --Lori Levin
    • Linguistics lab --Lori Levin
    • Advanced Machine Translation Seminar --Chris Dyer and Alon Lavie
    • Advanced NLP Seminar --Noah Smith
    • Entrepreneurship for High Growth Companies --Arthur Boni
    • Beginning Tennis --Sarah Short