My research focuses on cross-lingual methods for improving low-resource NLP, statistical machine translation of both text and speech, and other cross-lingual and cross-domain statistical NLP tasks.
Before joining CMU, I completed my Master's thesis on Hebrew multiword expressions at the Department of Computer Science, University of Haifa, where I was fortunate to work with Shuly Wintner.
Here are my (outdated) CV and my Google Scholar page.
Sparse Overcomplete Word Vector Representations. In Proc. ACL'15.
Constraint-Based Models of Lexical Borrowing. In Proc. NAACL'15.
Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources. Computational Linguistics, 40(2):449-468, 2014.
Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation. In Proc. EACL'14.
Automatic Classification of Communicative Functions of Definiteness. In Proc. COLING'14.
The CMU Machine Translation Systems at WMT 2014. In Proc. WMT'14.
Generating English Determiners in Phrase-Based Translation with Synthetic Translation Options. In Proc. WMT'13.
The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References. In Proc. WMT'13.
Identifying the L1 of non-native writers: the CMU-Haifa system. In Proc. of the 8th Workshop on Innovative Use of NLP for Building Educational Applications, 2013.
Cross-Lingual Metaphor Detection Using Common Semantic Features. In Proc. Meta4NLP Workshop, 2013.
Identification and Modeling of Word Fragments in Spontaneous Speech. In Proc. ICASSP'13.
Extraction of Multi-word Expressions from Small Parallel Corpora. Natural Language Engineering, 18(4):549-573, 2012.
Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources. In Proc. EMNLP'11.
Extraction of Multi-word Expressions from Small Parallel Corpora. M.Sc. thesis, University of Haifa, September 2010.
Extraction of Multi-word Expressions from Small Parallel Corpora. In Proc. COLING'10.
Automatic Acquisition of Parallel Corpora from Websites with Dynamic Content. In Proc. LREC'10.