I am an assistant professor in the Language Technologies Institute, School of Computer Science at Carnegie Mellon University. My research interests are at or near the intersection of natural language processing, machine learning, linguistics, and social science. Prior to joining LTI, I was a postdoc in the Stanford NLP Group, and before that I got my PhD from CMU.
Here are my CV and Google Scholar page.
Research projects in my group currently focus on language generation (e.g., controllable text generation, continuous-output generation, GANs for text), multilinguality (e.g., open-vocabulary machine translation, polyglot models, entrainment in code-switching), automated negotiation, and NLP for social good (e.g., identification of microaggressions and dehumanization in online interactions, identification of misinformation and agenda-setting in news, preventing scientific misconduct).
Teaching
Publications
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology. Proc. SIGMORPHON'19. PDF (Interpretability Prize)
Quantifying Social Biases in Contextual Word Representations. Proc. of Workshop on Gender Bias for NLP. PDF
Contextual Affective Analysis: A Case Study of People Portrayals in Online #MeToo Stories. Proc. ICWSM'19. PDF
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings. Proc. NAACL'19. PDF
Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs. Proc. ICLR'19. PDF
Framing and Agenda-setting in Russian News: a Computational Analysis of Intricate Political Strategies. Proc. EMNLP'18. PDF
Style Transfer Through Back-Translation. Proc. ACL'18. PDF
Native Language Cognate Effects on Second Language Lexical Choice. Transactions of the Association for Computational Linguistics (TACL), 2018. PDF DATA
RtGender: A Corpus for Studying Differential Responses to Gender. Proc. LREC'18. PDF DATA
Incorporating Dialectal Variability for Socially Equitable Language Identification. Proc. ACL'17. PDF CODE
Writer Profiling Without the Writer's Text. Proc. SocInfo'17. PDF
Linguistic Knowledge in Data-Driven Natural Language Processing. PhD thesis, September 2016. PDF
Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning. Proc. ACL'16. PDF
Correlation-based Intrinsic Evaluation of Word Vector Representations. In RepEval'16. PDF CODE
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks. In RepEval'16. PDF
Polyglot Neural Language Models: Case Study in Cross-Lingual Phonetic Representation Learning. Proc. NAACL'16. PDF
Morphological Inflection Generation Using Character Sequence to Sequence Learning. Proc. NAACL'16. PDF
Massively Multilingual Word Embeddings. arXiv preprint. PDF
Cross-Lingual Bridges with Models of Lexical Borrowing. Journal of Artificial Intelligence Research (JAIR), 2016. PDF
Evaluation of Word Vector Representations by Subspace Alignment. In Proc. EMNLP'15. PDF CODE
Not All Contexts Are Created Equal: Better Word Representations with Variable Attention. In Proc. EMNLP'15. PDF
Lexicon Stratification for Translating Out-of-Vocabulary Words. In Proc. ACL'15. PDF
Sparse Overcomplete Word Vector Representations. In Proc. ACL'15. PDF
A Bottom Up Approach to Category Mapping and Meaning Change. In Proc. NetWordS'15. PDF
Constraint-Based Models of Lexical Borrowing. In Proc. NAACL'15. PDF
Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources. Computational Linguistics, 40(2):449-468, 2014. PDF
Metaphor Detection with Cross-Lingual Model Transfer. In Proc. ACL'14. PDF CODE DATA
Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation. In Proc. EACL'14. PDF
Augmenting English Adjective Senses with Supersenses. In Proc. LREC'14. PDF CODE DATA
Unified Annotation Scheme for the Semantic/Pragmatic Components of Definiteness. In Proc. LREC'14. PDF DATA
Automatic Classification of Communicative Functions of Definiteness. In Proc. COLING'14. PDF
The CMU Machine Translation Systems at WMT 2014. In Proc. WMT'14. PDF
Generating English Determiners in Phrase-Based Translation with Synthetic Translation Options. In Proc. WMT'13. PDF
The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References. In Proc. WMT'13. PDF
Identifying the L1 of non-native writers: the CMU-Haifa system. In Proc. of the 8th Workshop on Innovative Use of NLP for Building Educational Applications, 2013. PDF
Cross-Lingual Metaphor Detection Using Common Semantic Features. In Proc. Meta4NLP Workshop, 2013. PDF
Identification and Modeling of Word Fragments in Spontaneous Speech. In Proc. ICASSP'13. PDF
Extraction of Multi-word Expressions from Small Parallel Corpora. Natural Language Engineering, 18(4):549-573, 2012. PDF
Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources. In Proc. EMNLP'11. PDF
Extraction of Multi-word Expressions from Small Parallel Corpora. University of Haifa M.Sc. thesis, September 2010. PDF
Extraction of Multi-word Expressions from Small Parallel Corpora. In Proc. COLING'10. PDF
Automatic Acquisition of Parallel Corpora from Websites with Dynamic Content. In Proc. LREC'10. PDF