John Wieting




I am a PhD student at Carnegie Mellon University, advised by Taylor Berg-Kirkpatrick and Graham Neubig. I also collaborate with Kevin Gimpel at the Toyota Technological Institute at Chicago. Previously, I completed my MS with Dan Roth, who is now at the University of Pennsylvania.

My interests lie in machine learning, learning theory, optimization, natural language processing, and computer vision. My current research focuses on machine learning and natural language processing.


I received BS degrees in Mathematics and in Chemistry, with Honors, from the University of Wisconsin in 2011, and my MS in Computer Science from the University of Illinois in May 2014 under the supervision of Dan Roth.

Key Publications

  • (EMNLP 2020) A Bilingual Generative Transformer for Semantic Sentence Embedding (pdf)

  • (ACL 2019) Simple and Effective Paraphrastic Similarity from Parallel Translations (pdf)

  • (ACL 2019) Beyond BLEU: Training Neural Machine Translation with Semantic Similarity (pdf)

  • (ICLR 2019) No Training Required: Exploring Random Encoders for Sentence Classification (pdf)

  • (NAACL 2018) Adversarial Example Generation with Syntactically Controlled Paraphrase Networks (pdf)

  • (ACL 2018) Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations (pdf)

  • (EMNLP 2017) Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext (pdf)

  • (ACL 2017) Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings (pdf)

  • (EMNLP 2016) Charagram: Embedding Words and Sentences via Character n-grams (pdf)

  • (ICLR 2016, oral) Towards Universal Paraphrastic Sentence Embeddings (pdf)

  • (TACL 2015) From Paraphrase Database to Compositional Model and Back (pdf|bib)

Some Old Projects

    1. Generalization of Strongly Convex Online Learning Algorithms Download : This paper presents a discussion and fills in some of the gaps I encountered when reading Sham Kakade's paper. The main idea is that in batch learning we want to bound generalization error with high probability, whereas in online learning we want to bound the regret: the difference between the total loss incurred on all examples seen so far and the total loss of the optimal function in our class. This paper relates the two for a particular class of online learning algorithms, and in doing so characterizes the convergence rate of these algorithms with high probability, not just in expectation. It is a nice application of learning theory, an interest of mine.
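The online-to-batch idea above can be sketched in a few lines. This is a hypothetical toy setup, not the paper's actual analysis: online gradient descent is run over a stream, the cumulative (regret-style) loss is tracked, and the averaged iterate is returned as the batch predictor, since averaging is the standard way a regret bound is converted into a generalization bound.

```python
import numpy as np

# Toy online-to-batch conversion (illustrative sketch, not the paper's setup):
# run online gradient descent on a stream of (x, y) pairs, accumulate the
# loss suffered *before* each update (as in a regret bound), and return the
# averaged iterate as the batch predictor.

def squared_loss(w, x, y):
    return 0.5 * (w @ x - y) ** 2

def grad(w, x, y):
    return (w @ x - y) * x

def online_to_batch(stream, dim, lr=0.1):
    w = np.zeros(dim)
    iterates = []
    cumulative_loss = 0.0
    for x, y in stream:
        cumulative_loss += squared_loss(w, x, y)  # loss incurred online
        w = w - lr * grad(w, x, y)                # online gradient step
        iterates.append(w.copy())
    # Online-to-batch conversion: the average of the online iterates is the
    # predictor whose generalization error the regret bound controls.
    return np.mean(iterates, axis=0), cumulative_loss

# Noiseless linear-regression stream with a known target vector.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
stream = [(x, w_true @ x) for x in rng.normal(size=(200, 2))]

w_avg, total_loss = online_to_batch(stream, dim=2)
```

The averaged iterate `w_avg` ends up close to `w_true`, while `total_loss` records the online loss that a regret bound would control.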

    2. Tiered Clustering Model for Lexical Entailment Download : In this project, I investigated clustering contexts to improve lexical entailment, trying two approaches. The first was a tiered clustering model, a nonparametric Bayesian algorithm, so the number of clusters need not be specified in advance (which is nice, although hyperparameters can still affect it). It is also hierarchical in the sense that a word can belong to one of two topics: a background topic or a forefront topic (i.e., one of the clusters). The second approach was a simple greedy set-covering algorithm. The paper suggests that clustering, especially tiered clustering, can help lexical entailment, but that care is needed in how the latent senses are combined to form the new representation of a word.

    3. Learning and Inference in Entity Relation Identification Download : This paper investigates three approaches to discovering entities and relations in text. The first uses a collection of local classifiers; the second adds integer linear programming (ILP) inference on top of the local classifiers; and the third uses inference-based training (IBT), which differs from the second in that the result of the ILP inference is used to update the weight vectors. This approach can be shown to be equivalent to Collins's structured perceptron (discussed in the paper). The results are somewhat surprising in that there is not much difference in performance, although IBT generally does best. A lot of extra work for a small improvement, but at least the joint predictions are more coherent.
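The inference-based training loop described above can be sketched on a toy problem. Everything here is a hypothetical illustration, not the paper's model or data: each output is a pair (entity, relation) subject to the constraint that a relation can only fire when the entity label allows it, constrained inference is a brute-force argmax standing in for an ILP solver, and the perceptron update is driven by that joint inference result.

```python
import itertools
import numpy as np

# Toy inference-based training (IBT) sketch. Illustrative only: the joint
# prediction used for the perceptron update comes from constrained
# inference (brute-force search standing in for an ILP solver).

def feasible():
    # All (entity, relation) pairs satisfying the constraint:
    # a relation can only fire if the entity label allows it.
    return [(e, r) for e, r in itertools.product([0, 1], repeat=2)
            if not (r == 1 and e == 0)]

def features(x, y):
    e, r = y
    # Simple joint feature map: input features copied per active label.
    return np.concatenate([x * e, x * r])

def infer(w, x):
    # Constrained argmax over joint assignments (ILP stand-in).
    return max(feasible(), key=lambda y: w @ features(x, y))

def train_ibt(data, dim, epochs=10):
    w = np.zeros(2 * dim)
    for _ in range(epochs):
        for x, y_gold in data:
            y_pred = infer(w, x)
            if y_pred != y_gold:
                # Structured-perceptron update driven by inference output.
                w += features(x, y_gold) - features(x, y_pred)
    return w

# Linearly separable toy data: the gold pair is (1, 1) when x[0] > 0.
data = [(np.array([1.0, 0.0]), (1, 1)),
        (np.array([-1.0, 0.0]), (0, 0)),
        (np.array([2.0, 1.0]), (1, 1)),
        (np.array([-2.0, 1.0]), (0, 0))]
w = train_ibt(data, dim=2)
```

Because the update compares gold and predicted *joint* assignments under the constraint, the learned weights can never be pushed toward the infeasible pair (0, 1), which is the sense in which the outputs stay coherent.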

    4. Two Dimensional Non-causal HMM for Texture Classification Download : This paper uses a special HMM that models dependencies not only from left to right but also vertically and diagonally. We implemented this HMM and applied it to texture classification (i.e., deciding whether an image shows bark, water, granite, etc.).

    5. Constrained Conditional Model Java Library : Code I hope to release at some point when I have more time. It lets one create a simple CCM that can do multiclass learning, multiclass learning with ILP inference, or structured learning with a structured perceptron or structured SVM with ILP inference. Simple and lightweight, useful for quick implementations or for learning how these models work.


Teaching

    1. Teaching Assistant for CS 546: Machine Learning in Natural Language Processing (Spring 2013)

    2. Teaching Assistant for CS 125: Introduction to Computer Science for Majors (Fall 2012)

      1. rated as Excellent (by ICES scoring, submitted by students)

    3. Teaching Assistant for CS 421: Programming Languages and Compilers (Summer 2012)

    4. Teaching Assistant for CS 125: Introduction to Computer Science for Majors (Spring 2011)

      1. rated as Outstanding (Top 10%) (by ICES scoring, submitted by students)

    5. Teaching Assistant for CS 125: Introduction to Computer Science for Majors (Fall 2011)

      1. rated as Excellent (by ICES scoring, submitted by students)

    6. Instructor for CS 199: Honors Projects for CS 125 (Fall 2011)

Selected Awards