NeuLab Presentations at ICLR 2019

NeuLab members have four main conference paper presentations and two invited talks at ICLR 2019! Come check them out if you’re in New Orleans for the conference.

Main Conference Papers

Multilingual Neural Machine Translation With Soft Decoupled Encoding

  • Authors: Xinyi Wang, Hieu Pham, Philip Arthur, Graham Neubig.
  • Time: Thursday May 9, 11:00–13:00. Great Hall BC #31.

Multilingual training of neural machine translation (NMT) systems has led to impressive accuracy improvements on low-resource languages. However, there are still significant challenges in efficiently learning word representations in the face of a paucity of data. In this paper, we propose Soft Decoupled Encoding (SDE), a multilingual lexicon encoding framework specifically designed to share lexical-level information intelligently without requiring heuristic preprocessing such as pre-segmenting the data. SDE represents a word by its spelling through a character encoding, and its semantic meaning through a latent embedding space shared by all languages. Experiments on a standard dataset of four low-resource languages show consistent improvements over strong multilingual NMT baselines, with gains of up to 2 BLEU on one of the tested languages, achieving new state-of-the-art results on all four language pairs. Code is available here.
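
To make the two halves of SDE concrete, here is a minimal pure-Python sketch of a word encoder in that spirit. It is not the released implementation: the sizes, the hashing of character n-grams into buckets, the tanh squashing, the softmax attention over a shared latent table, and the residual sum are all illustrative assumptions, the tables are random rather than learned, and any language-specific components are omitted.

```python
import math
import random
import zlib
from typing import List

# Illustrative hyperparameters (assumptions, not values from the paper).
DIM = 8               # embedding size
NUM_LATENT = 4        # rows in the latent embedding table shared by all languages
NGRAM_BUCKETS = 1000  # hash buckets for character n-grams
MAX_N = 4             # use character n-grams of length 1..MAX_N

rng = random.Random(0)
# In a real model these tables are learned; here they are random placeholders.
ngram_table = [[rng.gauss(0.0, 0.1) for _ in range(DIM)] for _ in range(NGRAM_BUCKETS)]
latent_table = [[rng.gauss(0.0, 0.1) for _ in range(DIM)] for _ in range(NUM_LATENT)]


def char_ngrams(word: str, max_n: int = MAX_N) -> List[str]:
    """All character n-grams of the word, with < and > marking its boundaries."""
    padded = f"<{word}>"
    return [padded[i:i + n] for n in range(1, max_n + 1) for i in range(len(padded) - n + 1)]


def bucket(ngram: str) -> int:
    """Stable hash of an n-gram into the table (Python's hash() is salted per process)."""
    return zlib.crc32(ngram.encode("utf-8")) % NGRAM_BUCKETS


def spelling_vector(word: str) -> List[float]:
    """Spelling side: sum the n-gram embeddings, then squash with tanh."""
    total = [0.0] * DIM
    for gram in char_ngrams(word):
        row = ngram_table[bucket(gram)]
        for d in range(DIM):
            total[d] += row[d]
    return [math.tanh(x) for x in total]


def semantic_vector(query: List[float]) -> List[float]:
    """Semantic side: softmax attention over the shared latent table."""
    scores = [sum(q * k for q, k in zip(query, row)) for row in latent_table]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    weights = [w / norm for w in weights]
    return [sum(w * row[d] for w, row in zip(weights, latent_table)) for d in range(DIM)]


def sde_style_embedding(word: str) -> List[float]:
    """Combine both views with a residual sum (an assumed combination rule)."""
    spelling = spelling_vector(word)
    meaning = semantic_vector(spelling)
    return [s + m for s, m in zip(spelling, meaning)]
```

Because the lookup depends only on spelling plus one latent table, words that share character n-grams across languages reuse the same n-gram rows, and every language reads from the same latent space, with no pre-segmentation step.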

Learning to Represent Edits

  • Authors: Pengcheng Yin, Graham Neubig, Miltiadis Allamanis, Marc Brockschmidt, Alexander L. Gaunt.
  • Time: Thursday May 9, 11:00–13:00. Great Hall BC #51.

We introduce the problem of learning distributed representations of edits. By combining a “neural editor” with an “edit encoder”, our models learn to represent the salient information of an edit and can be used to apply edits to new inputs. We experiment on natural language and source code edit data. Our evaluation yields promising results that suggest that our neural network models learn to capture the structure and semantics of edits. We hope that this interesting task and data source will inspire other researchers to work further on this problem. Code is available here.
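
The paper's edit encoder and neural editor are neural networks; the toy below swaps in a plain token diff (Python's difflib) purely to illustrate the interface described above: encode a (before, after) pair into an edit representation, then apply that representation to a new input. The names encode_edit and apply_edit are hypothetical, and this symbolic stand-in only handles replacements and deletions, ignoring pure insertions.

```python
import difflib
from typing import Dict, List, Tuple

# Toy "edit representation": old token span -> new token span (not a learned vector).
EditRep = Dict[Tuple[str, ...], Tuple[str, ...]]


def encode_edit(before: List[str], after: List[str]) -> EditRep:
    """Hypothetical edit encoder: record each replaced or deleted span and its replacement."""
    rep: EditRep = {}
    matcher = difflib.SequenceMatcher(a=before, b=after)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):  # pure insertions are ignored in this toy
            rep[tuple(before[i1:i2])] = tuple(after[j1:j2])
    return rep


def apply_edit(rep: EditRep, tokens: List[str]) -> List[str]:
    """Hypothetical editor: rewrite every occurrence of a recorded span in a new input."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for src, tgt in rep.items():
            if src and tuple(tokens[i:i + len(src)]) == src:
                out.extend(tgt)
                i += len(src)
                break
        else:  # no recorded span starts here; copy the token unchanged
            out.append(tokens[i])
            i += 1
    return out
```

For example, the edit extracted from `x = foo ( a )` → `x = bar ( a )` rewrites `y = foo ( b )` to `y = bar ( b )`. A learned edit representation is meant to generalize well beyond this kind of literal string matching.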

Lagging Inference Networks and Posterior Collapse in Variational Autoencoders

  • Authors: Junxian He, Daniel Spokoyny, Graham Neubig, Taylor Berg-Kirkpatrick.
  • Time: Thursday May 9, 16:30–18:30. Great Hall BC #2.

The variational autoencoder (VAE) is a popular combination of deep latent variable model and accompanying variational learning technique. By using a neural inference network to approximate the model’s posterior on latent variables, VAEs efficiently parameterize a lower bound on marginal data likelihood that can be optimized directly via gradient methods. In practice, however, VAE training often results in a degenerate local optimum known as “posterior collapse” where the model learns to ignore the latent variable and the approximate posterior mimics the prior. In this paper, we investigate posterior collapse from the perspective of training dynamics. We find that during the initial stages of training the inference network fails to approximate the model’s true posterior, which is a moving target. As a result, the model is encouraged to ignore the latent encoding and posterior collapse occurs. Based on this observation, we propose an extremely simple modification to VAE training to reduce inference lag: depending on the model’s current mutual information between latent variable and observation, we aggressively optimize the inference network before performing each model update. Despite introducing neither new model components nor significant complexity over basic VAE, our approach is able to avoid the problem of collapse that has plagued a large amount of previous work. Empirically, our approach outperforms strong autoregressive baselines on text and image benchmarks in terms of held-out likelihood, and is competitive with more complex techniques for avoiding collapse while being substantially faster. Code is available here.
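
The training schedule described above can be sketched as a control loop. This is an illustrative reading of the abstract, not the released code: encoder_step, decoder_step and mutual_info are hypothetical hooks into whatever VAE you are training, and the stopping rules (inner encoder updates until the encoder loss stops decreasing; aggressive mode switched off once the estimated mutual information stops increasing from one epoch to the next) are assumptions about how "depending on the model’s current mutual information" is put into practice.

```python
from typing import Callable


def aggressive_vae_training(
    num_epochs: int,
    batches_per_epoch: int,
    encoder_step: Callable[[], float],
    decoder_step: Callable[[], None],
    mutual_info: Callable[[], float],
    max_inner_steps: int = 100,
) -> int:
    """Run the schedule; return the epoch at which aggressive mode was turned off.

    encoder_step: hypothetical hook taking one optimizer step on the inference
        network and returning its loss (e.g. the negative ELBO).
    decoder_step: hypothetical hook taking one optimizer step on the generative model.
    mutual_info: hypothetical hook estimating I(x; z) under the current model.
    """
    aggressive = True
    best_mi = float("-inf")
    turned_off_at = num_epochs  # stays num_epochs if aggressive mode never ends
    for epoch in range(num_epochs):
        for _ in range(batches_per_epoch):
            if aggressive:
                # Inner loop: let the lagging inference network catch up with the
                # model's current posterior before updating the model again.
                prev_loss = float("inf")
                for _ in range(max_inner_steps):
                    loss = encoder_step()
                    if loss >= prev_loss:  # encoder loss stopped improving
                        break
                    prev_loss = loss
            else:
                encoder_step()  # plain VAE training: one encoder step per batch
            decoder_step()
        if aggressive:
            mi = mutual_info()
            if mi <= best_mi:  # mutual information stopped climbing
                aggressive = False
                turned_off_at = epoch
            else:
                best_mi = mi
    return turned_off_at
```

Keeping the model-specific parts behind callables mirrors the point of the abstract: no new model components are involved, only the order and number of optimizer steps change.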

MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders

  • Authors: Xuezhe Ma, Chunting Zhou, Eduard Hovy.
  • Time: Thursday May 9, 16:30–18:30. Great Hall BC #71.

The Variational Autoencoder (VAE), a simple and effective deep generative model, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations. However, recent studies demonstrate that, when equipped with expressive generative distributions (a.k.a. decoders), the VAE learns uninformative latent representations, a phenomenon known as KL vanishing, in which the VAE collapses into an unconditional generative model. In this work, we introduce mutual posterior-divergence regularization, a novel regularization that is able to control the geometry of the latent space to accomplish meaningful representation learning, while achieving comparable or superior density estimation. Experiments on three image benchmark datasets demonstrate that, when equipped with powerful decoders, our model performs well on both density estimation and representation learning.
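
As a rough illustration of the quantity the regularizer's name refers to, the sketch below computes the average KL divergence between the diagonal-Gaussian posteriors of different inputs in a batch. This is an assumption-laden simplification: the exact form of the MAE regularizer and how it is weighted against the VAE objective are defined in the paper, not here. If every posterior collapses onto the same distribution (as under KL vanishing), the average is zero.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Posterior = Tuple[Sequence[float], Sequence[float]]  # (means, standard deviations)


def kl_diag_gaussians(mu1, sigma1, mu2, sigma2) -> float:
    """KL(N(mu1, diag(sigma1^2)) || N(mu2, diag(sigma2^2))), summed over dimensions."""
    return sum(
        math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5
        for m1, s1, m2, s2 in zip(mu1, sigma1, mu2, sigma2)
    )


def mean_pairwise_posterior_kl(posteriors: List[Posterior]) -> float:
    """Hypothetical helper: average KL over all ordered pairs of distinct inputs.

    Not the paper's exact regularizer; it only measures how far apart the
    posteriors of different inputs are.
    """
    pairs = list(permutations(range(len(posteriors)), 2))
    if not pairs:
        raise ValueError("need posteriors for at least two inputs")
    total = sum(kl_diag_gaussians(*posteriors[i], *posteriors[j]) for i, j in pairs)
    return total / len(pairs)
```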

Invited Talks

Deep Generative Models for Highly Structured Data

Learning about Language with Normalizing Flows

  • Speaker: Graham Neubig.
  • Time: Monday May 6, 15:15–18:30. Room R02.

Human language is complex and highly structured, with the unique syntax of each language defining this structure. While analyzing this structure and using it to train better NLP models is of inherent interest to linguists and NLP practitioners, for most languages in the world there is a paucity of labeled data. In this talk, I will discuss unsupervised and semi-supervised methods for learning about this structure and the correspondence between languages, specifically taking advantage of a powerful tool called normalizing flows to build generative models over complex underlying structures. First, I will give a brief overview of normalizing flows, using an example from our recent work that uses these techniques to learn bilingual word embeddings. Then, I will demonstrate how these can be applied to unsupervised or semi-supervised learning of linguistic structure with structured priors for part-of-speech tagging or dependency parsing.
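
Normalizing flows rest on the change-of-variables formula: if x = f(z) for an invertible f and a simple base density p_z, then log p_x(x) = log p_z(f⁻¹(x)) + log |d f⁻¹(x)/dx|. The one-dimensional affine flow below is the smallest possible instance; the class name and parameters are illustrative, and the models in the talk use far richer invertible transformations and structured priors.

```python
import math


def std_normal_log_pdf(z: float) -> float:
    """Log density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


class AffineFlow1D:
    """Invertible map x = scale * z + shift applied to z ~ N(0, 1) (illustrative)."""

    def __init__(self, scale: float, shift: float) -> None:
        if scale == 0.0:
            raise ValueError("scale must be nonzero for the map to be invertible")
        self.scale = scale
        self.shift = shift

    def forward(self, z: float) -> float:
        """Sampling direction: base variable z -> data x."""
        return self.scale * z + self.shift

    def inverse(self, x: float) -> float:
        """Density direction: data x -> base variable z."""
        return (x - self.shift) / self.scale

    def log_prob(self, x: float) -> float:
        # Change of variables: log p_x(x) = log p_z(f^{-1}(x)) + log|d f^{-1}/dx|,
        # and d f^{-1}/dx = 1 / scale for this affine map.
        return std_normal_log_pdf(self.inverse(x)) - math.log(abs(self.scale))
```

Stacking several invertible layers keeps the same recipe: invert layer by layer and add up the log-derivative (log-determinant) terms.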

Deep RL Meets Structured Prediction

What can Statistical Machine Translation teach Neural Machine Translation about Structured Prediction?

  • Speaker: Graham Neubig.
  • Time: Monday May 6, 09:45–13:00. Room R02.

In 2016, I co-authored a wide-sweeping survey on training techniques for statistical machine translation (link). This survey was promptly, and perhaps appropriately, forgotten in the tsunami of enthusiasm for new advances in neural machine translation (NMT). Now that the dust has settled after five intense years of research into NMT and its training methods, perhaps it is time to revisit our old knowledge and see what it can teach us about training techniques for NMT. In this talk, I will broadly overview several years of research into sequence-level training objectives for NMT, then point out several areas where our understanding of training techniques for NMT still lags significantly behind what we knew for more traditional approaches to SMT.