Background: Modeling and predicting discrete sequences is central to many natural language processing tasks. Although different tasks are evaluated with distinct metrics, the standard training algorithm for language generation has been maximum likelihood estimation (MLE). However, the MLE algorithm has two notable weaknesses: (1) MLE training ignores the task-specific evaluation metric; (2) MLE suffers from exposure bias, the phenomenon that the model is never exposed to its own errors during training. The recently proposed reward augmented maximum likelihood (RAML) tackles these problems by constructing a task-metric-dependent target distribution and training the model to match this task-specific target instead of the empirical data distribution.
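As a rough illustration, RAML's target is the exponentiated-payoff distribution, q(y | y*) ∝ exp(r(y, y*)/τ), which places mass on outputs in proportion to their task reward. The sketch below computes this distribution over a small, hypothetical candidate set, using a toy negative-Hamming-distance reward; the candidate set and reward function are illustrative assumptions, not part of the talk.

```python
import math

def raml_target_distribution(candidates, reward_fn, temperature=1.0):
    """Exponentiated-payoff target distribution, q(y | y*) ∝ exp(r(y, y*)/tau).

    `candidates` is a hypothetical finite sample of output sequences; in
    practice RAML samples from q (e.g. via edit-distance perturbations of
    the ground truth) rather than enumerating it.
    """
    scores = [reward_fn(y) / temperature for y in candidates]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy reward: negative Hamming distance to the reference sequence.
def neg_hamming(reference):
    return lambda y: -sum(a != b for a, b in zip(y, reference))

ref = "abcd"
cands = ["abcd", "abcx", "xbcx"]
probs = raml_target_distribution(cands, neg_hamming(ref), temperature=0.5)
# Outputs closer to the reference receive higher target probability.
```

With temperature τ → 0 the target collapses onto the ground truth and RAML reduces to MLE; larger τ spreads probability over high-reward neighbors.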
Abstract: In this talk, we study the credit assignment problem in reward augmented maximum likelihood (RAML) and establish a theoretical equivalence between the token-level counterpart of RAML and entropy-regularized reinforcement learning. Inspired by this connection, we propose two sequence prediction algorithms, one extending RAML with fine-grained credit assignment and the other improving Actor-Critic with a systematic entropy regularization. On two benchmark datasets, we show that the proposed algorithms outperform RAML and Actor-Critic, respectively.
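For context, the connection between RAML and entropy-regularized RL can be sketched as follows (the notation here is a common convention, not taken from the talk). With the exponentiated-payoff distribution $q(y \mid y^*) \propto \exp\big(r(y, y^*)/\tau\big)$, RAML minimizes the forward KL divergence,
$$\mathcal{L}_{\text{RAML}}(\theta) = \mathrm{KL}\big(q(\cdot \mid y^*) \,\|\, p_\theta(\cdot \mid x)\big) + \text{const},$$
while entropy-regularized RL maximizes
$$\mathbb{E}_{y \sim p_\theta}\big[r(y, y^*)\big] + \tau\, H\big(p_\theta(\cdot \mid x)\big),$$
which is equivalent to minimizing the reverse divergence $\mathrm{KL}\big(p_\theta \,\|\, q\big)$. Both objectives share the same global optimum $p_\theta = q$ but assign credit differently, which motivates the token-level analysis in the talk.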
The AI Seminar is generously sponsored by Apple.