CMU Artificial Intelligence Seminar Series sponsored by Fortive


Tuesday, Feb 09, 2021

Time: 12:00 - 01:00 PM ET
Recording of this Online Seminar on YouTube

Michael Auli -- Self-supervised Learning of Speech Representations with wav2vec

Relevant Paper(s):

Abstract: Self-supervised learning has been a key driver of progress in natural language processing and, increasingly, in computer vision. In this talk I will give an overview of the wav2vec line of work, which explores algorithms for learning good representations of speech audio solely from unlabeled data. The resulting models can be fine-tuned for a specific task using labeled data, enabling speech recognition models trained with as little as 10 minutes of labeled speech audio by leveraging a large amount of unlabeled speech. Our latest work, wav2vec 2.0, learns a vocabulary of speech units by quantizing the latent representations of the speech signal and solving a contrastive task defined over these quantizations. We have also explored multilingual pre-training and recently released a model trained on 53 different languages.
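The contrastive task mentioned in the abstract can be sketched in a few lines — a minimal NumPy illustration of an InfoNCE-style objective, not the paper's actual implementation: given a context vector, the model must identify the true quantized latent among a set of distractors via temperature-scaled cosine similarity (all names and the temperature value here are illustrative assumptions).

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(context, true_latent, distractors, temperature=0.1):
    """InfoNCE-style loss: negative log-softmax score of the true quantized
    latent against distractors (a sketch of a wav2vec-2.0-like objective)."""
    candidates = [true_latent] + list(distractors)
    sims = np.array([cosine(context, q) / temperature for q in candidates])
    # log-softmax of the true candidate (index 0), computed stably
    logits = sims - sims.max()
    return float(-(logits[0] - np.log(np.exp(logits).sum())))

# Toy example: a context vector, a correlated "true" latent, random distractors.
rng = np.random.default_rng(0)
c = rng.normal(size=16)
q_true = c + 0.1 * rng.normal(size=16)     # correlated with the context
q_neg = [rng.normal(size=16) for _ in range(5)]
loss = contrastive_loss(c, q_true, q_neg)
```

Minimizing this loss pushes the context representation toward its own quantized latent and away from the distractors, which is what lets the pre-trained model learn discriminative speech units without any transcriptions.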

Bio: Michael Auli is a scientist at Facebook AI Research in Menlo Park, California. During his PhD at the University of Edinburgh, he worked on natural language processing and parsing, advised by Adam Lopez and Philipp Koehn. While at Microsoft Research, he did some of the early work on neural machine translation and neural dialogue models. After this, he led the team that developed convolutional sequence-to-sequence models, the first models to outperform recurrent neural networks for neural machine translation. Currently, Michael works on semi-supervised and self-supervised learning applied to natural language processing and speech recognition. He led the teams that ranked first in several tracks of the WMT news translation shared task in 2018 and 2019.