## Tuesday, September 13, 2016. 12:00PM. NSH 3305.

# Stephan Mandt - Variational Inference: From Artificial Temperatures to Stochastic Gradients

Bayesian modeling is a popular approach to solving machine learning problems. In this talk, we will first review variational inference, where we map Bayesian inference to an optimization problem. This optimization problem is non-convex, meaning that there can be many local optima, some of which correspond to poor fits of the data. We first show that by introducing a "local temperature" to every data point and applying the machinery of variational inference, we can avoid some of these poor optima, suppress the effects of outliers, and ultimately find more meaningful patterns. In the second part of the talk, we will then present a Bayesian view on Stochastic Gradient Descent (SGD). When operated with a constant, non-decreasing learning rate, SGD first marches towards the optimum of the objective and then samples from a stationary distribution that is centered around the optimum. As such, SGD resembles Markov Chain Monte Carlo (MCMC) algorithms which, after a burn-in period, draw samples from a Bayesian posterior. Drawing on the tools of variational inference, we investigate and formalize this connection. Our analysis reveals criteria that allow us to use SGD as an approximate scalable MCMC algorithm that can compete with more complicated state-of-the-art Bayesian approaches.
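
The constant-learning-rate behavior described in the abstract can be seen in a toy experiment. The sketch below is only an illustration and is not taken from the talk: it assumes a one-dimensional quadratic loss on synthetic Gaussian data, and the function name `constant_rate_sgd` and all parameter values are hypothetical choices. After a burn-in period the iterates fluctuate around the optimum instead of converging to it, and a larger learning rate gives a wider spread.

```python
import random
import statistics


def constant_rate_sgd(data, learning_rate, batch_size=10,
                      n_steps=20000, burn_in=2000, seed=0):
    """Run SGD with a fixed learning rate on the loss 0.5 * mean((theta - x)^2).

    Returns the iterates visited after the burn-in period.
    (Hypothetical helper for illustration, not the speaker's algorithm.)
    """
    rng = random.Random(seed)
    theta = 0.0  # arbitrary starting point, away from the optimum
    iterates = []
    for step in range(n_steps):
        batch = rng.choices(data, k=batch_size)   # minibatch, drawn with replacement
        grad = theta - sum(batch) / batch_size    # noisy gradient of the quadratic loss
        theta -= learning_rate * grad             # the learning rate is never decayed
        if step >= burn_in:
            iterates.append(theta)
    return iterates


# Synthetic data (an assumption for this sketch): 1000 draws from N(2, 1).
data_rng = random.Random(42)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
optimum = statistics.fmean(data)  # minimizer of the full-data loss

for lr in (0.01, 0.1):
    samples = constant_rate_sgd(data, lr)
    print(f"lr={lr}: mean={statistics.fmean(samples):.3f}, "
          f"std={statistics.pstdev(samples):.3f}, optimum={optimum:.3f}")
```

In this toy setting the post-burn-in iterates behave like samples from a distribution centered on the optimum, with a width controlled by the learning rate and the minibatch size. Whether that distribution is a useful approximation to a Bayesian posterior is the question the talk addresses; the sketch makes no claim about it.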

**Speaker bio:** Stephan Mandt is a Research Scientist at Disney Research Pittsburgh where he leads the statistical machine learning group.
Previously, he was a postdoctoral researcher with David Blei at Columbia University, where he worked on scalable approximate Bayesian inference algorithms. Trained as a statistical physicist, he held a previous postdoctoral fellowship at Princeton University and holds a Ph.D. from the University of Cologne as a fellow of the German National Merit Foundation.

Personal website: www.stephanmandt.com