Tuesday, March 29, 2016. 12:00PM. NSH 3305.

Shayan Doroudi - Importance Sampling for Fair Policy Selection

Importance sampling is a statistical technique used in batch reinforcement learning settings to give unbiased estimates of how well a policy will perform, given data collected under another policy. In addition to evaluating policies, importance sampling has also been used for policy selection and policy search. In this talk, I show that importance sampling is unfair when used to choose between policies; that is, in some cases it chooses the worse of two policies more than half of the time. I present several (possibly counterintuitive) examples where this unfairness may be of practical concern. I then show that, in theory, we can make fair decisions with importance sampling by restricting attention to a particular class of policies. Using insights gathered from the theory, I present a practical policy search algorithm that uses importance sampling with a novel form of regularization.
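
For readers unfamiliar with the estimator the abstract refers to, here is a minimal sketch of ordinary importance sampling in a one-step (bandit-style) setting. It is an illustration only, not the algorithm presented in the talk: the function names, the two-action setup, and the reward function are all assumptions made for the example. Each observed reward is reweighted by the ratio of the target policy's action probability to the behavior policy's action probability, and the weighted rewards are averaged.

```python
import random


def importance_sampling_estimate(samples, target_probs, behavior_probs):
    """Ordinary importance sampling estimate of a target policy's expected reward.

    samples: list of (action, reward) pairs collected under the behavior policy.
    target_probs / behavior_probs: dicts mapping each action to its probability.
    (All names here are hypothetical, chosen for this illustration.)
    """
    total = 0.0
    for action, reward in samples:
        # Importance weight: how much more (or less) likely the target
        # policy is to take this action than the behavior policy was.
        weight = target_probs[action] / behavior_probs[action]
        total += weight * reward
    return total / len(samples)


def collect(behavior_probs, reward_fn, n, rng):
    """Draw n (action, reward) pairs by following the behavior policy."""
    actions = list(behavior_probs)
    weights = [behavior_probs[a] for a in actions]
    samples = []
    for _ in range(n):
        action = rng.choices(actions, weights=weights)[0]
        samples.append((action, reward_fn(action)))
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    behavior = {"left": 0.5, "right": 0.5}   # policy that generated the data
    target = {"left": 0.9, "right": 0.1}     # policy we want to evaluate

    # Assumed toy reward: "left" always pays 1, "right" always pays 0,
    # so the target policy's true expected reward is 0.9.
    def reward_fn(action):
        return 1.0 if action == "left" else 0.0

    data = collect(behavior, reward_fn, 20000, rng)
    print(importance_sampling_estimate(data, target, behavior))
```

The estimate is unbiased in this setting, but any single estimate can still land above or below the true value. Policy selection compares such noisy estimates across candidate policies, which is the setting the talk examines.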