Differentially Private Empirical Risk Minimization and High-dimensional Regression
October 10, 2012
In the recent past there have been several high-profile privacy breaches of machine learning based systems that satisfied various ad-hoc notions of privacy, for example the attack on Amazon's recommendation system by Calandrino et al. in 2011 and the attack on Facebook's advertising system by Korolova in 2011. In light of these breaches, an obvious question arises: “How can we design learning algorithms with rigorous privacy guarantees?”

In this talk, I will focus on designing convex empirical risk minimization (ERM) algorithms (a special class of learning algorithms) with differential privacy guarantees. Differential privacy has emerged as one of the most widely used rigorous notions of privacy.
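
As a brief reminder (the notation here is mine, not fixed by the talk): a randomized algorithm M is \epsilon-differentially private if, for every pair of datasets D and D' differing in a single record and every set of outputs S, Pr[M(D) \in S] <= e^{\epsilon} Pr[M(D') \in S]. In the ERM setting, the non-private goal is to compute

  \hat{\theta} = argmin_\theta (1/n) \sum_{i=1}^n \ell(\theta; d_i) + R(\theta),

where \ell is a convex loss function, R is a convex regularizer, and d_1, ..., d_n are the data samples; the private algorithms discussed in the talk can be viewed as outputting randomized approximations to \hat{\theta}.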

My talk is divided into two parts:

Part a) Private ERM in low dimensions: In this part, I will discuss various approaches to differentially private ERM when the number of data samples (n) exceeds the dimensionality of the problem (p). I will present two of our new approaches to private ERM in this low-dimensional setting: i) an improved objective perturbation algorithm (which improves on the original algorithm proposed by Chaudhuri et al. in 2008 and 2011), and ii) an algorithm based on online convex programming (OCP). To conclude this part, I will compare these two approaches with another approach from the literature (output perturbation, Chaudhuri et al., 2011).
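
As a rough sketch of the two perturbation-based approaches (following their general form in the literature, not necessarily the exact variants presented in the talk): output perturbation first solves the non-private ERM problem and then adds noise to the minimizer,

  \theta_{priv} = \hat{\theta} + b,

whereas objective perturbation adds a random linear term to the objective before minimizing,

  \theta_{priv} = argmin_\theta (1/n) \sum_{i=1}^n \ell(\theta; d_i) + R(\theta) + (1/n) b^T \theta.

In both cases the noise vector b is drawn from a distribution calibrated to the sensitivity of the problem and the privacy parameter \epsilon.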

Part b) Sparse regression in high dimensions: In this part, I will discuss the first private ERM algorithms for the high-dimensional setting (i.e., where the number of data samples (n) is much smaller than the dimensionality of the problem (p)). I will restrict attention to the sparse regression setting, where we assume that the underlying parameter vector \theta^* defining the regression model is sparse, i.e., has very few non-zero entries. I will discuss four different algorithms for differentially private sparse regression, two of which are specific to linear regression, and compare them in terms of their sample complexity and running time.
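
To fix ideas for the linear regression case (again, the notation is mine): the data are modeled as y_i = \langle x_i, \theta^* \rangle + w_i for i = 1, ..., n, where x_i \in R^p, w_i is noise, and \theta^* has at most s << p non-zero entries. The standard non-private estimator in this regime is the LASSO,

  \hat{\theta} = argmin_\theta (1/(2n)) \sum_{i=1}^n (y_i - \langle x_i, \theta \rangle)^2 + \lambda ||\theta||_1,

and the question is what accuracy such sparse estimators can achieve under the additional constraint of differential privacy.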

Joint work with Prateek Jain [MSR, India], Daniel Kifer [Penn State], Pravesh Kothari [University of Texas at Austin] and Adam Smith [Penn State].