Learning using local membership queries With Vitaly Feldman, Varun Kanade. COLT 2013.
We introduce a new model of membership query (MQ) learning, in which the learning algorithm is restricted to querying points that are close to random examples drawn from the underlying distribution. This model is intermediate between the PAC model and the PAC+MQ model.

Testing Lipschitz functions on hypergrid domains With Madhav Jha, Marco Molinaro and Sofya Raskhodnikova. RANDOM 2012.
An integer-valued function on the hypergrid is Lipschitz if changing one coordinate of the input by 1 changes the output by at most 1. In other words, Lipschitz functions are not very sensitive to small changes in the input. Our main result is an efficient tester for the Lipschitz property of functions f:[n]^d -> Z. A property tester is given oracle access to a function f and a proximity parameter epsilon, and it has to distinguish, with high probability, functions that have the property from functions that differ on at least an epsilon fraction of values from every function with the property. The Lipschitz property was first studied by Jha and Raskhodnikova (FOCS'11), who motivated it by applications to data privacy and program verification. They presented efficient testers for the Lipschitz property of functions on the domains {0,1}^d and [n]. Our tester for functions on the more general domain [n]^d runs in time O(d^{1.5} n log n) for constant epsilon.
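The property itself is easy to state concretely. Below is a brute-force Python checker for it, just to make the definition explicit; it enumerates all of [n]^d and is therefore exponential in d, which is exactly the cost the paper's sublinear tester avoids. The function names and example functions are illustrative, not from the paper.

```python
from itertools import product

def is_lipschitz(f, n, d):
    """Brute-force check that f: [n]^d -> Z is Lipschitz, i.e. changing
    one coordinate of the input by 1 changes the output by at most 1.
    Exponential in d; the paper's tester avoids exactly this cost."""
    for x in product(range(n), repeat=d):
        for i in range(d):
            if x[i] + 1 < n:
                # neighbor of x along coordinate i
                y = x[:i] + (x[i] + 1,) + x[i + 1:]
                if abs(f(y) - f(x)) > 1:
                    return False
    return True

# max(x) is Lipschitz; 2*x[0] changes by 2 along coordinate 0, so it is not.
print(is_lipschitz(lambda x: max(x), n=4, d=3))    # True
print(is_lipschitz(lambda x: 2 * x[0], n=4, d=3))  # False
```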

Limitations of local filters of Lipschitz and Monotone functions With Madhav Jha, Marco Molinaro and Sofya Raskhodnikova. RANDOM 2012.
We study local filters for the Lipschitz property of functions f:{0,1}^d -> R. A local filter with additive error a is a randomized algorithm that is given black-box access to a function f and a query point x in the domain of f. Its output is a value F(x), such that (i) the reconstructed function F is Lipschitz and (ii) if the input function f satisfies the property, then for every point x in the domain, with high constant probability the reconstructed value F(x) differs from f(x) by at most a. Local filters were introduced by Saks and Seshadhri (SICOMP 2010). The relaxed definition we study is due to Bhattacharyya et al. (RANDOM 2010), except that we further relax it by allowing additive error. Local filters for Lipschitz functions have applications to areas such as data privacy. We show that every local filter for Lipschitz functions runs in time exponential in the dimension d in the worst case. This holds even for filters with significant additive error. Prior lower bounds applied only to more restrictive notions of filters, e.g., nonadaptive filters that are required to specify all their lookups in advance, before obtaining values of f on any points.

Additive approximation for near-perfect Phylogeny construction With Avrim Blum, Jamie Morgenstern and Or Sheffet. APPROX 2012.
We study the problem of constructing phylogenetic trees for a given set of species. The problem is formulated as that of finding a minimum Steiner tree on n points over the Boolean hypercube of dimension d. It is known that an optimal tree can be found in linear time if the given dataset has a perfect phylogeny, i.e., the cost of the optimal phylogeny is exactly d. Moreover, if the data has a near-perfect phylogeny, i.e., the cost of the optimal Steiner tree is d+q, it is known that an exact solution can be found in running time which is polynomial in the number of species and d, yet exponential in q. In this work, we give a polynomial-time algorithm (in both d and q) that finds a phylogenetic tree of cost d+O(q^2). This provides the best guarantees known, namely a (1+o(1))-approximation, for the case log(d) <= q <= \sqrt{d}, broadening the range of settings for which near-optimal solutions can be efficiently found. We also discuss the motivation and reasoning for studying such additive approximations.

Improved spectral-norm bounds for clustering With Or Sheffet. APPROX 2012.
Aiming to unify known results about clustering mixtures of distributions under separation conditions, Kumar and Kannan [2010] introduced a deterministic condition for clustering datasets. They showed that this single deterministic condition encompasses many previously studied clustering assumptions. More specifically, their proximity condition requires that in the target k-clustering, the projection of a point x onto the line joining its cluster center \mu and some other center \mu' is a large additive factor closer to \mu than to \mu'. This additive factor can be roughly described as k times the spectral norm of the matrix representing the differences between the given (known) dataset and the means of the (unknown) target clustering. Clearly, the proximity condition implies center separation -- the distance between any two centers must be as large as the above-mentioned bound. In this paper we improve upon the work of Kumar and Kannan along several axes. First, we weaken the center-separation bound by a factor of \sqrt{k}, and second, we weaken the proximity condition by a factor of k. Using these weaker bounds we still achieve the same guarantees when all points satisfy the proximity condition. We also achieve better guarantees when only a (1-\epsilon) fraction of the points satisfy the weaker proximity condition.
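To make the proximity condition concrete, here is a small NumPy sketch that counts points violating a Kumar-Kannan-style margin. The constants and the exact form of the margin are illustrative only (the paper's condition uses specific constants and cluster-size factors); `proximity_violations` and `factor` are names introduced here, not from the paper.

```python
import numpy as np

def proximity_violations(A, labels, centers, factor):
    """Count points violating an illustrative proximity condition.
    A is an (m, dim) data matrix, labels[i] the target cluster of row i,
    centers a (k, dim) matrix of target means. For a point x in cluster r
    and any other cluster s, the projection of x onto the line through
    mu_r and mu_s should be closer to mu_r than to mu_s by an additive
    margin proportional to ||A - C||, the spectral norm of the difference
    between the data and the matrix of their cluster means."""
    C = centers[labels]                  # row i holds the mean of point i's cluster
    spectral = np.linalg.norm(A - C, 2)  # largest singular value
    k = len(centers)
    sizes = np.bincount(labels, minlength=k)
    bad = 0
    for i, x in enumerate(A):
        r = labels[i]
        for s in range(k):
            if s == r:
                continue
            u = centers[s] - centers[r]
            u = u / np.linalg.norm(u)
            t = (x - centers[r]) @ u     # signed position along the mu_r -> mu_s line
            margin = factor * k * (1 / np.sqrt(sizes[r]) + 1 / np.sqrt(sizes[s])) * spectral
            dist_r = t
            dist_s = np.linalg.norm(centers[s] - centers[r]) - t
            if dist_s - dist_r < margin:  # not far enough on mu_r's side
                bad += 1
                break
    return bad
```

On two tight, well-separated clusters this returns 0 violations for moderate `factor`, while a huge `factor` makes every point violate the margin.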

Center-based clustering under perturbation stability With Avrim Blum and Or Sheffet. Information Processing Letters 2012.
Clustering under most popular objective functions is NP-hard, even to approximate well, and so unlikely to be efficiently solvable in the worst case. Bilu and Linial (2009) suggested an approach aimed at bypassing this computational barrier by using properties of instances one might hope to hold in practice. In particular, they argue that instances in practice should be stable to small perturbations in the metric space, and they give an efficient algorithm for clustering instances of the Max-Cut problem that are stable to perturbations of size O(n^{1/2}). In addition, they conjecture that instances stable to as little as O(1) perturbations should be solvable in polynomial time. In this paper we prove that this conjecture is true for any center-based clustering objective (such as k-median, k-means, and k-center). Specifically, we show that we can efficiently find the optimal clustering assuming only stability to factor-3 perturbations of the underlying metric in spaces without Steiner points, and stability to factor-(2+\sqrt{3}) perturbations for general metrics. In particular, we show for such instances that the popular Single-Linkage algorithm combined with dynamic programming will find the optimal clustering. We also present NP-hardness results under a weaker but related condition.
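The Single-Linkage phase of the algorithm is simple to sketch. The following Python sketch builds the single-linkage merge tree from a distance matrix; the dynamic-programming phase that prunes this tree into the optimal k clusters (the step the paper's analysis justifies on stable instances) is omitted, and the quadratic-per-merge implementation is for clarity, not efficiency.

```python
def single_linkage_tree(D):
    """Single-linkage agglomeration on an m x m distance matrix D.
    Repeatedly merges the two clusters with the smallest closest-pair
    distance and records each merge. On perturbation-stable instances
    the paper prunes this merge tree with dynamic programming to
    recover the optimal k-clustering; that DP phase is omitted here."""
    m = len(D)
    clusters = {i: [i] for i in range(m)}
    merges = []
    while len(clusters) > 1:
        best = None
        ids = list(clusters)
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                i, j = ids[a], ids[b]
                # single-linkage distance: closest pair across the two clusters
                d = min(D[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

# Two far-apart groups {0,1} and {2,3}: the cross-group merge happens last.
D = [[0, 1, 10, 11],
     [1, 0, 9, 10],
     [10, 9, 0, 1],
     [11, 10, 1, 0]]
print(single_linkage_tree(D)[-1])  # ([0, 1], [2, 3])
```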

On Nash-equilibria of approximation-stable games. With Nina Balcan, Avrim Blum, Or Sheffet and Santosh Vempala. SAGT 2010.
In this paper, we define the notion of games that are approximation-stable, meaning that all $\epsilon$-approximate equilibria are contained inside a small ball of radius $\delta$ around a true equilibrium, and investigate a number of their properties. Many natural small games, such as matching pennies and rock-paper-scissors, are indeed approximation-stable. We show furthermore that there exist 2-player n-by-n approximation-stable games in which the Nash equilibrium and all approximate equilibria have support $\Omega(\log n)$. In addition, we show that all $(\epsilon,\delta)$ approximation-stable games must have an $\epsilon$-approximate equilibrium of support $O(\frac{\delta^{2-o(1)}}{\epsilon^2}\log n)$, yielding a faster algorithm for computing approximate Nash equilibria in such games.

Stability yields a PTAS for k-median and k-means clustering. With Avrim Blum and Or Sheffet. FOCS 2010.
Ostrovsky et al. [ORSS'06] showed that if a clustering instance satisfies a stability condition, namely that the optimal (k-1)-means clustering is more expensive than the optimal k-means clustering by a factor of $\max\{100, 1/\alpha^2\}$, then one can achieve a $(1+f(\alpha))$-approximation to the k-means optimum in time polynomial in n and k by using a variant of Lloyd's algorithm. In this paper we improve on their result by showing that if the (k-1)-means optimal is more expensive than the k-means optimal by a factor of $(1+\alpha)$ for some constant $\alpha>0$, then we can obtain a PTAS for the k-means problem. Our technique can also be used to obtain a PTAS under the assumption, considered by [Balcan et al.'09], that all $(1+\alpha)$-approximations are $\delta$-close to a desired target clustering, when all target clusters have size greater than $2\delta n$.
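For reference, here is a minimal Python sketch of the plain Lloyd iteration that these results build on. The seeding step and the specific variant analyzed by ORSS are omitted; the data and function name are illustrative, not from either paper.

```python
import numpy as np

def lloyd(X, centers, iters=20):
    """Plain Lloyd iterations for k-means: alternately assign each point
    to its nearest center, then move each center to the mean of its
    assigned points. Under the (k-1)-vs-k cost-separation condition,
    seeded variants of this loop provably reach a near-optimal
    clustering; the seeding itself is omitted in this sketch."""
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # squared distances from every point to every center, shape (m, k)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, labels

# Two tight clumps: Lloyd recovers their means from rough initial centers.
X = np.array([[0, 0], [0, 2], [10, 0], [10, 2]], dtype=float)
centers, labels = lloyd(X, [[0, 0], [10, 0]], iters=5)
print(centers)  # [[ 0. 1.] [10. 1.]]
```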

Improved guarantees for agnostic learning of disjunctions. With Avrim Blum and Or Sheffet. COLT 2010.
Given a distribution D over {0,1}^n x {-1,+1}, define OPT to be the error achieved by the best disjunction with respect to D. The problem is to come up with a polynomial-time algorithm which uses random samples from D to construct a prediction rule whose error is at most $c \cdot OPT + \epsilon$, for any $\epsilon > 0$. In this paper we show how to achieve $c = O(n^{1/3 + \alpha})$, for any $\alpha > 0$. This improves on the previous best bound of $O(\sqrt{n})$ by [Peleg'07].

Supervised clustering. With Reza Bosagh Zadeh. NIPS 2010.
This paper studies the clustering framework proposed in [Balcan, Blum'08]. In this framework we assume that a given set of m points belongs to k target clusters and each cluster is defined by some concept in a concept class C. The goal of the clustering algorithm is to interact with the user and figure out the target clustering. This paper gives a simple query-efficient algorithm to cluster any concept class in this model. We also study two natural generalizations of the original model.

Online stochastic optimization in the large: Application to kidney exchange. With Tuomas Sandholm. IJCAI 2009.
This paper studies trajectory-based online algorithms for solving large stochastic programs. We apply these algorithms to match patients in large kidney exchanges.

Decision trees for entity identification: Approximation algorithms and hardness results. With Venkatesan Chakaravarthy, Vinayaka Pandit, Sambuddha Roy and Mukesh Mohania. PODS 2007.
Consider the following problem: Given an input table containing information about a set of entities over a fixed set of attributes and a probability distribution over the entities, construct a decision tree that identifies each entity unambiguously by testing the attribute values such that the average number of tests is minimized. Building on prior work ([Adler, Heeringa'05]), we study the general version of the problem involving arbitrary input tables and arbitrary probability distributions over the set of entities. We consider a natural greedy algorithm and prove an approximation guarantee of O(r_K log N), where N is the number of entities, K is the maximum number of distinct values of an attribute, and r_K is a suitably defined Ramsey number. In addition, our analysis indicates a possible way of resolving a Ramsey-theoretic conjecture of Erdős.
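A natural greedy rule for this problem is easy to sketch: at each node, test the attribute that separates the most pairs of remaining entities. The Python sketch below uses that pair-separation criterion on unweighted entities; the probability distribution over entities and the exact greedy analyzed in the paper are omitted, and all names here are illustrative.

```python
def build_tree(entities, attrs):
    """Greedy identification tree. `entities` is a list of (name, record)
    pairs where record maps attribute -> value. At each node, test the
    attribute separating the most pairs of remaining entities (an
    illustrative, unweighted greedy criterion). Leaves hold entity names;
    internal nodes are (attribute, {value: subtree}) pairs."""
    if len(entities) <= 1:
        return entities[0][0] if entities else None

    def pairs_separated(a):
        groups = {}
        for _, rec in entities:
            groups.setdefault(rec[a], []).append(rec)
        total = len(entities) * (len(entities) - 1) // 2
        same = sum(len(g) * (len(g) - 1) // 2 for g in groups.values())
        return total - same  # pairs landing in different branches

    best = max(attrs, key=pairs_separated)
    if pairs_separated(best) == 0:
        return [e[0] for e in entities]  # indistinguishable entities
    children = {}
    for name, rec in entities:
        children.setdefault(rec[best], []).append((name, rec))
    return (best, {v: build_tree(sub, attrs) for v, sub in children.items()})

entities = [("sparrow", {"legs": 2, "flies": "y"}),
            ("ostrich", {"legs": 2, "flies": "n"}),
            ("dog",     {"legs": 4, "flies": "n"})]
tree = build_tree(entities, ["legs", "flies"])
print(tree)  # ('legs', {2: ('flies', {'y': 'sparrow', 'n': 'ostrich'}), 4: 'dog'})
```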

Image modeling using tree structured conditional random fields. With Aakanksha Gagrani and Balaraman Ravindran. IJCAI 2007.