Title:

Improved Approximation for Connected Facility Location (work in progress)

The connected facility location problem is the following. We are given an undirected graph $G=(V,E)$ with edge weights $c: E \to \mathbb{R}_{>0}$, a subset $D\subseteq V$ of demands, and a number $M\geq 1$. A solution is a pair $(F,T)$, where $F\subseteq V$ is a subset of nodes (the facilities) and $T$ is a Steiner tree with terminal set $F$. The cost of this solution is $M\cdot c(T) + \sum_{d \in D}\ell(d,F)$, where $c(T)$ is the cost of $T$ and $\ell(d,F)$ denotes the shortest-path distance from $d$ to the nearest facility in $F$. The problem is to find a solution of minimum cost.
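As an illustration, the objective above can be evaluated directly. The following is a minimal sketch assuming an adjacency-list graph representation; the names `cfl_cost` and `dijkstra` are ours, not from the talk.

```python
import heapq

def dijkstra(adj, sources):
    """Multi-source Dijkstra: shortest distance from any source node."""
    dist = {v: float('inf') for v in adj}
    for s in sources:
        dist[s] = 0.0
    pq = [(0.0, s) for s in sources]
    heapq.heapify(pq)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

def cfl_cost(adj, demands, M, tree_edges, facilities):
    """Cost of a solution (F, T): M * c(T) plus, for each demand,
    its distance to the nearest facility."""
    tree_cost = sum(w for (_, _, w) in tree_edges)
    dist = dijkstra(adj, facilities)
    return M * tree_cost + sum(dist[d] for d in demands)
```

On a path a-b-c with unit edge weights, a single facility at a (empty tree) and demands {b, c} gives cost $M\cdot 0 + 1 + 2 = 3$.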

In this talk we present a new method to analyze the performance of the (current best) random sampling algorithm of Gupta et al. [STOC’03] with varying selection probability. This analysis shows that random sampling with a certain selection probability yields an expected 3.05-approximation algorithm for connected facility location.

Joint work with: Friedrich Eisenbrand and Thomas Rothvoss

Title:

Optimization under Probes and Partial Information

In this talk we consider the class of optimization problems where we are given some prior knowledge about a set of variables and the ability to refine a subset of the variables subject to budget constraints. At the end of the refinement process, we optimize a function based on all the available information. The goal is to design the best refinement strategy (respecting the budget), so that the expected value of the final solution conditioned on the strategy is optimized over all strategies. Problems of this type arise in a variety of scenarios in networks, databases, and learning theory.
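To make the setup concrete, here is a toy sketch of our own devising (not from the talk): each variable has a known two-point prior, a probe reveals a variable's true value, and a non-adaptive strategy fixes its probe set in advance; the expected value of picking the single best item afterwards can then be computed by enumeration.

```python
from itertools import combinations, product

# Each variable's prior is a list of (value, probability) pairs.

def expected_value_nonadaptive(priors, probe_set):
    """Expected final value of a non-adaptive strategy: probe the fixed
    set, then pick the best of the revealed values and the unprobed means."""
    means = [sum(v * p for v, p in pr) for pr in priors]
    probed = sorted(probe_set)
    total = 0.0
    for outcome in product(*(priors[i] for i in probed)):
        prob = 1.0
        best = max((means[i] for i in range(len(priors))
                    if i not in probe_set), default=float('-inf'))
        for (v, p) in outcome:
            prob *= p
            best = max(best, v)
        total += prob * best
    return total

def best_nonadaptive(priors, budget):
    """Exhaustive search over all probe sets of size at most `budget`."""
    n = len(priors)
    return max(expected_value_nonadaptive(priors, set(s))
               for k in range(budget + 1)
               for s in combinations(range(n), k))
```

With priors [{0 or 2, each w.p. 1/2}, {1 surely}] and a budget of one probe, probing the uncertain variable raises the expected value from 1 to 1.5, illustrating the value of refinement.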

In this talk we focus on several such problems and explore the possible trade-offs, in particular the existence of non-adaptive strategies that are near-optimal with respect to the best adaptive strategies. Interestingly, the study of these problems leads us to a unified solution of several budgeted learning problems.

Title:

Inapproximability results for learning with noise

One of the central algorithmic tasks in learning theory is to learn an unknown classification rule (hypothesis) from a set of training examples so that future examples can be classified. The hypothesis output by the learner is required to belong to a certain “concept class” which is assumed to contain a good classification rule for the data. Under the promise that the concept class contains a perfect classifier that is correct on all training examples, the learning problem amounts to finding a hypothesis from the class that is consistent with the training data. For some concept classes, such as the class of monomials, parity functions, or halfspaces, this problem is solvable in polynomial time.

But what if there is no such consistent hypothesis? This would often be the case in practice where one does not have accurate background knowledge on the correct hypothesis that explains the data. Also, even if the original data is perfectly explained by a hypothesis from the assumed concept class, the actual training data could be noisy. In such a case, a natural goal, called “agnostic learning,” is to output a hypothesis from the class that correctly classifies as many examples as possible.

A celebrated result of Håstad (1997) showed that for the class of parity functions, even a tiny amount of noise makes learning a parity with any non-trivial agreement NP-hard. Similar results were recently obtained for the classes of monomials and halfspaces. In this talk, I will discuss these hardness results for agnostic learning, focusing mostly on the following tight hardness result for learning halfspaces: for arbitrary constants $\epsilon,\delta > 0$, given training data from the hypercube a $(1-\epsilon)$ fraction of which can be explained by a halfspace, it is NP-hard to find a halfspace that correctly labels a $(1/2+\delta)$ fraction of the examples.
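For concreteness, the quantity at stake, the fraction of examples a halfspace labels correctly, can be written down directly. This is an illustrative helper of our own (assuming $\pm 1$ coordinates and labels):

```python
def agreement(w, theta, examples):
    """Fraction of labeled examples correctly classified by the
    halfspace sign(<w, x> - theta), with x in {-1,+1}^n, label in {-1,+1}."""
    correct = 0
    for x, label in examples:
        value = sum(wi * xi for wi, xi in zip(w, x)) - theta
        pred = 1 if value >= 0 else -1
        correct += (pred == label)
    return correct / len(examples)
```

Agnostic learning asks for a hypothesis from the class maximizing this agreement; the hardness result above says that even distinguishing agreement $1-\epsilon$ from agreement $1/2+\delta$ is NP-hard for halfspaces.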

Based on joint work with Prasad Raghavendra.

Title:

Data aggregation in sensor networks: an algorithmic perspective

A sensor network consists of sensing devices that may exchange data through wireless communication. Sensor networks are highly energy-constrained, since they are usually battery operated, and reducing energy consumption is the key challenge in operating them.

Energy consumption is mainly due to communication; for this reason many techniques have been proposed to reduce overall data communication, possibly exploiting processing capabilities available at each node.

Data aggregation is one such technique; it consists of aggregating redundant or correlated data in order to reduce the overall size of sent data, thus decreasing the network traffic and energy consumption. We survey some algorithmic issues motivated by data aggregation in sensor networks.
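A toy sketch of the idea (our own, assuming the sensors are organized in a routing tree): without aggregation every reading travels hop by hop to the root, while with in-network aggregation of a decomposable function such as a sum, each node forwards a single partial aggregate to its parent.

```python
def aggregate_sum(children, readings, root=0):
    """In-network aggregation: each node combines its own reading with
    its children's partial sums and sends one value to its parent."""
    return readings[root] + sum(
        aggregate_sum(children, readings, c) for c in children.get(root, []))

def messages_without_aggregation(depth):
    """Every reading is forwarded hop by hop: a node at depth d costs d messages."""
    return sum(depth.values())

def messages_with_aggregation(depth):
    """One partial aggregate per tree edge: each non-root node sends one message."""
    return sum(1 for d in depth.values() if d > 0)
```

On a four-node tree the saving is already visible, and it grows with the depth of the routing tree.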

Title:

Approximation Algorithms for Stochastic Inventory Control Models

We consider stochastic inventory control models in which the goal is to coordinate a sequence of orders of a single commodity, aiming to supply stochastic demands over a discrete finite time horizon with minimum expected overall holding and backlogging costs. We shall survey a number of recent papers in this area, focusing on recent work that relies on a new amortization technique to analyze a cost-balancing approach, yielding approximation algorithms with constant performance guarantees for a number of models. The work surveyed includes joint results with Retsef Levi, Martin Pal, Robin Roundy, and Van Anh Truong.
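The cost-balancing idea can be illustrated in a single-period toy model (a drastic simplification of the surveyed dual-balancing policies, with names of our own): order the quantity at which the expected holding cost first catches up with the expected backlogging cost.

```python
def balancing_order(demand_dist, h, b):
    """Single-period cost-balancing illustration. Returns the smallest
    order quantity q with expected holding cost h*E[(q-D)+] at least
    the expected backlogging cost b*E[(D-q)+].
    demand_dist: list of (demand, probability) pairs."""
    def holding(q):
        return h * sum(p * max(q - d, 0) for d, p in demand_dist)
    def backlog(q):
        return b * sum(p * max(d - q, 0) for d, p in demand_dist)
    q = 0
    while holding(q) < backlog(q):
        q += 1
    return q
```

For demand uniform on {0, 1, 2, 3} with equal holding and backlogging rates, the balance point is q = 2; raising the backlogging rate pushes the order quantity up, as one would expect.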

Title:

Approximating degree bounded spanning trees

Given a graph $G$, we consider the problem of finding the minimum spanning tree (MST) that minimizes the maximum degree in the tree. This problem generalizes the Hamiltonian path problem and is NP-hard. For the unweighted case, Fürer and Raghavachari gave an algorithm with an additive error of one; i.e., it outputs a tree of maximum degree $D_{OPT}+1$, where $D_{OPT}$ is the maximum degree of the optimal tree.

For the weighted case, Fischer gave a local-search algorithm that outputs a tree of max degree at most $2D_{OPT} + \log n$.
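To give a feel for the local-search idea, here is an unweighted sketch in the spirit of, but not identical to, Fischer's algorithm (all names are ours): repeatedly swap in a non-tree edge whose tree cycle contains a vertex of degree at least two higher than both endpoints, decrementing that vertex's degree.

```python
from collections import defaultdict

def tree_path(tree_adj, u, v):
    """Path from u to v in the tree, found by DFS."""
    stack, parent = [u], {u: None}
    while stack:
        x = stack.pop()
        if x == v:
            break
        for y in tree_adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    path = [v]
    while path[-1] != u:
        path.append(parent[path[-1]])
    return path[::-1]

def local_search_min_degree(n, edges, tree):
    """Local search: while some non-tree edge (u, v) closes a cycle
    through a vertex w with deg(w) >= max(deg(u), deg(v)) + 2,
    swap (u, v) in and drop a cycle edge incident to w."""
    tree = set(map(frozenset, tree))
    improved = True
    while improved:
        improved = False
        adj = defaultdict(set)
        for e in tree:
            a, b = tuple(e)
            adj[a].add(b)
            adj[b].add(a)
        deg = {v: len(adj[v]) for v in range(n)}
        for u, v in edges:
            if frozenset((u, v)) in tree:
                continue
            path = tree_path(adj, u, v)
            for i, w in enumerate(path):
                if deg[w] >= max(deg[u], deg[v]) + 2:
                    nbr = path[i - 1] if i > 0 else path[i + 1]
                    tree.discard(frozenset((w, nbr)))
                    tree.add(frozenset((u, v)))
                    improved = True
                    break
            if improved:
                break
    return [tuple(e) for e in tree]
```

Starting from a star in $K_4$ (maximum degree 3), one swap already brings the maximum degree down to 2, i.e. to a Hamiltonian path. Fischer's analysis bounds how far such a locally optimal tree can be from $D_{OPT}$.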

I will talk about some recent work on this problem. In joint work with Kamalika Chaudhuri, Satish Rao and Samantha Riesenfeld, we gave the first constant-factor approximation for this problem: for a weighted graph $G$, our algorithm outputs a tree of maximum degree $2D_{OPT}+o(D_{OPT})$. We use the push-relabel framework previously used by Goldberg and Tarjan for fast exact maximum-flow algorithms, and I will try to convey how this technique can be used for designing approximation algorithms.

Ravi and Singh recently gave an algorithm that outputs a tree of degree $D_{OPT} + k$ when the graph has at most $k$ distinct edge weights. More recently, Goemans improved the approximability of the problem to $D_{OPT}+2$, almost matching the lower bound.

Title:

A Semidefinite Programming Based Approach for Partitioning Graphs

A fundamental problem in computing is to partition a graph into two "roughly" equal parts so as to minimize the number of edges crossing the partition. This problem arises naturally in a variety of scenarios such as VLSI layout, clustering, and parallel computation.
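To make the objective concrete, here is a brute-force (exponential-time) sketch of minimum bisection of our own, feasible only on tiny graphs; the algorithms discussed in the talk are of course far more sophisticated.

```python
from itertools import combinations

def min_bisection(n, edges):
    """Exact minimum bisection by exhaustive search over all balanced
    partitions. Returns (cut size, one side of the best partition)."""
    best = (float('inf'), None)
    for side in combinations(range(n), n // 2):
        s = set(side)
        cut = sum(1 for u, v in edges if (u in s) != (v in s))
        if cut < best[0]:
            best = (cut, s)
    return best
```

On two triangles joined by a single bridge, the minimum bisection cuts exactly that bridge, separating the two triangles.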

A result of Leighton and Rao ('88) gave an $O(\log n)$ factor approximation for this problem. Their approach was based on a Linear Programming (LP) relaxation of the problem, and it was observed that this relaxation cannot give a factor better than $\Omega(\log n)$.

Soon afterwards, a Semidefinite Programming (SDP) based approach was suggested, and it was conjectured that it would lead to a constant-factor approximation algorithm. In 2004, a result of Arora, Rao and Vazirani made progress towards this conjecture by showing that the SDP-based approach gives an $O(\sqrt{\log n})$ factor algorithm, a significant improvement over the LP-based approach. On the other hand, in recent joint work with Nikhil Devanur, Subhash Khot and Rishi Saket, we disproved this conjecture by establishing that the SDP-based approach cannot lead to an approximation algorithm achieving a factor better than $\Omega(\log\log n)$.

This talk will survey the SDP based approach for this graph partitioning problem, and overview the recent positive and negative results.