Approximation Algorithms for Path-planning and Clustering Problems on Graphs Shuchi Chawla Thesis Committee: Avrim Blum (chair) Anupam Gupta R. Ravi Moses Charikar (Princeton Univ) Abstract: In this thesis proposal, we study two classes of combinatorial optimization problems on graphs, motivated by classical problems in Machine Learning and Scheduling, and give approximation algorithms for them. The first of these includes ``path-planning'' problems on graphs, that model the machine learning problem of Robot Navigation, as well as various scheduling objectives. Suppose that a robot needs to deliver packages to several locations, but may not be able to deliver all of them because it has a limited supply of battery power. One way of formulating this problem is to maximize the total value of packages that can be delivered in a certain fixed amount of time. This is the classic {\em Orienteering} problem. In an alternate formulation that better models the uncertainty in the robot's environment, the value of every package decreases with time, but there is no limit on the length of the path. Therefore the robot must visit many nodes as early as possible. This is an example of the {\em Discounted-Reward TSP} problem. We give the first constant factor approximations for these problems and several of their extensions. The second class of problems that we consider are ``qualitative clustering'' problems motivated by natural language processing, and other learning problems. Given a similarity measure between pairs of vertices, we introduce the objective of Correlation Clustering, in which the goal is to cluster similar documents together, and dissimilar documents in different clusters. We give constant factor approximations for various Correlation Clustering objectives for the special case when the similarity measure only classifies pairs as similar or dissimilar, but does not specify the degree of similarity or dissimilarity. As future work, we plan to consider several generalizations of Orienteering. For example, one extension is that different packages have different time windows within which they must be delivered. In another extension, the underlying metric on locations changes with time, or new locations arrive over time. We also plan to consider stochastic variants of the problem in which the graph is a Markov Decision Process, and edges signify actions. Here the robot must minimize the expected time taken to perform a given task. For the correlation clustering problem, we plan to design an iterative algorithm that, given an initial good clustering on a graph, can recompute a good clustering when a few changes (such as additions and deletions of nodes) are made to the graph, in a short amount of time. We also plan to incorporate side constraints such as bounded cluster sizes and bounded number of clusters in our algorithms. This proposal is based on joint work with Nikhil Bansal and Avrim Blum (FOCS 2002), Avrim Blum, David Karger, Adam Meyerson, Maria Minkoff and Terran Lane (FOCS 2003) and Nikhil Bansal, Avrim Blum and Adam Meyerson (to appear in STOC 2004).