Computer Science Thesis Proposal

  • Remote Access - Zoom
  • Virtual Presentation - ET
  • Ph.D. Student
  • Computer Science Department
  • Carnegie Mellon University
Thesis Proposals

Machine Learning: Metrics and Embeddings

In this thesis proposal, we propose and analyze new theories of clustering, one of the most fundamental tasks in machine learning. We use methods drawing from multiple disciplines, including metric embeddings, spectral algorithms, and group representation theory.

  1. We propose a metric that adapts to the shape of data, and show how to quickly compute it. These metrics may be useful for improving k-means clustering methods.
  2. We build a spectral partition method with provable theoretical guarantees. This may lead to more theoretically principled spectral clustering methods, as existing methods do not have any such guarantees. Spectral clustering is one of the most popular methods of clustering.
  3. We classify all Manhattan distance kernels. This is a Manhattan distance analog of one of the fundamental results in machine learning kernel theory. Kernel methods are one of the oldest and most established methods of clustering data.

Each of these contributions answers natural questions in machine learning theory. For each of these contributions, we develop multidisciplinary tools. For ongoing and future work, we propose to apply these tools to a wider range of natural machine learning problems.

Thesis Committee:
Gary Miller (Chair)
Danny Sleator
Anupam Gupta
Satish Rao (University of California, Berkeley)

Zoom Participation. See announcement.

For More Information, Please Contact: