Language Technologies Thesis Defense

  • Gates Hillman Centers
  • Reddy Conference Room 4405
  • Ph.D. Student
  • Language Technologies Institute
  • Carnegie Mellon University
Thesis Orals

Large-Scale Machine Learning over Graphs

Graphs are ubiquitous in a broad range of statistical modeling and machine learning applications. They are powerful mathematical abstractions for representing not only structured data but also structured computation procedures (e.g., neural network architectures). Despite this ubiquity, efficient learning over heterogeneous or complex graph structures remains a grand challenge that much existing theory and many existing algorithms do not address. In this thesis, we address this challenge from several complementary angles:

In the first part, we focus on learning across heterogeneous graphs. We propose a novel framework that fuses multiple heterogeneous graphs into a single homogeneous graph, on which the learning task can be formulated in a principled manner. We then develop a method that imposes analogical structures among the heterogeneous nodes in the graphs for improved generalization; this method also theoretically unifies several representative models. In both cases, we develop approximation algorithms whose cost scales linearly with the size of the input graphs.

Next, we focus on graph induction problems, where a latent graph structure must be inferred directly from the data. We investigate the task in the graph spectral domain (that of eigenvectors and eigenvalues) and propose an efficient non-parametric approach to recover the graph spectrum that best characterizes the underlying diffusion process over the data.
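
As a rough illustration of what working in the graph spectral domain means, the sketch below eigendecomposes a graph Laplacian, applies a function to its eigenvalues, and rebuilds a kernel matrix. It is not the thesis's method: the thesis learns the spectrum non-parametrically, whereas this example plugs in a fixed heat-diffusion transform. The 4-node path graph, the diffusion time of 0.5, and the names `laplacian` and `spectral_kernel` are all assumptions made for illustration.

```python
import numpy as np

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def spectral_kernel(adj, transform):
    """Apply `transform` to the Laplacian eigenvalues and rebuild a matrix.

    With L = U diag(lam) U^T, this returns U diag(transform(lam)) U^T.
    """
    eigvals, eigvecs = np.linalg.eigh(laplacian(adj))
    return eigvecs @ np.diag(transform(eigvals)) @ eigvecs.T

# Hypothetical 4-node path graph: 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Heat diffusion with an assumed, hand-picked diffusion time t = 0.5.
K = spectral_kernel(A, lambda lam: np.exp(-0.5 * lam))

# Diffuse one unit of mass placed on node 0 across the graph.
signal = np.array([1.0, 0.0, 0.0, 0.0])
diffused = K @ signal
```

Swapping the lambda for a different function of the eigenvalues changes the diffusion behavior without touching the eigenvectors, which is the degree of freedom a spectrum-learning approach would fit to data.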

Finally, we focus on the optimization of neural network architectures, represented as acyclic graphs. We present a hierarchical representation scheme for neural network topologies, in which smaller graph motifs serve as building blocks for larger ones. We then relax the discrete architecture choices into continuous variables to enable efficient gradient-based optimization, yielding orders-of-magnitude speedups over several state-of-the-art non-differentiable techniques. The automatically learned architectures achieve highly competitive performance on both image classification and language modeling, outperforming a large number of architectures designed by human experts.
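
The continuous relaxation described above can be sketched on a single edge of an architecture graph: instead of picking one operation, the edge outputs a softmax-weighted mixture of all candidates, and the mixing logits are tuned by gradient descent. This is a minimal toy under stated assumptions, not the thesis implementation: the three candidate operations, the scalar input, the squared-error loss, and the finite-difference gradient (standing in for automatic differentiation) are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations for one edge of the architecture graph.
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate ops."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))

def discretize(alphas):
    """Recover a discrete choice: keep the op with the largest logit."""
    names = list(CANDIDATE_OPS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]

def loss(alphas, x=1.0, target=2.0):
    """Toy objective: the edge should map x = 1 to 2."""
    return (mixed_op(x, alphas) - target) ** 2

def numeric_grad(f, alphas, eps=1e-5):
    """Central finite differences, a stand-in for autodiff in this sketch."""
    grad = []
    for i in range(len(alphas)):
        up = alphas[:i] + [alphas[i] + eps] + alphas[i + 1:]
        down = alphas[:i] + [alphas[i] - eps] + alphas[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

# Start from uniform logits and run plain gradient descent on them.
alphas = [0.0, 0.0, 0.0]
for _ in range(200):
    grad = numeric_grad(loss, alphas)
    alphas = [a - 0.5 * g for a, g in zip(alphas, grad)]

chosen = discretize(alphas)
```

Because the mixture is differentiable in the logits, the search over operations becomes an ordinary optimization problem; the discrete architecture is read off only at the end by taking the strongest candidate.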

Thesis Committee:
Yiming Yang (Chair)
Jaime Carbonell
Zico Kolter
Karen Simonyan (DeepMind)
