Computer Science Thesis Proposal
- Gates Hillman Centers
- Reddy Conference Room 4405
- BRANDON AMOS
- Ph.D. Student
- Computer Science Department
- Carnegie Mellon University
Differentiable Optimization-Based Inference for Machine Learning
This thesis presents machine learning models, paradigms, and primitive operations that involve using optimization as part of the inference procedure. We show why these techniques provide useful modeling tools that subsume many well-known standard operations. We then discuss and propose solutions to challenges that arise when doing inference and learning in these models.
The first portion describes the input-convex neural network (ICNN) architecture, which helps make inference and learning in deep energy-based models and structured prediction more tractable. These are scalar-valued (potentially deep) neural networks with constraints on the network parameters such that the output of the network is a convex function of (some of) the inputs. The networks allow for efficient inference via optimization over some inputs to the network given others, and can be applied to settings including structured prediction, data imputation, and reinforcement learning. We lay the basic groundwork for these models, propose methods for inference, optimization, and learning, and analyze their representational power. We show that many existing neural network architectures can be made input-convex with a minor modification, and develop specialized optimization algorithms tailored to this setting.
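The convexity constraint described above can be illustrated with a minimal NumPy sketch (this is an illustrative toy, not the thesis's implementation; the names `make_icnn` and `icnn`, the layer sizes, and the two-layer structure are all assumptions). The key ingredients are that the weights on the hidden-state path are kept elementwise nonnegative and the activations are convex and nondecreasing (here, ReLU), which together make the output a convex function of the input y:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def make_icnn(dim_y, hidden):
    # Parameters of a toy two-layer fully input-convex network f(y).
    # The W_z matrices (hidden-to-hidden path) are made elementwise
    # nonnegative; during training this would be maintained by
    # clamping/projection after each gradient step.
    return {
        "Wy0": rng.standard_normal((hidden, dim_y)),
        "b0": rng.standard_normal(hidden),
        "Wz1": np.abs(rng.standard_normal((hidden, hidden))),  # >= 0
        "Wy1": rng.standard_normal((hidden, dim_y)),
        "b1": rng.standard_normal(hidden),
        "wz2": np.abs(rng.standard_normal(hidden)),             # >= 0
        "wy2": rng.standard_normal(dim_y),
    }

def icnn(p, y):
    # z1 is convex in y (ReLU of an affine map); z2 is a ReLU of a
    # nonnegative combination of convex functions plus an affine term,
    # hence convex; the output is again a nonnegative combination of
    # convex functions plus an affine term, hence convex in y.
    z1 = relu(p["Wy0"] @ y + p["b0"])
    z2 = relu(p["Wz1"] @ z1 + p["Wy1"] @ y + p["b1"])
    return p["wz2"] @ z2 + p["wy2"] @ y
```

Because f is convex in y, inference such as argmin_y f(y) can be done with standard convex optimization over the input while the parameters stay fixed.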
The next portion describes OptNet, a network architecture that integrates optimization problems as individual layers in larger end-to-end trainable deep networks. These layers encode constraints and complex dependencies between the hidden states that traditional convolutional and fully-connected layers often cannot capture. We show how to exactly differentiate through these layers, and we develop a highly efficient solver that exploits fast GPU-based operations within a primal-dual interior-point method and provides backpropagation gradients at virtually no additional cost on top of the solve.
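The idea of differentiating exactly through an optimization layer can be sketched for the simplest case, an equality-constrained quadratic program, where the KKT conditions reduce to one linear system (a minimal sketch under that simplifying assumption; OptNet itself handles inequality constraints inside a GPU-based primal-dual interior-point solver, and the function names here are illustrative):

```python
import numpy as np

def qp_layer(Q, p, A, b):
    # Forward pass: solve  min_z 0.5 z^T Q z + p^T z  s.t.  A z = b
    # by solving the KKT system  [Q A^T; A 0] [z; nu] = [-p; b].
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-p, b]))
    return sol[:n]

def qp_layer_grad_p(Q, A, b, dl_dz):
    # Backward pass: gradient of a loss l(z*) w.r.t. p by implicit
    # differentiation of the KKT conditions. Since K is symmetric,
    # one extra linear solve with the incoming gradient suffices,
    # which is why the backward pass costs little beyond the solve.
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    d = np.linalg.solve(K, np.concatenate([dl_dz, np.zeros(m)]))
    return -d[:n]
```

The backward pass reuses the same KKT matrix already factored in the forward pass, which mirrors the claim that backpropagation gradients come at virtually no additional cost on top of the solve.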
We propose to continue studying these optimization-based models in two research directions and one engineering direction:
- We will study the use of optimization and deep energy-based methods in combinatorial and discrete output spaces.
- We will study the use of control as a differentiable policy class that can be used with reinforcement learning.
- We will build a cvxpy-based PyTorch layer to enable quick prototyping of the optimization-based layers studied in this thesis.
Thesis Committee:
J. Zico Kolter (Chair)
Nando de Freitas (DeepMind)
Vladlen Koltun (Intel Labs)