Machine Learning / Duolingo Seminar

  • Newell-Simon 4305 and Zoom
  • In Person and Virtual ET
  • NATHAN KALLUS
  • Assistant Professor
  • School of Operations Research and Information Engineering
  • Cornell Tech / Cornell University
Seminars

Learning and Optimization: Separate or Integrate?

Predictive features (aka "context") reduce the uncertainty in optimization under uncertainty, but leveraging this requires we learn a potentially complex relationship. We can potentially use off-the-shelf ML methods to learn it, but the training process would ignore the downstream optimization task into which we plug in the model. Alternatively, we can train the predictive model in an end-to-end fashion to directly optimize the downstream costs of the decision policy it would induce. In this talk I will tackle the question, which is better? Should we separate or integrate the learning and optimization tasks?

In the first part of this talk I will focus on contextual linear optimization, where the cost function is bilinear in the decision and uncertain variables so, by linearity of expectation, the only relevant aspect of the predictive relationship is the conditional expectation (aka regression function). I will show that the naive separated approach actually achieves regret convergence rates that are significantly faster than any end-to-end method that directly optimizes downstream decision performance. I show this by leveraging the fact that specific problem instances do not have arbitrarily bad near-dual-degeneracy and developing appropriate upper and lower bounds in view of this. This is overall positive for practice: predictive models are easy and fast to train using existing ML tools, simple to interpret and reuse as a prediction, and, as shown, lead to decisions that perform very well.

In the second part of this talk I will focus on general (nonlinear) contextual stochastic optimization problems, where we must consider the whole conditional probability model. As this object can be much higher dimensional than any decision policy, it is here better to integrate the tasks and directly learn a policy. I will adapt random forests to this task by searching over tree splits to directly optimize downstream decisions, rather than prediction accuracy. This is a seemingly computationally intractable problem that I will solve by developing approximate splitting criteria that utilize optimization perturbation analysis to eschew burdensome re-optimization. I prove that the approximations are consistent and the method is asymptotically optimal, and I empirically validate its superior performance.

This talk is based on the following papers:

In Person and Zoom Participation. See announcement.

For More Information, Please Contact: 
Keywords: