Statistics & Machine Learning Thesis Defense

  • Ph.D. Student
  • Joint Ph.D. Program in Statistics & Machine Learning
  • Carnegie Mellon University
Thesis Orals

Topics in Prediction

Modern science regularly requires forecasts of possible outcomes based on past experience. The field of statistics addresses this objective by developing methods that learn prediction functions from data. This thesis considers different aspects of such methods, focusing mostly on the supervised learning setting, in which the goal is to predict an output variable Y from a set of input variables X.

Level-set prediction is a non-parametric method for classification and regression. It utilizes p(x | y) and p(x,y), in contrast to the dominant methods, which utilize p(y | x). The method provides conformal prediction sets that contain the right output with probability 1-α. The main advantage of this new approach is that it is cautious: it outputs the null set, meaning "I don't know", when the input does not resemble previously seen examples.
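The cautious behavior described above can be illustrated with a minimal sketch. It is not the construction used in the thesis: it assumes a Gaussian kernel density estimate of p(x | y) for each class and a split-conformal threshold calibrated on held-out points of that class. The function names, the bandwidth, the integer class labels, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def kde_score(x, train, bandwidth):
    """Gaussian kernel density estimate of p(x | y), built from one class's training points."""
    d = train.shape[1]
    sq_dist = np.sum((train - x) ** 2, axis=1)
    norm = (2 * np.pi * bandwidth ** 2) ** (d / 2)
    return float(np.mean(np.exp(-sq_dist / (2 * bandwidth ** 2))) / norm)

def fit_level_sets(X, y, alpha=0.1, bandwidth=0.5, seed=0):
    """For each class, fit a density on one half and calibrate a level on the other half."""
    rng = np.random.default_rng(seed)
    model = {}
    for label in np.unique(y):
        Xc = X[y == label]
        idx = rng.permutation(len(Xc))
        half = len(Xc) // 2
        train, calib = Xc[idx[:half]], Xc[idx[half:]]
        scores = np.sort([kde_score(x, train, bandwidth) for x in calib])
        # Threshold at the floor(alpha * (n + 1))-th smallest calibration score,
        # so a new point of this class falls below it with probability at most alpha.
        k = int(np.floor(alpha * (len(calib) + 1)))
        threshold = scores[k - 1] if k >= 1 else 0.0
        model[int(label)] = (train, float(threshold), bandwidth)  # assumes integer labels
    return model

def predict_set(model, x):
    """Return every label whose estimated density at x reaches that class's level."""
    x = np.asarray(x, dtype=float)
    return [label for label, (train, t, bw) in model.items()
            if kde_score(x, train, bw) >= t]

# Toy data (hypothetical): two well-separated Gaussian classes in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(5.0, 1.0, (200, 2))])
y = np.repeat([0, 1], 200)
model = fit_level_sets(X, y)
near = predict_set(model, [0.0, 0.0])    # a point in the middle of class 0
far = predict_set(model, [50.0, 50.0])   # a point unlike anything seen in training
```

In this sketch the prediction set for the far-away point is empty because no class-conditional density reaches its calibrated level there, which is the "I don't know" output.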

Convolutional Neural Networks are used in many state-of-the-art prediction methods in the field of deep learning. Because of how the convolution operation is defined, their use is limited to grid-structured input, such as 2D images and temporal sequences. Graph Convolutional Neural Networks generalize the idea of convolution to graph-structured input. They use spatial methods to select each node's nearest neighbors in the graph in order to define an analogous convolution for graphs.
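As a rough illustration of the spatial idea, the sketch below selects a fixed number of neighbors for each node and applies one shared weight vector to them, in analogy with a convolution filter. The ranking rule (expected visits of a short random walk) and the names select_neighbors and graph_conv are assumptions made for this example, not necessarily the selection rule used in the thesis. The sketch also assumes that the graph has no isolated nodes.

```python
import numpy as np

def select_neighbors(adj, p, steps=2):
    """Rank nodes by expected random-walk visits within `steps` steps; keep the top p per node."""
    n = len(adj)
    P = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    Q = np.eye(n)                             # step 0: the walk starts at the node itself
    Pk = np.eye(n)
    for _ in range(steps):
        Pk = Pk @ P
        Q = Q + Pk
    # Stable sort so that ties are broken by node index.
    return np.argsort(-Q, axis=1, kind="stable")[:, :p]

def graph_conv(x, neighbors, weights):
    """Apply the same weight vector to each node's ordered list of selected neighbors."""
    return x[neighbors] @ weights

# Ring graph with 6 nodes: each node is connected to its two adjacent nodes.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

x = np.arange(n, dtype=float)                 # one scalar feature per node
neighbors = select_neighbors(adj, p=3, steps=1)
out = graph_conv(x, neighbors, np.full(3, 1.0 / 3.0))
```

On the ring graph with one step, each node selects itself and its two adjacent nodes, so the uniform weights reduce to a moving average around the ring, which mirrors a 1D convolution on a circular sequence.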

In some situations the same data are used both for selection and for inference. This can make the reported uncertainties deceptively optimistic: confidence intervals that ignore selection generally have less than their nominal coverage probability. We construct confidence intervals for the selected parameters that remain valid after selection, for the case of selecting the k largest shift parameters out of m. These intervals control the probability that one or more of the intervals for the selected parameters fail to cover, that is, the "simultaneous over the selected" (SoS) error rate.
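A small simulation can make the loss of coverage concrete. The sketch below assumes m independent unit-variance Gaussian observations with all true shifts equal to zero, selects the k largest observations, and estimates how often all k intervals cover their parameters. For comparison it also uses a Bonferroni correction over all m parameters, which is a generic conservative baseline swapped in for illustration and not the intervals developed in the thesis. The function name and all parameter values are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def sos_coverage(m=50, k=3, alpha=0.05, n_sim=5000, bonferroni=False, seed=0):
    """Estimate the probability that every interval for the k selected parameters covers."""
    rng = np.random.default_rng(seed)
    # Naive intervals use level alpha; Bonferroni intervals use alpha / m.
    level = alpha / m if bonferroni else alpha
    half_width = NormalDist().inv_cdf(1 - level / 2)
    theta = np.zeros(m)                       # true shift parameters (assumed all zero)
    covered = 0
    for _ in range(n_sim):
        z = theta + rng.standard_normal(m)    # one unit-variance observation per parameter
        selected = np.argsort(z)[-k:]         # indices of the k largest observations
        if np.all(np.abs(z[selected] - theta[selected]) <= half_width):
            covered += 1
    return covered / n_sim

naive = sos_coverage(bonferroni=False)
corrected = sos_coverage(bonferroni=True)
```

Under these assumptions the naive intervals cover simultaneously far less often than the nominal 95%, since the largest of 50 standard normal draws frequently exceeds 1.96, while the Bonferroni intervals keep the simultaneous coverage at or above the nominal level at the cost of being much wider.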

For a given model, it is often desirable to understand which variables affect the prediction. We use the gradients of the prediction with respect to the input variables as a generic method for obtaining model interpretability. The method conceptually generalizes the way coefficients are studied in linear regression. We demonstrate its usefulness by interpreting complex neural networks on a sentiment analysis dataset.
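The link to linear regression can be seen in a short sketch. It approximates the gradient of a prediction with respect to its inputs by central finite differences, which is an assumption made to keep the example free of dependencies beyond NumPy; a neural network library would normally supply exact gradients by automatic differentiation. The models and the name input_gradient are hypothetical.

```python
import numpy as np

def input_gradient(predict, x, eps=1e-5):
    """Central finite-difference estimate of d predict(x) / d x_j for every input j."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        grad[j] = (predict(x + step) - predict(x - step)) / (2 * eps)
    return grad

beta = np.array([2.0, -1.0, 0.0])

def linear(x):
    """Linear model: its input gradient is the coefficient vector at every x."""
    return float(x @ beta) + 0.5

def nonlinear(x):
    """Toy nonlinear model: its input gradient changes with x."""
    return float(np.tanh(x @ beta))

x0 = np.array([0.3, 1.0, -2.0])
grad_linear = input_gradient(linear, x0)
grad_nonlinear = input_gradient(nonlinear, x0)
```

For the linear model the input gradient recovers the coefficients regardless of x0. For the nonlinear model it equals (1 - tanh(x0 · beta)^2) · beta, a local version of the same quantity that depends on the point being explained.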

Thesis Committee:
Larry Wasserman (Co-Advisor)
Alessandro Rinaldo (Co-Advisor)
Jing Lei
Ryan Tibshirani
Rebecca Nugent
Lucas Mentch (University of Pittsburgh)
