I'm interested in applied and theoretical machine learning problems with meaningful real-world impact. Broadly, my PhD research focuses on non-iid learning and clustering (e.g. spectral clustering, image segmentation, record linkage). My more recent explorations include imitation learning and self-driving vehicles.
Specifically, my thesis introduces the first cross-validation resampling techniques that estimate and correct for the effects of statistical dependency across clusters. The independent and identically distributed (iid) assumption is fundamental to the guarantees of most machine learning algorithms, yet in practice it is frequently violated. Data from active learning, time series, and clustering or record-linkage results all break this assumption to varying degrees.
I'm fortunate to apply my work to the counter-human-trafficking project at CMU (think NBC Dateline meets Terminator). We scrape hundreds of millions of online escort ads, extract features, and use machine learning to find cases of human trafficking. The tools are used by over 100 law enforcement agencies to make actual arrests and rescue real victims (largely through the spin-off Marinus Analytics). My PhD advisor for this work is Artur Dubrawski, and my committee members are Geoff Gordon, Kris Kitani, and Beka Steorts. I am also an NSF Fellow.