The main goal of this thesis is to show that modern advancements in areas like Reproducing Kernel Hilbert Space (RKHS) theory, randomized algorithms, active learning and convex optimization can be used to make critical progress in understanding and applying classical statistical methods like classification, regression and hypothesis testing.
We first present strong connections between two seemingly different problems - active classification using one dimensional thresholds and stochastic black-box optimization. We use this unified view to develop new lower bounds for stochastic optimization using tools from active learning and new algorithms for active learning of thresholds using ideas from optimization. In a different work, we derive fast primal-dual algorithms for margin- based classification in RKHSs, with the aim of simultaneously solving the primal problem (finding a perfect classifier in an RKHS if there exists one) and the dual problem (producing a certificate of infeasibility if there is no such perfect classifier). We also develop new convex analytical tools like versions of Gordan’s and Hoffman’s theorems with margins using insights from convex geometry.
We then move from classification to regression, the simplest setting corresponding to solving a linear system of equations (or finding the ordinary least squares solution). We study the Randomized Kaczmarz and Random-ized Coordinate Descent algorithms for solving both consistent and inconsistent systems and derive a Kaczmarz-style algorithm for Kernel Ridge Regression which never explicitly stores or inverts the full kernel matrix, instead calculating random rows as required, addressing a major issue encountered with scaling up kernelized methods.
After classification and regression, we finally turn to non-parametric hypothesis testing, in low-sample or high-dimensional regimes. In one line of work, we demonstrate that Stein shrinkage of cross-covariance ma- trices can improve the power of kernel independence testing since it shrinks the null and alternate distributions differently. In a parallel work, we clearly demonstrate the decreasing power of both kernel-based and distance- based two-sample tests and independence tests with increasing dimensionality (and the role played by the median heuristic), challenging existing folklore that they “work well” in high dimensions.
Aarti Singh (Co-chair)
Larry Wasserman (Co-chair)
Michael Jordan (University of California, Berkeley)
Arther Gretton (University College, London)
diane [atsymbol] cs.cmu.edu