An interesting phenomenon in modern machine learning research is that simple procedures often solve complex problems very well in practice. For example, even though the objectives of deep neural networks are highly non-convex, simple stochastic gradient descent often performs remarkably well. On the other hand, from a theoretical point of view, these complex problems are often computationally hard in the worst case. These competing facts suggest that there are often unknown structures in the data that make simple heuristics succeed in practice.
My thesis work aims to bridge the gap between theory and practice by providing new insights into how structures in data explain the empirical phenomena in modern machine learning problems. First, we will revisit an existing analytical framework based on geometry and examine whether it can explain the success of gradient descent for non-convex problems in machine learning. Next, we will study some specific models in machine learning. We will show that gradient descent can learn a convolutional filter with L2 loss under general structural assumptions on the input distribution. Further, we will show that if the input distribution is Gaussian, then gradient descent can optimize a one-hidden-layer convolutional neural network in which both the convolutional weights and the output weights are parameters to be learned. Finally, we will list some concrete problems on understanding over-parameterization and homogeneity, which we would like to study in the future.
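To make the convolutional-filter setting concrete, the following is a minimal sketch, not the thesis's actual experiments: it runs plain gradient descent on the L2 loss for a single ReLU convolutional filter with average pooling, on Gaussian inputs. All dimensions, the learning rate, and the iteration count are illustrative choices, and the teacher filter `w_star` is an assumed ground truth.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: learn an unknown filter w_star from Gaussian inputs
# by minimizing the L2 (squared) loss with plain gradient descent.
d, k, n = 10, 4, 2000            # input dim, filter width, sample count
w_star = rng.normal(size=k)      # assumed ground-truth filter

# Each input x contributes overlapping length-k patches.
X = rng.normal(size=(n, d))
patches = np.stack([X[:, i:i + k] for i in range(d - k + 1)], axis=1)  # (n, p, k)

def predict(w):
    # One ReLU convolutional filter followed by average pooling.
    return np.maximum(patches @ w, 0).mean(axis=1)

y = predict(w_star)              # labels generated by the teacher filter

w = 0.1 * rng.normal(size=k)     # small random initialization
loss0 = np.mean((predict(w) - y) ** 2)

lr = 0.1
for _ in range(1000):
    r = predict(w) - y                          # per-sample residuals, (n,)
    active = (patches @ w > 0).astype(float)    # ReLU gradient mask, (n, p)
    # Mean over samples and patches matches the average pooling.
    grad = (r[:, None, None] * active[..., None] * patches).mean(axis=(0, 1))
    w -= lr * grad

loss = np.mean((predict(w) - y) ** 2)
```

The point of the sketch is that despite the non-convexity introduced by the ReLU, gradient descent from a small random initialization drives the L2 loss down on Gaussian data, which is the kind of behavior the thesis analyzes under structural assumptions on the input distribution.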
Barnabas Poczos (Co-Chair)
Aarti Singh (Co-Chair)
Michael I. Jordan (University of California, Berkeley)