In this talk, I will present recent progress on understanding deep neural networks by analyzing the trajectory of the gradient descent algorithm. Using this analysis technique, we are able to explain: 1) why gradient descent finds a global minimum of the training loss even though the objective function is highly non-convex, and 2) why a neural network can generalize even when the number of parameters in the network exceeds the number of training examples.
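As a rough illustration of the first phenomenon, the sketch below trains a heavily over-parameterized two-layer ReLU network with plain gradient descent and watches the training loss fall. This is not the speaker's analysis, just a minimal toy setup loosely inspired by it (second layer fixed, only the first layer trained); all sizes, the learning rate, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n training points, input dim d, m hidden units with m >> n
n, d, m = 20, 5, 2000
lr, steps = 0.1, 2000

# Random unit-norm inputs and arbitrary real labels
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.standard_normal(n)

# Two-layer ReLU net f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x);
# the output weights a are fixed, only W is trained
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

losses = []
for _ in range(steps):
    pre = X @ W.T                      # (n, m) pre-activations
    act = np.maximum(pre, 0)           # ReLU
    pred = (act @ a) / np.sqrt(m)      # (n,) network outputs
    resid = pred - y                   # residual of squared loss
    losses.append(0.5 * np.sum(resid ** 2))
    # Gradient of 0.5 * ||pred - y||^2 with respect to W
    grad = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ X / np.sqrt(m)
    W -= lr * grad

print(f"training loss: {losses[0]:.3f} -> {losses[-1]:.6f}")
```

Despite the non-convexity of the loss in W, gradient descent from a random initialization drives the training loss steadily toward zero in this over-parameterized regime, which is the behavior the trajectory analysis seeks to explain.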
Based on joint work with Sanjeev Arora, Wei Hu, Jason D. Lee, Haochuan Li, Zhiyuan Li, Barnabas Poczos, Aarti Singh, Liwei Wang, Ruosong Wang, and Xiyu Zhai.
Simon Shaolei Du is a Ph.D. student in the Machine Learning Department at the School of Computer Science, Carnegie Mellon University, advised by Professor Aarti Singh and Professor Barnabás Póczos. His research interests broadly include topics in theoretical machine learning and statistics, such as deep learning, matrix factorization, convex/non-convex optimization, transfer learning, reinforcement learning, non-parametric statistics, and robust statistics. In 2015, he obtained his B.S. in Engineering Math & Statistics and B.S. in Electrical Engineering & Computer Science from the University of California, Berkeley. He has also spent time working at the research labs of Microsoft and Facebook.
The AI Seminar is generously sponsored by Apple.