Momentum-based Stochastic Gradient Descent (SGD) algorithms are the workhorse of deep learning, and the central premise for adding momentum is to accelerate SGD. However, it is not clear whether such methods actually accelerate vanilla SGD (with batch size 1) or require a large batch size to provide acceleration. In this talk, we will present a surprising result showing that such methods cannot improve over SGD in the standard stochastic approximation setting. However, by modifying the algorithm, we can show that one can indeed accelerate SGD, although this requires a very careful parametrization of the problem.
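For readers unfamiliar with the method in question, the standard heavy-ball momentum update on a stochastic gradient can be sketched as follows. This is a minimal illustration with batch size 1; the toy data, step size, and momentum parameter are illustrative assumptions, not the modified algorithm presented in the talk.

```python
import random

# Toy 1-D least-squares data: y = 2 * x (illustrative assumption, not from the talk)
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

def sgd_momentum(steps=2000, lr=0.05, beta=0.9, seed=0):
    """Heavy-ball momentum SGD with batch size 1 (a minimal sketch)."""
    rng = random.Random(seed)
    w, v = 0.0, 0.0
    for _ in range(steps):
        x, y = rng.choice(data)    # draw one sample per step (batch size 1)
        grad = (w * x - y) * x     # stochastic gradient of 0.5 * (w*x - y)^2
        v = beta * v + grad        # momentum: accumulate a velocity term
        w = w - lr * v             # update parameters along the velocity
    return w

w = sgd_momentum()
print(w)  # converges near the true slope 2.0
```

Setting `beta = 0` recovers vanilla SGD; the talk's question is whether the `beta > 0` variant genuinely converges faster in this batch-size-1 stochastic regime.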
Based on joint work with Rahul Kidambi, Praneeth Netrapalli, Sham Kakade, and Aaron Sidford.
Prateek Jain has been a researcher at Microsoft Research Lab, India since January 2010. He earned his PhD in Computer Science from UT Austin in December 2009, under the guidance of Prof. Inderjit S. Dhillon, and his BTech in Computer Science from IIT Kanpur in 2004. Prateek's primary research interests are in machine learning, non-convex optimization, and linear algebra.