CMU Artificial Intelligence Seminar Series



Tuesday, Nov 02, 2021

Time: 12:00 - 1:00 PM ET
Recording of this Online Seminar on YouTube

Jeremy Cohen -- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability

Relevant Paper(s): Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability. Jeremy Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, and Ameet Talwalkar. ICLR 2021.

Abstract: Neural networks are trained using optimization algorithms. While we sometimes understand how these algorithms behave in restricted settings (e.g. on quadratic or convex functions), very little is known about their dynamics on real neural network objective functions. In this paper, we take a close look at the simplest optimization algorithm - full-batch gradient descent with a fixed step size - and find that its behavior on neural networks is both (1) surprisingly consistent across different architectures and tasks, and (2) surprisingly different from that envisioned in the "conventional wisdom." In particular, we empirically demonstrate that during gradient descent training of neural networks, the maximum Hessian eigenvalue (the "sharpness") always rises all the way to the largest stable value, which is 2/(step size), and then hovers just above that value for the remainder of training, in a regime we term the "Edge of Stability." At the Edge of Stability, the sharpness is still "trying" to increase further - and that is what happens if you drop the step size - but it is somehow being actively restrained from doing so by the implicit dynamics of the optimization algorithm. Our findings have several implications for the theory of neural network optimization. First, whereas the conventional wisdom in optimization says that the sharpness ought to determine the step size, our paper shows that in the topsy-turvy world of deep learning, the reality is precisely the opposite: the step size wholly determines the sharpness. Second, our findings imply that convergence analyses based on L-smoothness, or on ensuring monotone descent, do not apply to neural network training.
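To see where the 2/(step size) threshold comes from, recall the classical quadratic analysis: on a one-dimensional quadratic with curvature lam, a gradient descent step with step size eta multiplies the iterate by (1 - eta * lam), so the iterates diverge exactly when lam > 2/eta. The following minimal Python sketch (illustrative only, not code from the talk or the paper) demonstrates that threshold:

# Illustrative sketch: on f(x) = 0.5 * lam * x**2, each gradient descent
# step multiplies x by (1 - eta * lam), so the iterates converge iff
# |1 - eta * lam| < 1, i.e. iff lam < 2 / eta.

def gd_on_quadratic(lam, eta, x0=1.0, steps=50):
    """Fixed-step gradient descent on f(x) = 0.5 * lam * x**2."""
    x = x0
    for _ in range(steps):
        x -= eta * lam * x  # the gradient of f at x is lam * x
    return x

eta = 0.1  # stability threshold is 2 / eta = 20
for lam in (19.0, 21.0):  # curvature just below / just above the threshold
    print(f"lam = {lam}: |x_50| = {abs(gd_on_quadratic(lam, eta)):.3e}")
# lam = 19.0 decays toward zero; lam = 21.0 blows up exponentially.

The Edge of Stability finding is that neural network training does not keep the sharpness safely below this classical divergence threshold: instead, the sharpness climbs all the way up to 2/(step size) and is then held there for the rest of training.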