L4: Practical loss-based stepsize adaptation for deep learning.
We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by strongly improving the performance of Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures including ResNets and the Differential Neural Computer. A prototype implementation as a TensorFlow optimizer is released.
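As a rough illustration of the idea described above, here is a minimal sketch of a loss-based rescaling step: the proposed update direction from a base optimizer is scaled so that the linearly predicted loss decrease equals a fixed fraction of the gap between the current loss and an estimated minimum. The function name `l4_step`, the fraction `alpha`, the minimum estimate `l_min`, and the safeguard `eps` are illustrative assumptions, not the released implementation.

```python
import numpy as np

def l4_step(params, loss_fn, grad_fn, direction, alpha=0.15, l_min=0.0, eps=1e-12):
    """Hypothetical loss-based rescaling step (sketch, not the released code).

    `direction` is the raw update proposed by a base optimizer (e.g. the
    momentum or Adam step); the step size is chosen so that the linearly
    predicted decrease equals alpha * (current loss - estimated minimum).
    """
    loss = loss_fn(params)
    grad = grad_fn(params)
    # Linear model: moving by -eta * direction changes the loss by roughly
    # -eta * <grad, direction>. Solve for eta giving the fixed predicted progress.
    predicted_slope = np.dot(grad, direction)
    eta = alpha * (loss - l_min) / (predicted_slope + eps)
    return params - eta * direction

# Toy usage on a quadratic loss, using the plain gradient as the direction.
loss_fn = lambda w: 0.5 * np.sum(w ** 2)
grad_fn = lambda w: w
w = np.array([3.0, -2.0])
for _ in range(5):
    w = l4_step(w, loss_fn, grad_fn, direction=grad_fn(w))
print(w, loss_fn(w))
```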