NGN-M: Enhancing Robustness to the Learning Rate Hyperparameter

🚨 New paper alert! 🚨
Our latest work has been accepted to NeurIPS! We show how combining the NGN stepsize with momentum improves optimization both in theory and in practice.
What’s inside?
📚 Theory: Convergence guarantees in both convex and non-convex settings.
🔧 Cleaner analysis: We drop restrictive assumptions (no bounded gradients, no interpolation), improving over prior work on the Stochastic Polyak stepsize.
🚀 Practice: NGN + momentum and NGN + Adam are far more robust to the choice of learning rate, from ResNets to large-scale LMs (see the sketch below).
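For intuition, here is a minimal NumPy sketch of how an NGN-style stepsize can be combined with heavy-ball momentum. The stepsize form γ = σ / (1 + σ‖g‖² / (2f)) follows the earlier NGN work; the heavy-ball combination, function names, and hyperparameters below are illustrative assumptions, not the paper's exact NGN-M algorithm.

```python
import numpy as np

def ngn_stepsize(loss, grad, sigma):
    # NGN-style stepsize: gamma = sigma / (1 + sigma * ||g||^2 / (2 * loss)).
    # Assumes a non-negative loss; shrinks the step when gradients are large
    # relative to the loss value.
    return sigma / (1.0 + sigma * np.dot(grad, grad) / (2.0 * loss))

def ngn_m_step(x, velocity, loss, grad, sigma=1.0, beta=0.9):
    # Heavy-ball-style update with the NGN stepsize (illustrative combination,
    # not necessarily the exact NGN-M update from the paper).
    gamma = ngn_stepsize(loss, grad, sigma)
    velocity = beta * velocity - gamma * grad
    return x + velocity, velocity

# Toy usage: least squares f(x) = 0.5 * ||A x - b||^2 (non-negative loss).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
x, v = np.zeros(5), np.zeros(5)
for _ in range(200):
    r = A @ x - b
    loss, grad = 0.5 * r @ r + 1e-12, A.T @ r  # small eps guards the division
    x, v = ngn_m_step(x, v, loss, grad, sigma=1.0, beta=0.9)
print(f"final loss: {0.5 * np.linalg.norm(A @ x - b) ** 2:.4f}")
```

The appeal of this kind of stepsize is that a too-large σ is automatically damped whenever the gradient norm is large relative to the loss, which is one intuition for the robustness to the learning rate reported above.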
Special credit to my amazing coauthors: Niccolò Ajroldi, Antonio Orvieto, Aurelien Lucchi.
