🚨 New paper alert! 🚨 Our latest work has been accepted to NeurIPS! We demonstrate how adapting the NGN stepsize with momentum enhances optimization both theoretically and practically.

What’s inside?
📚 Theory: convergence in convex and non-convex settings.
🔧 Cleaner analysis: we drop restrictive assumptions (no bounded gradients, no interpolation), improving over prior work on the Stochastic Polyak stepsize.
🚀 Practice: NGN + momentum and NGN + Adam = way more robust to learning rate choices, from ResNets to large-scale LMs (rough sketch of the update below).

Special credit to my amazing coauthors: Niccolò Ajroldi, Antonio Orvieto, Aurelien Lucchi.
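For anyone curious what the update roughly looks like, here is a minimal, illustrative PyTorch sketch: it uses the standard NGN stepsize gamma = sigma / (1 + sigma * ||grad||^2 / (2 * loss)) for a non-negative loss and pairs it with a heavy-ball momentum buffer. The function name `ngn_momentum_step` and the hyperparameters `sigma` and `beta` are placeholders, and the exact way the paper couples NGN with momentum may differ, so please refer to the paper for the actual algorithm.

```python
import torch

def ngn_momentum_step(params, loss, momentum_buf, sigma=0.5, beta=0.9):
    # Illustrative sketch only: NGN stepsize (assumes a non-negative loss)
    #   gamma = sigma / (1 + sigma * ||grad||^2 / (2 * loss)),
    # combined with a heavy-ball momentum buffer. The paper's exact
    # NGN + momentum coupling may differ.
    grads = torch.autograd.grad(loss, params)
    grad_sq = sum(g.pow(2).sum() for g in grads)
    gamma = (sigma / (1.0 + sigma * grad_sq / (2.0 * loss.detach()))).item()
    with torch.no_grad():
        for p, g, v in zip(params, grads, momentum_buf):
            v.mul_(beta).add_(g, alpha=-gamma)  # heavy-ball velocity update
            p.add_(v)                           # parameter update
    return gamma

# Toy usage: one step on a tiny least-squares problem.
w = torch.zeros(3, requires_grad=True)
buf = [torch.zeros_like(w)]
x, y = torch.randn(8, 3), torch.randn(8)
loss = ((x @ w - y) ** 2).mean()  # non-negative loss, as NGN assumes
ngn_momentum_step([w], loss, buf)
```

The appeal of this kind of stepsize is that it shrinks automatically when the gradient is large relative to the loss, which is one intuition for the robustness to learning-rate choices mentioned above.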