Adaptive Sharpness-Aware Minimization with a Polyak-type Step size: A Theory-Grounded Scheduler
Sharpness-Aware Minimization (SAM), a widely adopted optimizer for training machine learning models, has been shown to improve generalization and deliver strong empirical performance, but its effectiveness is sensitive to the choice of learning rate.
SAM seeks flat minima in deep learning by minimizing the worst-case loss within a neighborhood of the current parameters[2]. However, its learning rate is typically chosen through extensive hyperparameter tuning or predefined schedulers[1]. Researchers have proposed Polyak-type schedulers tailored to SAM-style updates, yielding novel adaptive algorithms in both deterministic and stochastic settings. These schedulers have been shown to match or outperform carefully tuned SAM baselines while reducing the need for learning-rate tuning[1]. Theoretical analysis has also revealed that SAM dynamics can cause convergence instability, with a saddle point becoming an attractor under certain conditions[2]. Nevertheless, adding momentum and adjusting the batch size can mitigate this instability and achieve high generalization performance.
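To make the idea concrete, here is a minimal sketch of how a Polyak-type step size could be combined with a SAM-style update. It is not the scheduler from the paper, whose exact rule is not given in this summary. It assumes the classical Polyak rule, (loss minus optimal loss) divided by the squared gradient norm, evaluated at the SAM-perturbed point. The toy quadratic objective, the known optimum `F_STAR = 0`, the radius `rho`, and all function names are illustrative assumptions.

```python
import math

# Toy objective (assumption for illustration): f(w) = 0.5 * sum(a_i * w_i^2),
# whose minimum value f* = 0 is known, as the classical Polyak rule requires.
CURVATURE = [1.0, 2.0]
F_STAR = 0.0


def loss(w):
    return 0.5 * sum(a * x * x for a, x in zip(CURVATURE, w))


def grad(w):
    return [a * x for a, x in zip(CURVATURE, w)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def sam_polyak_step(w, rho=0.05):
    """One SAM-style update with a Polyak-type step size (hypothetical rule)."""
    g = grad(w)
    g_norm = norm(g)
    if g_norm == 0.0:
        return w, 0.0  # already at a stationary point; nothing to do

    # SAM ascent step: move to the approximate worst-case point within radius rho.
    w_adv = [x + rho * gx / g_norm for x, gx in zip(w, g)]

    # Gradient at the perturbed point drives the descent direction.
    g_sam = grad(w_adv)
    g_sam_sq = sum(x * x for x in g_sam)
    if g_sam_sq == 0.0:
        return w, 0.0

    # Polyak-type step size: suboptimality gap over squared gradient norm.
    eta = max(loss(w_adv) - F_STAR, 0.0) / g_sam_sq

    w_new = [x - eta * gx for x, gx in zip(w, g_sam)]
    return w_new, eta


def run(w0, steps=100, rho=0.05):
    w = list(w0)
    for _ in range(steps):
        w, _ = sam_polyak_step(w, rho)
    return w
```

Calling `run([3.0, 1.0])` drives the loss well below its starting value with no hand-tuned learning rate. Because the perturbation radius `rho` is fixed, the iterates settle in a small neighborhood of the minimizer rather than converging to it exactly. In practice the optimal loss is rarely known, so a stochastic variant would need an estimate or lower bound in place of `F_STAR`.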
Background sources we checked (1)
- arxiv.org: Sharpness-Aware Minimization (SAM) has established itself as a powerful and widely adopted optimizer for training machine learning models. By explicitly minimizing the sharpness of the loss landscape, SAM often improves generalization while delivering strong empirical performance…