Standard analyses of DP-SGD rely on the unrealistic bounded-gradient assumption. We propose an algorithm with high-probability convergence guarantees that avoids this restriction while achieving an optimal privacy-utility tradeoff [13]. Complementary SDE analyses [19] show that adaptive methods such as DP-SignSGD and DP-Adam perform better under strong privacy constraints and are less sensitive to learning-rate tuning.
Works on distributed RL remains limited, and such problems are inherently nonsmooth due to operations like maximization. In [15], we develop an algorithm with provable convergence for nonsmooth convex objectives that simultaneously handles both the objective and constraints. The method significantly outperforms several baselines in distributed training of humanoid agents.
Compression is a practical approach to reducing communication overhead. Biased compressors (e.g., Top-K) are particularly effective in practice, but introduce bias that can degrade performance. We design provably convergent algorithms for smooth nonconvex, convex, and strongly convex objectives that achieve optimal asymptotic complexity while reducing communication in both centralized [9] and decentralized [10] settings. In [5], we further develop adaptive biased compressors that adjust compression strength during training.
Standard convergence guarantees for local methods do not capture the benefits of local updates, despite their strong empirical performance. In [6], we identified a class of functions for which local methods provably achieve faster convergence.
Transmitting full Hessians in federated settings is impractical. We address this bottleneck by designing second-order methods that employ Hessian and gradient compression to reduce communication overhead [1-4]. Crucially, these methods retain the hallmark properties of Newton's method, achieving local superlinear convergence independent of the condition number.
These theoretical results advance the understanding of optimization algorithms in federated and distributed settings by providing stronger guarantees under realistic conditions, improving communication and computation efficiency, and explaining the robustness and effectiveness of modern methods in practice.
Transactions on Machine Learning Research (TMLR 2023)
PDF
arXiv
TMLR
Abstract