PAPERZILLA
Crunching Academic Papers into Bite-sized Insights.

Fantastic Pretraining Optimizers and Where to Find Them

Paper Summary

Paperzilla title
Muon and SOAP Reign Supreme... But Only for Small Language Models
This paper benchmarks 11 optimizers for large language model pretraining and finds that while some, such as Muon and SOAP, do offer a speedup over AdamW, it is smaller (up to 1.4x) than previously claimed and diminishes as model size increases. The authors also find that optimal hyperparameters vary significantly between optimizers, making comparisons that share hyperparameters across optimizers unfair, and that early checkpoints can be misleading because optimizer rankings can shift during training.
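
To make a "1.4x speedup" figure concrete, below is a minimal sketch of one way such a number could be computed: the ratio of training tokens each optimizer needs to reach the same target loss, with each optimizer tuned independently. The loss curves and numbers are hypothetical placeholders, not results from the paper.

```python
# Hypothetical sketch: token-to-target-loss speedup of one optimizer over AdamW.
import numpy as np

def tokens_to_reach(loss_curve, tokens, target_loss):
    """Return the first token count at which the loss curve drops to target_loss or below."""
    below = np.where(np.asarray(loss_curve) <= target_loss)[0]
    return tokens[below[0]] if len(below) else None

# Placeholder loss curves logged every 1B tokens for two independently tuned optimizers.
tokens = np.arange(1, 101) * 1e9                      # 1B ... 100B tokens
adamw_loss = 3.5 * (tokens / 1e9) ** -0.15            # placeholder power-law decay
muon_loss = 3.4 * (tokens / 1e9) ** -0.155            # placeholder: slightly faster decay

target = adamw_loss[-1]                               # AdamW's final loss after 100B tokens
t_adamw = tokens_to_reach(adamw_loss, tokens, target)
t_muon = tokens_to_reach(muon_loss, tokens, target)
print(f"Estimated speedup: {t_adamw / t_muon:.2f}x")  # ~1.4x with these placeholder curves
```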

Possible Conflicts of Interest

The authors acknowledge support from Google, which has a vested interest in efficient large language model training, but this seems appropriately disclosed and does not obviously bias the research.

Identified Weaknesses

Limited model sizes tested
The largest model tested is 1.2B parameters, leaving open the question of how these optimizers perform on the truly massive models (7B+ parameters) that dominate current research and applications. The paper extrapolates its results to suggest the speedup disappears at larger sizes (illustrated by the sketch below), but empirical validation is missing.
Focus on pretraining
The study evaluates optimizers only on pretraining, not on fine-tuning or downstream tasks. While pretraining is a major cost, what ultimately matters is performance on the tasks a model is actually used for.
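
As a rough illustration of the kind of extrapolation mentioned above, the sketch below fits speedup against log model size and projects it to a larger model. The data points are hypothetical placeholders, not the paper's measurements, and the log-linear form is an assumption made only for illustration.

```python
# Hypothetical sketch: extrapolating measured speedups to a larger model size.
import numpy as np

# Placeholder (model size in parameters, speedup over AdamW) pairs.
sizes = np.array([130e6, 300e6, 520e6, 1.2e9])
speedups = np.array([1.38, 1.30, 1.24, 1.16])

# Fit speedup as a linear function of log10(parameters) and extrapolate to 7B.
slope, intercept = np.polyfit(np.log10(sizes), speedups, 1)
predicted_7b = slope * np.log10(7e9) + intercept
print(f"Extrapolated speedup at 7B params: {predicted_7b:.2f}x")  # ~1x: advantage largely gone
```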

Rating Explanation

This is a strong study with rigorous methodology addressing a relevant problem. The careful hyperparameter tuning, the scaling analysis, and the identification of misleading evaluation practices are all valuable. The limited model sizes are a notable weakness that prevents a top rating of 5, but the findings matter for current-scale models and motivate follow-up work at larger scales.

Topic Hierarchy

Physical Sciences › Computer Science › Artificial Intelligence

File Information

Original Title: Fantastic Pretraining Optimizers and Where to Find Them
File Name: paper_1102.pdf
File Size: 2.17 MB
Uploaded: September 04, 2025 at 06:22 PM
Privacy: 🌐 Public