Remove stale preconditioner config names from `__init__.py` by runame · Pull Request #256 · facebookresearch/optimizers

runame · 2026-04-11T19:41:05Z

Summary

`shampoo_types.py` renamed `AmortizedPreconditionerConfig` → `BaseShampooPreconditionerConfig` and `ShampooPreconditionerConfig` → `ClassicShampooPreconditionerConfig` (cb150a9), but `distributed_shampoo/__init__.py` still imported and re-exported the old names, so `import distributed_shampoo` fails at HEAD with `ImportError`.
Replace the stale names in the imports and `__all__` entries.

Test plan

- `python -c "import distributed_shampoo"` succeeds
- `distributed_shampoo.BaseShampooPreconditionerConfig` and `distributed_shampoo.ClassicShampooPreconditionerConfig` are accessible
- Every entry in `__all__` resolves to an actual module attribute
- Inheritance chain intact (Classic→Base, RootInv→Classic, Eigendecomposed→Classic, EigenvalueCorrected→Base)
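
The `__all__` and inheritance checks in the test plan could be scripted roughly as follows. This is only a sketch: `unresolved_exports` is a hypothetical helper name, and the `demo_pkg` module with its two stub classes is a stand-in built in memory so the snippet runs without the real package installed.

```python
import types


def unresolved_exports(module: types.ModuleType) -> list[str]:
    """Return the names listed in ``module.__all__`` that are not attributes of the module."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]


# Hypothetical stand-ins for the renamed classes (not the real implementations).
class BaseShampooPreconditionerConfig:
    pass


class ClassicShampooPreconditionerConfig(BaseShampooPreconditionerConfig):
    pass


# Build an in-memory module whose __all__ still lists one stale (pre-rename) name.
demo = types.ModuleType("demo_pkg")
demo.BaseShampooPreconditionerConfig = BaseShampooPreconditionerConfig
demo.ClassicShampooPreconditionerConfig = ClassicShampooPreconditionerConfig
demo.__all__ = [
    "BaseShampooPreconditionerConfig",
    "ClassicShampooPreconditionerConfig",
    "ShampooPreconditionerConfig",  # stale entry: no attribute with this name exists
]

stale = unresolved_exports(demo)
classic_inherits_base = issubclass(
    demo.ClassicShampooPreconditionerConfig, demo.BaseShampooPreconditionerConfig
)
print(stale, classic_inherits_base)
```

Against the real package, the same helper would be called on the result of `importlib.import_module("distributed_shampoo")` and expected to return an empty list.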

`AmortizedPreconditionerConfig` was renamed to `BaseShampooPreconditionerConfig` and `ShampooPreconditionerConfig` to `ClassicShampooPreconditionerConfig` in `shampoo_types.py`, but `__init__.py` still imported and re-exported the old names. Replace them with the new names in the imports and `__all__`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wz337

Thanks for fixing this!

meta-codesync · 2026-04-17T00:37:55Z

@hjmshi has imported this pull request. If you are a Meta employee, you can view this in D101275252.

Summary: The four grafting CLI overrides in `.github/workflows/examples.yaml` use the form `'optimizer.grafting_config={_target_:...,...}'`. With OmegaConf's default merge semantics, this *merges* the new mapping into the inherited `grafting_config` from `distributed_shampoo/examples/configs/optimizer/shampoo.yaml`, which currently defines:

```yaml
grafting_config:
  _target_: distributed_shampoo.AdamPreconditionerConfig
  beta2: 0.999
  epsilon: 1e-8
```

So an override with `_target_: AdaGradPreconditionerConfig` ends up as `{_target_: AdaGrad, beta2: 0.999, epsilon: 1e-08}` and Hydra instantiation fails with:

```
TypeError: AdaGradPreconditionerConfig.__init__() got an unexpected keyword argument 'beta2'
```

`SGDPreconditionerConfig` has the same latent bug (it would inherit both `beta2` and `epsilon`), but execution stops at the AdaGrad line. Adam and RMSprop happen to work because their `_target_` accepts `beta2`.

This regression was introduced in commit `405622d` ("Consolidate CIFAR-10 examples in Shampoo with hydra configs"), which switched the workflow from argparse-based scripts to Hydra overrides without accounting for dict-merge semantics.

## Fix

Replace each `optimizer.grafting_config={...}` override with the standard Hydra delete-then-add idiom:

```
'~optimizer.grafting_config' '+optimizer.grafting_config={...}'
```

This forces the inherited mapping to be discarded and the new mapping inserted whole, so only the explicitly-listed keys are passed to the target's constructor.

Verified locally with `hydra.compose`:

- old override → `{_target_: AdaGrad, beta2: 0.999, epsilon: 1e-08}` (broken)
- new override → `{_target_: AdaGrad, epsilon: 1e-08}` (correct)

Applied to all six grafting-override invocations in the file (CPU single-GPU loop, GPU single-GPU, DDP CPU, DDP GPU) for consistency.

Pull Request resolved: #257

Test Plan:

- [x] CI examples job passes on this branch
- [x] AdaGrad, Adam, RMSprop, and SGD grafting cases all run to completion

## Notes

Stacked on top of #256 (`fix/init-stale-rename-imports`). That PR's import fix unblocked the `examples` job enough to surface this latent merge bug. The branch is rebased on `fix/init-stale-rename-imports`, so until #256 merges, this diff will appear to include #256's commit. Merge order: #256 first, then this.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed By: wz337

Differential Revision: D103081868

Pulled By: hjmshi

fbshipit-source-id: 069f887a3cc701316d6c486b6a278474155481e4
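
The difference between merging and delete-then-add can be illustrated without Hydra installed. The sketch below emulates both behaviours with plain dictionaries; it does not call OmegaConf or Hydra. `merge_mapping`, `delete_then_add`, `instantiate`, and `AdaGradLikeConfig` are hypothetical names introduced only for this illustration.

```python
from copy import deepcopy
from dataclasses import dataclass


def merge_mapping(base: dict, override: dict) -> dict:
    """Recursively merge ``override`` into a copy of ``base``; keys absent from the override survive."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_mapping(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def delete_then_add(base: dict, key: str, value: dict) -> dict:
    """Emulate '~key' followed by '+key=value': drop the inherited mapping, insert the new one whole."""
    result = deepcopy(base)
    result.pop(key, None)
    result[key] = deepcopy(value)
    return result


@dataclass
class AdaGradLikeConfig:
    """Hypothetical stand-in for a target whose constructor accepts only ``epsilon``."""

    epsilon: float = 1e-8


TARGETS = {"AdaGradLikeConfig": AdaGradLikeConfig}


def instantiate(cfg: dict):
    """Toy instantiation: pass every key except ``_target_`` as a keyword argument."""
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
    return TARGETS[cfg["_target_"]](**kwargs)


# Inherited defaults (Adam-style) and the override that switches the target.
inherited = {"grafting_config": {"_target_": "AdamLikeConfig", "beta2": 0.999, "epsilon": 1e-8}}
override = {"_target_": "AdaGradLikeConfig", "epsilon": 1e-8}

merged = merge_mapping(inherited, {"grafting_config": override})
replaced = delete_then_add(inherited, "grafting_config", override)

try:
    instantiate(merged["grafting_config"])  # beta2 leaked in from the inherited mapping
    merge_error = None
except TypeError as exc:
    merge_error = str(exc)

config = instantiate(replaced["grafting_config"])  # only the explicitly listed keys remain
print(merge_error)
print(config)
```

In the merged case the leftover `beta2` reaches the constructor and raises `TypeError`; in the delete-then-add case only `epsilon` is passed.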

Summary: Resolves the remaining `pre-commit` and `type-check` job failures on `main` after PRs #256 (init imports) and #257 (examples workflow). Three commits, each separately reviewable:

### Commit 1: lint fixes (ruff E402 + E731) and copyright header normalization

- **E402 (~92 violations across 12 files)**: every affected file had a duplicate triple-quoted copyright block at the top followed by the *real* module docstring (or, in `gpa/`, a `#`-comment copyright). Python only treats the first triple-quoted string as a docstring, so the duplicate counted as a non-import statement and triggered E402 for every subsequent import. Drop the duplicate so the real module docstring is the first statement.
- **E731 (2 violations)** in `_shampoo_fully_shard_lossless_distributor.py:69` and `_shampoo_hybrid_shard_lossless_distributor.py:75`: rewrite `should_assign_param_idx = lambda i: ...` as a `def`.
- **Copyright convention**: 79/79 untouched files use a single triple-quoted block. The 6 gpa files I touched (plus `gpa/gpa_adamw.py`, which had both styles) are normalized to that convention by dropping their `#`-comment copyright in favor of the triple-quoted form.

### Commit 2: ruff-format pass

The pinned `ruff-format` (v0.8.0 in `.pre-commit-config.yaml`) considers 14 unrelated files mis-formatted (purely whitespace around `assert ..., (...)` wrapping). Applying the formatter so the hook passes on the first run instead of failing after auto-modifying files.

### Commit 3: mypy fixes (33 errors → 0)

All 27 errors reported by CI plus 6 local-only errors that newer mypy/torch stubs surface:

- **gpa/gpa_adamw.py** (10 errors):
  - `step()`: add `overload` pair to match `torch.optim.Optimizer`
  - `step()`: annotate per-group buffer lists as `list[Tensor]`
  - `step()`: skip empty parameter groups via `continue`; rename group-local `first_param` to `group_first_param` to avoid mypy union with the `Optional[Parameter]` from the train-mode check loop
- **gpa/tests/gpa_test_utils.py:42**: annotate `devices` as `tuple[device, ...]`
- **gpa/gpu_tests/gpa_adamw_numerics_test.py:31**: ignore missing `parameterized` stubs (no public stub package)
- **gpa/examples/cifar10_example.py:115**: ignore `len()` on `Dataset[Any]`
- **distributed_shampoo/distributed_shampoo.py**:
  - lines 669-671: drop redundant re-annotations on the `AdaGradPreconditionerConfig` branch (no-redef)
  - line 1632: move stray `# type: ignore` onto the `_pre_load_train_modes` assignment line so it actually applies
- **distributed_shampoo/distributor/_shampoo_hybrid_shard_lossless_distributor.py:160**: replace `filter(lambda, ...)` with a generator expression to dodge a confusing `TypeGuard` overload mypy picks for `filter()`
- **distributed_shampoo/distributor/shampoo_{ddp,hsdp,hybrid_shard}_distributor.py**: ignore `attr-defined` for the private `DeviceMesh._get_all_submeshes` (local-only, surfaces with current torch stubs)
- **distributed_shampoo/distributor/shampoo_distributor.py:228**: ignore narrowing of `map(partial(tuple), ...)` assignment (local-only)
- **distributed_shampoo/distributor/gpu_tests/shampoo_checkpoint_test.py:376,411** and **examples/parallelism.py:177,206**: ignore `arg-type` for `partial[FSDPModule]` / `FSDPModule` passed to a `Module` parameter (FSDPModule mixes in but mypy can't see it)
- **distributed_shampoo/examples/utils.py:161**: annotate `sampler`
- **distributed_shampoo/examples/tests/utils_test.py:228,240**: replace `assertIsNotNone` with `assert ... is not None` so mypy narrows `_fmt` before `assertIn`
- **distributed_shampoo/utils/load_balancing_utils.py:57**: extend existing `# type: ignore[misc]` to also cover `call-overload` from `max(float, floating[Any])`

## Local verification

```
$ ruff check .
All checks passed!
$ ruff format --check .
93 files already formatted
$ mypy .
Success: no issues found in 93 source files
```

Pull Request resolved: #258

Test Plan:

- [x] CI `pre-commit` job passes
- [x] CI `type-check` job passes
- [x] Other CI jobs (`tests`, `gpu-tests`, `examples`) unchanged

## Notes

Stacked on top of #256 (`fix/init-stale-rename-imports`) and #257 (`fix/examples-beta2-config`). Until those merge, the diff for this PR will appear to include their commits. Merge order: #256 → #257 → this.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed By: wz337

Differential Revision: D103082000

Pulled By: hjmshi

fbshipit-source-id: d4bf91517bc7426019a3fde42420e4408b2bbc59
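
A few of the patterns named in this commit message can be shown in isolation. The sketch below is illustrative only: the predicate body, `NUM_RANKS`, `RANK`, and `lookup_fmt` are hypothetical stand-ins, since the original lambda body and test code are not shown here.

```python
import ast
from typing import Optional

# (1) Only the first string statement in a module is its docstring; a second
# triple-quoted string is an ordinary expression statement that precedes the imports.
SOURCE = '''"""Copyright block."""
"""Real module docstring."""
import os
'''
tree = ast.parse(SOURCE)
docstring = ast.get_docstring(tree)
second_is_plain_expr = isinstance(tree.body[1], ast.Expr)
third_is_import = isinstance(tree.body[2], ast.Import)

# (2) E731: a named function instead of `should_assign_param_idx = lambda i: ...`.
NUM_RANKS = 4  # hypothetical value
RANK = 1  # hypothetical value


def should_assign_param_idx(i: int) -> bool:
    """Hypothetical predicate: assign index ``i`` to this rank in round-robin order."""
    return i % NUM_RANKS == RANK


# (3) A generator expression in place of `filter(lambda ..., ...)`.
assigned = tuple(i for i in range(8) if should_assign_param_idx(i))


# (4) A plain `assert ... is not None` lets a type checker narrow Optional[str] to str,
# which a call to unittest's assertIsNotNone does not do.
def lookup_fmt(name: str) -> Optional[str]:
    """Hypothetical lookup that may return None."""
    return {"default": "%(levelname)s: %(message)s"}.get(name)


fmt = lookup_fmt("default")
assert fmt is not None  # from here on, `fmt` is treated as `str`
has_level = "%(levelname)s" in fmt

print(docstring, second_is_plain_expr, assigned, has_level)
```

In part (1), `ast.get_docstring` returns only the copyright string, which is why the second string counts as a statement ahead of the imports.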

meta-codesync · 2026-05-05T20:00:09Z

@hjmshi merged this pull request in 75f77f6.

meta-cla Bot added the CLA Signed label Apr 11, 2026

wz337 approved these changes Apr 15, 2026

This was referenced Apr 25, 2026

Fix examples workflow grafting config overrides #257

Closed

Fix pre-commit (ruff) and mypy errors #258

Closed

meta-codesync Bot closed this in 75f77f6 May 5, 2026

facebook-github-tools Bot added the Merged label May 5, 2026

runame deleted the fix/init-stale-rename-imports branch May 7, 2026 08:29

runame wants to merge 1 commit into facebookresearch:main from runame:fix/init-stale-rename-imports
