Remove stale preconditioner config names from __init__.py#256
Closed
runame wants to merge 1 commit into
Closed
Conversation
`AmortizedPreconditionerConfig` was renamed to `BaseShampooPreconditionerConfig` and `ShampooPreconditionerConfig` to `ClassicShampooPreconditionerConfig` in `shampoo_types.py`, but `__init__.py` still imported and re-exported the old names. Replace them with the new names in the imports and `__all__`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wz337
approved these changes
Apr 15, 2026
|
@hjmshi has imported this pull request. If you are a Meta employee, you can view this in D101275252. |
This was referenced Apr 25, 2026
meta-codesync Bot
pushed a commit
that referenced
this pull request
May 5, 2026
Summary:
The four grafting CLI overrides in `.github/workflows/examples.yaml` use the form
`'optimizer.grafting_config={_target_:...,...}'`. With OmegaConf's default merge
semantics, this *merges* the new mapping into the inherited
`grafting_config` from `distributed_shampoo/examples/configs/optimizer/shampoo.yaml`,
which currently defines:
```yaml
grafting_config:
_target_: distributed_shampoo.AdamPreconditionerConfig
beta2: 0.999
epsilon: 1e-8
```
So an override with `_target_: AdaGradPreconditionerConfig` ends up as
`{_target_: AdaGrad, beta2: 0.999, epsilon: 1e-08}` and Hydra
instantiation fails with:
```
TypeError: AdaGradPreconditionerConfig.__init__() got an unexpected keyword argument 'beta2'
```
`SGDPreconditionerConfig` has the same latent bug (it would inherit both
`beta2` and `epsilon`), but execution stops at the AdaGrad line. Adam
and RMSprop happen to work because their `_target_` accepts `beta2`.
This regression was introduced in commit `405622d` ("Consolidate CIFAR-10
examples in Shampoo with hydra configs"), which switched the workflow
from argparse-based scripts to Hydra overrides without accounting for
dict-merge semantics.
## Fix
Replace each `optimizer.grafting_config={...}` override with the
standard Hydra delete-then-add idiom:
\`\`\`
'~optimizer.grafting_config' '+optimizer.grafting_config={...}'
\`\`\`
This forces the inherited mapping to be discarded and the new mapping
inserted whole, so only the explicitly-listed keys are passed to the
target's constructor.
Verified locally with \`hydra.compose\`:
- old override → \`{_target_: AdaGrad, beta2: 0.999, epsilon: 1e-08}\` (broken)
- new override → \`{_target_: AdaGrad, epsilon: 1e-08}\` (correct)
Applied to all six grafting-override invocations in the file (CPU
single-GPU loop, GPU single-GPU, DDP CPU, DDP GPU) for consistency.
Pull Request resolved: #257
Test Plan:
- [x] CI examples job passes on this branch
- [x] AdaGrad, Adam, RMSprop, and SGD grafting cases all run to completion
## Notes
Stacked on top of #256 (\`fix/init-stale-rename-imports\`). That PR's
import fix unblocked the \`examples\` job enough to surface this latent
merge bug. The branch is rebased on \`fix/init-stale-rename-imports\`,
so until #256 merges this diff will appear to include #256 commit.
Merge order: #256 first, then this.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed By: wz337
Differential Revision: D103081868
Pulled By: hjmshi
fbshipit-source-id: 069f887a3cc701316d6c486b6a278474155481e4
meta-codesync Bot
pushed a commit
that referenced
this pull request
May 5, 2026
Summary: Resolves the remaining `pre-commit` and `type-check` job failures on `main` after PRs #256 (init imports) and #257 (examples workflow). Three commits, each separately reviewable: ### Commit 1: lint fixes (ruff E402 + E731) and copyright header normalization - **E402 (~92 violations across 12 files)**: every affected file had a duplicate triple-quoted copyright block at the top followed by the *real* module docstring (or, in `gpa/`, a `#`-comment copyright). Python only treats the first triple-quoted string as a docstring, so the duplicate counted as a non-import statement and triggered E402 for every subsequent import. Drop the duplicate so the real module docstring is the first statement. - **E731 (2 violations)** in `_shampoo_fully_shard_lossless_distributor.py:69` and `_shampoo_hybrid_shard_lossless_distributor.py:75`: rewrite `should_assign_param_idx = lambda i: ...` as a `def`. - **Copyright convention**: 79/79 untouched files use a single triple-quoted block. The 6 gpa files I touched (plus `gpa/gpa_adamw.py`, which had both styles) are normalized to that convention by dropping their `#`-comment copyright in favor of the triple-quoted form. ### Commit 2: ruff-format pass The pinned `ruff-format` (v0.8.0 in `.pre-commit-config.yaml`) considers 14 unrelated files mis-formatted (purely whitespace around `assert ..., (...)` wrapping). Applying the formatter so the hook passes on the first run instead of failing after auto-modifying files. ### Commit 3: mypy fixes (33 errors → 0) All 27 errors reported by CI plus 6 local-only errors that newer mypy/torch stubs surface: - **gpa/gpa_adamw.py** (10 errors): - `step()`: add `overload` pair to match `torch.optim.Optimizer` - `step()`: annotate per-group buffer lists as `list[Tensor]` - `step()`: skip empty parameter groups via `continue`; rename group-local `first_param` to `group_first_param` to avoid mypy union with the `Optional[Parameter]` from the train-mode check loop - **gpa/tests/gpa_test_utils.py:42**: annotate `devices` as `tuple[device, ...]` - **gpa/gpu_tests/gpa_adamw_numerics_test.py:31**: ignore missing `parameterized` stubs (no public stub package) - **gpa/examples/cifar10_example.py:115**: ignore `len()` on `Dataset[Any]` - **distributed_shampoo/distributed_shampoo.py**: - lines 669-671: drop redundant re-annotations on the `AdaGradPreconditionerConfig` branch (no-redef) - line 1632: move stray `# type: ignore` onto the `_pre_load_train_modes` assignment line so it actually applies - **distributed_shampoo/distributor/_shampoo_hybrid_shard_lossless_distributor.py:160**: replace `filter(lambda, ...)` with a generator expression to dodge a confusing `TypeGuard` overload mypy picks for `filter()` - **distributed_shampoo/distributor/shampoo_{ddp,hsdp,hybrid_shard}_distributor.py**: ignore `attr-defined` for the private `DeviceMesh._get_all_submeshes` — local-only, surfaces with current torch stubs - **distributed_shampoo/distributor/shampoo_distributor.py:228**: ignore narrowing of `map(partial(tuple), ...)` assignment — local-only - **distributed_shampoo/distributor/gpu_tests/shampoo_checkpoint_test.py:376,411** and **examples/parallelism.py:177,206**: ignore `arg-type` for `partial[FSDPModule]` / `FSDPModule` passed to a `Module` parameter (FSDPModule mixes in but mypy can't see it) - **distributed_shampoo/examples/utils.py:161**: annotate `sampler` - **distributed_shampoo/examples/tests/utils_test.py:228,240**: replace `assertIsNotNone` with `assert ... is not None` so mypy narrows `_fmt` before `assertIn` - **distributed_shampoo/utils/load_balancing_utils.py:57**: extend existing `# type: ignore[misc]` to also cover `call-overload` from `max(float, floating[Any])` ## Local verification ``` $ ruff check . All checks passed! $ ruff format --check . 93 files already formatted $ mypy . Success: no issues found in 93 source files ``` Pull Request resolved: #258 Test Plan: - [x] CI `pre-commit` job passes - [x] CI `type-check` job passes - [x] Other CI jobs (`tests`, `gpu-tests`, `examples`) unchanged ## Notes Stacked on top of #256 (`fix/init-stale-rename-imports`) and #257 (`fix/examples-beta2-config`). Until those merge, the diff for this PR will appear to include their commits. Merge order: #256 → #257 → this. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed By: wz337 Differential Revision: D103082000 Pulled By: hjmshi fbshipit-source-id: d4bf91517bc7426019a3fde42420e4408b2bbc59
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
shampoo_types.pyrenamedAmortizedPreconditionerConfig→BaseShampooPreconditionerConfigandShampooPreconditionerConfig→ClassicShampooPreconditionerConfig(cb150a9), butdistributed_shampoo/__init__.pystill imported and re-exported the old names, soimport distributed_shampoofails at HEAD withImportError.__all__entries.Test plan
python -c "import distributed_shampoo"succeedsdistributed_shampoo.BaseShampooPreconditionerConfiganddistributed_shampoo.ClassicShampooPreconditionerConfigare accessible__all__resolves to an actual module attributeClassic→Base,RootInv→Classic,Eigendecomposed→Classic,EigenvalueCorrected→Base)