
Move built-in PTQ quantization configs to YAML #1423

Draft
shengliangxu wants to merge 22 commits into main from shengliangx/all-yaml-configs

Conversation

@shengliangxu
Collaborator

What does this PR do?

Type of change: refactor

This PR is stacked on #1405 and only describes the changes added on top of that PR.

This PR moves the built-in PTQ quantization config definitions out of hard-coded Python dictionaries and into schema-backed YAML config files.

  • Adds reusable numeric config snippets under modelopt_recipes/configs/numerics/.
  • Adds YAML presets for the existing built-in model PTQ configs under modelopt_recipes/configs/ptq/presets/model/.
  • Adds YAML presets for KV-cache quantization configs under modelopt_recipes/configs/ptq/presets/kv/.
  • Adds reusable KV quantization units such as kv_fp8_affine, kv_nvfp4, kv_nvfp4_affine, and kv_nvfp4_rotate.
  • Updates modelopt.torch.quantization.config built-in config constants to load QuantizeConfig objects from YAML with load_config(..., schema_type=QuantizeConfig).
  • Removes the redundant _load_quantize_config wrapper.
  • Adds/updates recipe loader coverage for built-in schema-backed config snippets.

Usage

Existing Python imports continue to work:

import modelopt.torch.quantization as mtq

cfg = mtq.FP8_DEFAULT_CFG
model = mtq.quantize(model, cfg, forward_loop)

The built-in constants are still schema-backed QuantizeConfig objects with mapping-style access, but their definitions now come from YAML snippets and presets.
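A minimal sketch of what "mapping-style access" on a typed config object looks like, mirroring the MutableMapping behavior described in the commit log below; `ConfigLike` is a hypothetical stand-in for `ModeloptBaseConfig`, not the real class.

```python
# Hypothetical illustration: a typed config object that also behaves
# like a dict, so existing callers using cfg["key"] keep working.
from collections.abc import MutableMapping

class ConfigLike(MutableMapping):
    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getitem__(self, key):
        return self._fields[key]

    def __setitem__(self, key, value):
        self._fields[key] = value

    def __delitem__(self, key):
        del self._fields[key]

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)

cfg = ConfigLike(quant_cfg={"*weight_quantizer": {"num_bits": 8}}, algorithm="max")
print(cfg["algorithm"])    # max
print("quant_cfg" in cfg)  # True
```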

Reusable YAML snippets can also be composed through $import, for example:

# modelopt-schema: modelopt.torch.quantization.config.QuantizeConfig
imports:
  kv_fp8: configs/ptq/units/kv_fp8

quant_cfg:
  - $import: kv_fp8
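A toy resolver for the `$import` composition shown above, assuming parsed YAML is represented as plain dicts. The actual loader's resolution semantics may differ; the snippet path and quantizer keys are illustrative.

```python
# Toy $import resolution: replace {"$import": alias} nodes with the
# snippet the alias maps to under the document's top-level "imports".
def resolve_imports(doc, snippets):
    imports = doc.get("imports", {})

    def expand(node):
        if isinstance(node, dict):
            if "$import" in node:
                return snippets[imports[node["$import"]]]
            return {k: expand(v) for k, v in node.items()}
        if isinstance(node, list):
            return [expand(v) for v in node]
        return node

    return {k: expand(v) for k, v in doc.items() if k != "imports"}

snippets = {"configs/ptq/units/kv_fp8": {"*kv_quantizer": {"num_bits": (4, 3)}}}
doc = {
    "imports": {"kv_fp8": "configs/ptq/units/kv_fp8"},
    "quant_cfg": [{"$import": "kv_fp8"}],
}
print(resolve_imports(doc, snippets))
```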

Testing

Local checks run:

  • YAML parse check for changed config files
  • import-path check for changed YAML $import references
  • whitespace check for changed YAML files
  • git diff --check

Not run locally:

  • python -m pytest ... because the local environment used for this branch did not have pytest installed.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ Existing built-in Python config constants keep the same public names and mapping-style access.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅ Adds/updates recipe loader coverage for schema-backed built-in snippets.
  • Did you update Changelog?: N/A

Additional Information

This PR depends on #1405 because it relies on schema-backed config loading and typed QuantizeConfig parsing introduced there.

This PR intentionally excludes the schema/mapping changes from #1405 and focuses on converting built-in PTQ config definitions to YAML-backed presets and reusable snippets.

Have load_config return Pydantic-normalized values when schema_type or modelopt-schema is present, including typed recipe metadata and quantization config entries.

Update recipe loading, docs, and unit tests for typed config objects and normalized quant_cfg handling.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Convert QuantizerCfgEntry into a ModeloptBaseConfig-backed Pydantic model with validation while preserving dict-style access for callers.

Normalize schema-loaded quant_cfg snippets through model_dump, simplify quantizer cfg handling, and cover both dict and QuantizeConfig need_calibration inputs.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Update normalize_quant_cfg_list to accept dict entries, typed entries, and legacy dict formats while returning QuantizerCfgEntry objects.

Preserve already parsed entries, handle implicit enable values in consumers, and cover mixed typed/dict inputs in tests.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Make ModeloptBaseConfig a MutableMapping and use Mapping/MutableMapping protocol checks for typed quantizer config entries and attributes.

Convert predefined quantization recipes to QuantizeConfig objects while preserving dict-style callers and compatibility paths.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Cover normalization after mutating raw dict quantizer entries and schema-backed ModeloptBaseConfig entries.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented May 9, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c155386e-c106-41be-b3b6-6ed81f375194

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions Bot commented May 9, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1423/

Built to branch gh-pages at 2026-05-09 23:14 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov

codecov Bot commented May 9, 2026

Codecov Report

❌ Patch coverage is 91.76829% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.88%. Comparing base (a098759) to head (df8c002).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/quantization/config.py 92.04% 14 Missing ⚠️
modelopt/torch/quantization/algorithms.py 80.00% 4 Missing ⚠️
...torch/quantization/backends/fp8_per_tensor_gemm.py 82.35% 3 Missing ⚠️
modelopt/torch/opt/config.py 93.33% 2 Missing ⚠️
modelopt/torch/opt/config_loader.py 92.85% 2 Missing ⚠️
...delopt/onnx/llm_export_utils/quantization_utils.py 0.00% 1 Missing ⚠️
modelopt/torch/quantization/backends/nvfp4_gemm.py 92.30% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1423      +/-   ##
==========================================
- Coverage   76.91%   76.88%   -0.04%     
==========================================
  Files         478      478              
  Lines       51434    51619     +185     
==========================================
+ Hits        39563    39687     +124     
- Misses      11871    11932      +61     
Flag Coverage Δ
unit 52.64% <82.31%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>