
Move built-in PTQ quantization configs to YAML #1423

Draft
shengliangxu wants to merge 22 commits into main from shengliangx/all-yaml-configs

Conversation

@shengliangxu
Collaborator

What does this PR do?

Type of change: refactor

This PR is stacked on #1405 and only describes the changes added on top of that PR.

This PR moves the built-in PTQ quantization config definitions out of hard-coded Python dictionaries and into schema-backed YAML config files.

  • Adds reusable numeric config snippets under modelopt_recipes/configs/numerics/.
  • Adds YAML presets for the existing built-in model PTQ configs under modelopt_recipes/configs/ptq/presets/model/.
  • Adds YAML presets for KV-cache quantization configs under modelopt_recipes/configs/ptq/presets/kv/.
  • Adds reusable KV quantization units such as kv_fp8_affine, kv_nvfp4, kv_nvfp4_affine, and kv_nvfp4_rotate.
  • Updates modelopt.torch.quantization.config built-in config constants to load QuantizeConfig objects from YAML with load_config(..., schema_type=QuantizeConfig).
  • Removes the redundant _load_quantize_config wrapper.
  • Adds/updates recipe loader coverage for built-in schema-backed config snippets.

Usage

Existing Python imports continue to work:

import modelopt.torch.quantization as mtq

cfg = mtq.FP8_DEFAULT_CFG
model = mtq.quantize(model, cfg, forward_loop)

The built-in constants are still schema-backed QuantizeConfig objects with mapping-style access, but their definitions now come from YAML snippets and presets.
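A minimal sketch of what "mapping-style access" on a typed config object looks like, mirroring the MutableMapping behavior described in the commit log below; `ConfigLike` is a hypothetical stand-in for `ModeloptBaseConfig`, not the real class.

```python
# Hypothetical illustration: a typed config object that also behaves
# like a dict, so existing callers using cfg["key"] keep working.
from collections.abc import MutableMapping

class ConfigLike(MutableMapping):
    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getitem__(self, key):
        return self._fields[key]

    def __setitem__(self, key, value):
        self._fields[key] = value

    def __delitem__(self, key):
        del self._fields[key]

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)

cfg = ConfigLike(quant_cfg={"*weight_quantizer": {"num_bits": 8}}, algorithm="max")
print(cfg["algorithm"])    # max
print("quant_cfg" in cfg)  # True
```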

Reusable YAML snippets can also be composed through $import, for example:

# modelopt-schema: modelopt.torch.quantization.config.QuantizeConfig
imports:
  kv_fp8: configs/ptq/units/kv_fp8

quant_cfg:
  - $import: kv_fp8
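A toy resolver for the `$import` composition shown above, assuming parsed YAML is represented as plain dicts. The actual loader's resolution semantics may differ; the snippet path and quantizer keys are illustrative.

```python
# Toy $import resolution: replace {"$import": alias} nodes with the
# snippet the alias maps to under the document's top-level "imports".
def resolve_imports(doc, snippets):
    imports = doc.get("imports", {})

    def expand(node):
        if isinstance(node, dict):
            if "$import" in node:
                return snippets[imports[node["$import"]]]
            return {k: expand(v) for k, v in node.items()}
        if isinstance(node, list):
            return [expand(v) for v in node]
        return node

    return {k: expand(v) for k, v in doc.items() if k != "imports"}

snippets = {"configs/ptq/units/kv_fp8": {"*kv_quantizer": {"num_bits": (4, 3)}}}
doc = {
    "imports": {"kv_fp8": "configs/ptq/units/kv_fp8"},
    "quant_cfg": [{"$import": "kv_fp8"}],
}
print(resolve_imports(doc, snippets))
```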

Testing

Local checks run:

  • YAML parse check for changed config files
  • import-path check for changed YAML $import references
  • whitespace check for changed YAML files
  • git diff --check

Not run locally:

  • python -m pytest ... because the local environment used for this branch did not have pytest installed.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ Existing built-in Python config constants keep the same public names and mapping-style access.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅ Adds/updates recipe loader coverage for schema-backed built-in snippets.
  • Did you update Changelog?: N/A

Additional Information

This PR depends on #1405 because it relies on schema-backed config loading and typed QuantizeConfig parsing introduced there.

This PR intentionally excludes the schema/mapping changes from #1405 and focuses on converting built-in PTQ config definitions to YAML-backed presets and reusable snippets.

Have load_config return Pydantic-normalized values when schema_type or modelopt-schema is present, including typed recipe metadata and quantization config entries.

Update recipe loading, docs, and unit tests for typed config objects and normalized quant_cfg handling.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Convert QuantizerCfgEntry into a ModeloptBaseConfig-backed Pydantic model with validation while preserving dict-style access for callers.

Normalize schema-loaded quant_cfg snippets through model_dump, simplify quantizer cfg handling, and cover both dict and QuantizeConfig need_calibration inputs.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Update normalize_quant_cfg_list to accept dict entries, typed entries, and legacy dict formats while returning QuantizerCfgEntry objects.

Preserve already parsed entries, handle implicit enable values in consumers, and cover mixed typed/dict inputs in tests.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Make ModeloptBaseConfig a MutableMapping and use Mapping/MutableMapping protocol checks for typed quantizer config entries and attributes.

Convert predefined quantization recipes to QuantizeConfig objects while preserving dict-style callers and compatibility paths.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Cover normalization after mutating raw dict quantizer entries and schema-backed ModeloptBaseConfig entries.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented May 9, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c155386e-c106-41be-b3b6-6ed81f375194

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions Bot commented May 9, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1423/

Built to branch gh-pages at 2026-05-09 23:14 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov

codecov Bot commented May 9, 2026

Codecov Report

❌ Patch coverage is 91.76829% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.88%. Comparing base (a098759) to head (df8c002).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/quantization/config.py 92.04% 14 Missing ⚠️
modelopt/torch/quantization/algorithms.py 80.00% 4 Missing ⚠️
...torch/quantization/backends/fp8_per_tensor_gemm.py 82.35% 3 Missing ⚠️
modelopt/torch/opt/config.py 93.33% 2 Missing ⚠️
modelopt/torch/opt/config_loader.py 92.85% 2 Missing ⚠️
...delopt/onnx/llm_export_utils/quantization_utils.py 0.00% 1 Missing ⚠️
modelopt/torch/quantization/backends/nvfp4_gemm.py 92.30% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1423      +/-   ##
==========================================
- Coverage   76.91%   76.88%   -0.04%     
==========================================
  Files         478      478              
  Lines       51434    51619     +185     
==========================================
+ Hits        39563    39687     +124     
- Misses      11871    11932      +61     
Flag Coverage Δ
unit 52.64% <82.31%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>