feat(code_gen): add per-language subset metrics for livecodebench-x by rodboev · Pull Request #1719 · NVIDIA-NeMo/Gym

rodboev · 2026-06-25T01:11:48Z

compute_metrics() currently groups by difficulty only. livecodebench-x benchmark rows carry a target_language field (de, es, fr, ja) at the top level of each data row (adjacent to verifier_metadata, not inside it). Because the field was never declared on CompCodingVerifyRequest, Pydantic silently dropped it, so it never reached CompCodingVerifyResponse or compute_subset_metrics. As a result, per-language pass@k values never appear in rollouts_aggregate_metrics.json.
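The dropped-field behavior can be reproduced in isolation. The sketch below assumes Pydantic v2 with its default `extra="ignore"` setting; the two model classes are hypothetical stand-ins, not the real CompCodingVerifyRequest, whose base class and config may differ.

```python
from typing import Optional

from pydantic import BaseModel


class RequestWithoutField(BaseModel):
    # Hypothetical stand-in for the request model before the fix:
    # target_language is not declared.
    difficulty: Optional[str] = None


class RequestWithField(BaseModel):
    # Hypothetical stand-in for the request model after the fix.
    difficulty: Optional[str] = None
    target_language: Optional[str] = None


# A data row with target_language as a top-level key.
row = {"difficulty": "easy", "target_language": "ja"}

before = RequestWithoutField(**row)  # undeclared key is ignored, no error raised
after = RequestWithField(**row)      # declared key is kept

print(before.model_dump())
print(after.model_dump())
```

With the default config, the undeclared key disappears without a validation error, which is why nothing downstream ever saw it.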

This PR fixes that by mirroring the existing difficulty pattern:

Adds target_language: Optional[str] = None to CompCodingVerifyRequest so Pydantic preserves the top-level field through routing
Adds target_language: Optional[str] = None to CompCodingVerifyResponse
Reads body.target_language in verify() and propagates it through all three return sites via **body.model_dump()
Calls compute_subset_metrics(tasks, "target_language", ...) alongside the existing difficulty call
Confirms get_key_metrics() already handles the new prefixes generically, so no change is needed there
Adds focused tests for the new metric keys and field propagation
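As a rough illustration of what grouping by a second field looks like, here is a minimal sketch of per-subset pass@1. The helper `subset_pass_at_1`, the rollout dict shape, and the metric key format are assumptions made for this example; they are not the signature or output format of the real compute_subset_metrics.

```python
from collections import defaultdict
from statistics import mean


def subset_pass_at_1(rollouts, field):
    """Group rollouts by `field` and average their rewards per group.

    Hypothetical helper for illustration only. Rollouts whose value for
    `field` is missing or None are skipped, so benchmarks without that
    field produce no keys for it.
    """
    groups = defaultdict(list)
    for rollout in rollouts:
        value = rollout.get(field)
        if value is not None:
            groups[value].append(rollout["reward"])
    return {
        f"{field}/{value}/pass@1": mean(rewards)
        for value, rewards in sorted(groups.items())
    }


rollouts = [
    {"difficulty": "easy", "target_language": "de", "reward": 1.0},
    {"difficulty": "hard", "target_language": "de", "reward": 0.0},
    {"difficulty": "easy", "target_language": "ja", "reward": 1.0},
    {"difficulty": "easy", "target_language": None, "reward": 0.0},
]

# Mirror the pattern described above: one call per grouping field.
metrics = {}
for field in ("difficulty", "target_language"):
    metrics.update(subset_pass_at_1(rollouts, field))

for key, value in metrics.items():
    print(key, value)
```

Skipping rows where the field is None keeps the default `Optional[str] = None` harmless for benchmarks that have no target_language.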

Closes #1171

Checklist

Targeted tests pass: uv run pytest resources_servers/code_gen/tests/test_app.py -x -v
No changes to benchmark data files, config YAML, or nemo_gym/reward_profile.py
DCO sign-off on all commits

Signed-off-by: Rod Boev <rod.boev@gmail.com>


copy-pr-bot · 2026-06-25T01:11:51Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rodboev added 3 commits June 24, 2026 20:49

feat(code_gen): add per-language subset metrics for livecodebench-x

a0a029e

Signed-off-by: Rod Boev <rod.boev@gmail.com>

fix(code_gen): declare target_language on verify request and extract …

bbeca8d

…correctly Signed-off-by: Rod Boev <rod.boev@gmail.com>

fix(code_gen): drop redundant target_language kwarg shadowed by model…

cd3ef8e

…_dump Signed-off-by: Rod Boev <rod.boev@gmail.com>

nemo-automation-bot (bot) added the community-request label (Issue reported or requested by someone from the community) on Jun 25, 2026

svcnvidia-nemo-ci added the waiting-on-maintainers label (Waiting on maintainers to respond) on Jun 27, 2026

rodboev wants to merge 3 commits into NVIDIA-NeMo:main from rodboev:pr/code-gen-livecodebench-x-per-lang
