fix(pipeline): harden eval runtime plumbing by pzelasko · Pull Request #1484 · NVIDIA-NeMo/Skills

pzelasko · 2026-06-15T15:22:46Z

Migration note

Supersedes #1474. Recreated from the NVIDIA-NeMo/Skills branch codex/pr1443-runtime-fixes so repository CI can run.

Summary

Split out from #1443.

This PR contains general runtime and pipeline hardening that is useful independently of the AppTek benchmark.

What Changed

Hardened async generation merge behavior:
- preserves async output order
- rejects negative or duplicate positions
- reports missing indices instead of writing null rows
- handles empty async output cleanly
Improved multimodal message content normalization:
- accepts typed text dictionaries
- normalizes None text to empty content
- rejects malformed dicts and unsupported content types with clear errors
Improved self-hosted server readiness checks:
- normalizes server addresses
- probes /v1/models and /health
- avoids relying on unsupported raw-address checks
Reduced random port collisions for concurrent SLURM submissions:
- supports deferred SLURM_JOB_ID-derived ports
- stores timestamped random-port reservations
- prunes stale reservations with a TTL
Plumbed integer or string-valued ports through server and sandbox scripts.
Refined judge/summarize sbatch override handling and exclusivity defaults.

Relationship To #1443

This PR is a prerequisite/adjacent cleanup split out of the original AppTek PR. It does not add the AppTek benchmark itself.

Validation

python -m py_compile on changed Python files.
Isolated smoke test for timestamped port reservation pruning.
CI lint/DCO/copyright checks are passing.

Summary by CodeRabbit

Release Notes

Bug Fixes
- Hardened async merged output ordering with validation for negatives/duplicates and clearer errors for missing indices.
- Prevented unintended SLURM variable expansion for sandbox/listen ports.
- Improved server readiness checks by polling the correct health endpoints.
New Features
- Added deterministic SLURM-based port allocation and safer reserved random port selection.
- Enabled string port values for server/sandbox and related override injection.
Improvements
- Improved multimodal text normalization and evaluation summarization scheduler defaults.

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>

coderabbitai · 2026-06-15T15:29:24Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e942893-6f2f-4793-b62f-9a3a0e472c4a

📥 Commits

Reviewing files that changed from the base of the PR and between b489426 and 44bb65a.

📒 Files selected for processing (1)

tests/test_eval.py

📝 Walkthrough

Walkthrough

The PR extends SLURM port allocation to support deterministic job-ID-based ports and concurrent-safe random allocation with file-lock TTL reservations, propagates the resulting int|str port type through pipeline scripts, guards SLURM_JOB_ID placeholder expansion in cluster env utilities, adds exclusive defaulting logic and summarize task time limits, updates server readiness polling endpoints, hardens async generation order validation, and normalizes vLLM multimodal dict content handling.

Changes

Inference Robustness

Layer / File(s)	Summary
Async order restoration validation `nemo_skills/inference/generate.py`	`restore_async_order()` sizes output from max observed position, validates non-negative unique positions, returns empty list for empty input, and raises on missing indices instead of writing null rows.
vLLM multimodal dict content normalization `nemo_skills/inference/model/vllm_multimodal.py`	`content_text_to_list` accepts `{type:"text", text:...}` dicts, mapping them to normalized list-form or raising `TypeError`; `_generate_with_chunking` explicitly branches on `str`/`dict`/`list` content types.

SLURM Port Allocation and Pipeline Defaults

Layer / File(s)	Summary
Port allocation core: SLURM and concurrent-safe random `nemo_skills/pipeline/utils/server.py`	`get_free_port` gains new imports and module-level state; returns a shell arithmetic expression under `NEMO_SKILLS_USE_SLURM_JOB_ID_PORTS=1`; uses a per-user file lock with TTL reservation for random allocation; `get_server_command` broadens `server_port` to `int\|str`.
String port type propagation through scripts `nemo_skills/pipeline/utils/scripts/server.py`, `nemo_skills/pipeline/utils/scripts/eval.py`	`ServerScript.port` and `SandboxScript.port` re-typed to `Optional[int\|str]`; `_inject_single_server_overrides` `port` parameter widened to `int\|str\|None`.
SLURM env placeholder skip, exclusive default, and summarize time limits `nemo_skills/pipeline/utils/cluster.py`, `nemo_skills/pipeline/utils/exp.py`, `nemo_skills/pipeline/eval.py`	`get_env_variables` skips resolving `$SLURM_JOB_ID` inside `NEMO_SKILLS_SANDBOX_PORT`; `get_executor` applies `exclusive=True` only as a default and removes the key when explicitly `False`; `eval` sets short `time`/`time_min` defaults for summarize tasks when no explicit `sbatch_kwargs` were provided.
Server readiness polling update `nemo_skills/utils.py`	`get_server_wait_cmd` normalizes the address to `http(s)`, derives `/models` and `/health` endpoints (handling `/v1` suffix), and polls via `curl -sS` instead of the prior `curl -X PUT` loop.
Test updates for summarize sbatch defaults `tests/test_eval.py`	Introduces `DEFAULT_SUMMARIZE_SBATCH_KWARGS` constant and updates `test_eval_summarize_sbatch_kwargs_and_account` parametrized expectations to include the new `exclusive`/`mem`/`time`/`time_min` default fields.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Pipeline Caller
    participant GPF as get_free_port
    participant FSLock as Per-user File Lock
    participant ResvFile as Reservation File
    participant GSC as get_server_command

    Caller->>GPF: get_free_port(strategy="random")
    alt NEMO_SKILLS_USE_SLURM_JOB_ID_PORTS=1
        GPF-->>Caller: "$((SLURM_JOB_ID + offset))" (str)
    else random allocation
        GPF->>FSLock: acquire lock
        GPF->>ResvFile: read TTL reservations
        loop until port found or max attempts
            GPF->>ResvFile: check port, evict stale, write reservation
        end
        GPF->>FSLock: release lock
        GPF-->>Caller: port (int)
    end
    Caller->>GSC: get_server_command(..., server_port=port)
    GSC-->>Caller: command string with port embedded

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'fix(pipeline): harden eval runtime plumbing' accurately summarizes the main change—hardening the evaluation runtime pipeline with various runtime and pipeline improvements across multiple modules.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/pr1443-runtime-fixes

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemo_skills/pipeline/utils/cluster.py`:
- Around line 262-263: The conditional check at line 262 only bypasses expansion
for "NEMO_SKILLS_SANDBOX_PORT" when the value contains "SLURM_JOB_ID", but
"LISTEN_PORT" and "NGINX_PORT" also contain SLURM expressions from the same
injection path and need the same bypass to avoid triggering the resolver and
raising ValueError. Broaden the condition to check if the key is any of these
three environment variables (NEMO_SKILLS_SANDBOX_PORT, LISTEN_PORT, or
NGINX_PORT) and the value contains "SLURM_JOB_ID", then skip expansion for all
of them.

In `@nemo_skills/utils.py`:
- Around line 635-636: The readiness loop curl commands are using only -sS
flags, which exit successfully even on HTTP 4xx/5xx error responses, causing the
loop to mark the server as ready prematurely. Add the --fail flag to both curl
commands for models_url and health_url to ensure curl exits with a non-zero
status on HTTP error responses, and optionally add a short timeout using
--max-time to prevent hanging on unresponsive endpoints.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d7bc07c8-c4ef-4264-9a75-8f47025a4b53

📥 Commits

Reviewing files that changed from the base of the PR and between da85a88 and 5775e9d.

📒 Files selected for processing (9)

nemo_skills/inference/generate.py
nemo_skills/inference/model/vllm_multimodal.py
nemo_skills/pipeline/eval.py
nemo_skills/pipeline/utils/cluster.py
nemo_skills/pipeline/utils/exp.py
nemo_skills/pipeline/utils/scripts/eval.py
nemo_skills/pipeline/utils/scripts/server.py
nemo_skills/pipeline/utils/server.py
nemo_skills/utils.py

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>

fix(pipeline): harden eval runtime plumbing

5775e9d

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>

This was referenced Jun 15, 2026

feat(speech): add AppTek Call-Center Dialogues ASR benchmark #1486

Open

fix(pipeline): harden eval runtime plumbing #1474

Closed

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread nemo_skills/pipeline/utils/cluster.py Outdated

Comment thread nemo_skills/utils.py Outdated

pzelasko added 2 commits June 15, 2026 08:44

fix: address runtime review comments

b489426

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>

test: update summarize sbatch defaults

44bb65a

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(pipeline): harden eval runtime plumbing#1484

fix(pipeline): harden eval runtime plumbing#1484
pzelasko wants to merge 3 commits into
mainfrom
codex/pr1443-runtime-fixes

pzelasko commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

pzelasko commented Jun 15, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Migration note

Summary

What Changed

Relationship To #1443

Validation

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pzelasko commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading