Skip to content

fix(pipeline): harden eval runtime plumbing#1484

Open
pzelasko wants to merge 3 commits into
mainfrom
codex/pr1443-runtime-fixes
Open

fix(pipeline): harden eval runtime plumbing#1484
pzelasko wants to merge 3 commits into
mainfrom
codex/pr1443-runtime-fixes

Conversation

@pzelasko

@pzelasko pzelasko commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Migration note

Supersedes #1474. Recreated from the NVIDIA-NeMo/Skills branch codex/pr1443-runtime-fixes so repository CI can run.

Summary

Split out from #1443.

This PR contains general runtime and pipeline hardening that is useful independently of the AppTek benchmark.

What Changed

  • Hardened async generation merge behavior:
    • preserves async output order
    • rejects negative or duplicate positions
    • reports missing indices instead of writing null rows
    • handles empty async output cleanly
  • Improved multimodal message content normalization:
    • accepts typed text dictionaries
    • normalizes None text to empty content
    • rejects malformed dicts and unsupported content types with clear errors
  • Improved self-hosted server readiness checks:
    • normalizes server addresses
    • probes /v1/models and /health
    • avoids relying on unsupported raw-address checks
  • Reduced random port collisions for concurrent SLURM submissions:
    • supports deferred SLURM_JOB_ID-derived ports
    • stores timestamped random-port reservations
    • prunes stale reservations with a TTL
  • Plumbed integer or string-valued ports through server and sandbox scripts.
  • Refined judge/summarize sbatch override handling and exclusivity defaults.

Relationship To #1443

This PR is a prerequisite/adjacent cleanup split out of the original AppTek PR. It does not add the AppTek benchmark itself.

Validation

  • python -m py_compile on changed Python files.
  • Isolated smoke test for timestamped port reservation pruning.
  • CI lint/DCO/copyright checks are passing.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Hardened async merged output ordering with validation for negatives/duplicates and clearer errors for missing indices.
    • Prevented unintended SLURM variable expansion for sandbox/listen ports.
    • Improved server readiness checks by polling the correct health endpoints.
  • New Features

    • Added deterministic SLURM-based port allocation and safer reserved random port selection.
    • Enabled string port values for server/sandbox and related override injection.
  • Improvements

    • Improved multimodal text normalization and evaluation summarization scheduler defaults.

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e942893-6f2f-4793-b62f-9a3a0e472c4a

📥 Commits

Reviewing files that changed from the base of the PR and between b489426 and 44bb65a.

📒 Files selected for processing (1)
  • tests/test_eval.py

📝 Walkthrough

Walkthrough

The PR extends SLURM port allocation to support deterministic job-ID-based ports and concurrent-safe random allocation with file-lock TTL reservations, propagates the resulting int|str port type through pipeline scripts, guards SLURM_JOB_ID placeholder expansion in cluster env utilities, adds exclusive defaulting logic and summarize task time limits, updates server readiness polling endpoints, hardens async generation order validation, and normalizes vLLM multimodal dict content handling.

Changes

Inference Robustness

Layer / File(s) Summary
Async order restoration validation
nemo_skills/inference/generate.py
restore_async_order() sizes output from max observed position, validates non-negative unique positions, returns empty list for empty input, and raises on missing indices instead of writing null rows.
vLLM multimodal dict content normalization
nemo_skills/inference/model/vllm_multimodal.py
content_text_to_list accepts {type:"text", text:...} dicts, mapping them to normalized list-form or raising TypeError; _generate_with_chunking explicitly branches on str/dict/list content types.

SLURM Port Allocation and Pipeline Defaults

Layer / File(s) Summary
Port allocation core: SLURM and concurrent-safe random
nemo_skills/pipeline/utils/server.py
get_free_port gains new imports and module-level state; returns a shell arithmetic expression under NEMO_SKILLS_USE_SLURM_JOB_ID_PORTS=1; uses a per-user file lock with TTL reservation for random allocation; get_server_command broadens server_port to int|str.
String port type propagation through scripts
nemo_skills/pipeline/utils/scripts/server.py, nemo_skills/pipeline/utils/scripts/eval.py
ServerScript.port and SandboxScript.port re-typed to Optional[int|str]; _inject_single_server_overrides port parameter widened to int|str|None.
SLURM env placeholder skip, exclusive default, and summarize time limits
nemo_skills/pipeline/utils/cluster.py, nemo_skills/pipeline/utils/exp.py, nemo_skills/pipeline/eval.py
get_env_variables skips resolving $SLURM_JOB_ID inside NEMO_SKILLS_SANDBOX_PORT; get_executor applies exclusive=True only as a default and removes the key when explicitly False; eval sets short time/time_min defaults for summarize tasks when no explicit sbatch_kwargs were provided.
Server readiness polling update
nemo_skills/utils.py
get_server_wait_cmd normalizes the address to http(s), derives /models and /health endpoints (handling /v1 suffix), and polls via curl -sS instead of the prior curl -X PUT loop.
Test updates for summarize sbatch defaults
tests/test_eval.py
Introduces DEFAULT_SUMMARIZE_SBATCH_KWARGS constant and updates test_eval_summarize_sbatch_kwargs_and_account parametrized expectations to include the new exclusive/mem/time/time_min default fields.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Pipeline Caller
    participant GPF as get_free_port
    participant FSLock as Per-user File Lock
    participant ResvFile as Reservation File
    participant GSC as get_server_command

    Caller->>GPF: get_free_port(strategy="random")
    alt NEMO_SKILLS_USE_SLURM_JOB_ID_PORTS=1
        GPF-->>Caller: "$((SLURM_JOB_ID + offset))" (str)
    else random allocation
        GPF->>FSLock: acquire lock
        GPF->>ResvFile: read TTL reservations
        loop until port found or max attempts
            GPF->>ResvFile: check port, evict stale, write reservation
        end
        GPF->>FSLock: release lock
        GPF-->>Caller: port (int)
    end
    Caller->>GSC: get_server_command(..., server_port=port)
    GSC-->>Caller: command string with port embedded
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'fix(pipeline): harden eval runtime plumbing' accurately summarizes the main change—hardening the evaluation runtime pipeline with various runtime and pipeline improvements across multiple modules.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/pr1443-runtime-fixes

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemo_skills/pipeline/utils/cluster.py`:
- Around line 262-263: The conditional check at line 262 only bypasses expansion
for "NEMO_SKILLS_SANDBOX_PORT" when the value contains "SLURM_JOB_ID", but
"LISTEN_PORT" and "NGINX_PORT" also contain SLURM expressions from the same
injection path and need the same bypass to avoid triggering the resolver and
raising ValueError. Broaden the condition to check if the key is any of these
three environment variables (NEMO_SKILLS_SANDBOX_PORT, LISTEN_PORT, or
NGINX_PORT) and the value contains "SLURM_JOB_ID", then skip expansion for all
of them.

In `@nemo_skills/utils.py`:
- Around line 635-636: The readiness loop curl commands are using only -sS
flags, which exit successfully even on HTTP 4xx/5xx error responses, causing the
loop to mark the server as ready prematurely. Add the --fail flag to both curl
commands for models_url and health_url to ensure curl exits with a non-zero
status on HTTP error responses, and optionally add a short timeout using
--max-time to prevent hanging on unresponsive endpoints.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d7bc07c8-c4ef-4264-9a75-8f47025a4b53

📥 Commits

Reviewing files that changed from the base of the PR and between da85a88 and 5775e9d.

📒 Files selected for processing (9)
  • nemo_skills/inference/generate.py
  • nemo_skills/inference/model/vllm_multimodal.py
  • nemo_skills/pipeline/eval.py
  • nemo_skills/pipeline/utils/cluster.py
  • nemo_skills/pipeline/utils/exp.py
  • nemo_skills/pipeline/utils/scripts/eval.py
  • nemo_skills/pipeline/utils/scripts/server.py
  • nemo_skills/pipeline/utils/server.py
  • nemo_skills/utils.py

Comment thread nemo_skills/pipeline/utils/cluster.py Outdated
Comment thread nemo_skills/utils.py Outdated
pzelasko added 2 commits June 15, 2026 08:44
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant