
Updates for supporting CVDP Agentic subset #1744

Draft
arti4nvj wants to merge 10 commits into main from artij/cvdp_resources_server


@arti4nvj

Contributor

Refactor the CVDP resources server to verify RTL using the Apptainer Provider sandbox instead of the previous Docker harness, and add an agentic CVDP agent.

  • Split harness from verification logic: extracted the harness execution into a separate harness.py so app.py holds only the verifier logic.
  • Apptainer-based verification: the resources server now runs the CVDP test harness inside an Apptainer Provider sandbox.
  • New agentic agent (cvdp_agent/agentic_app.py): wraps the Claude Code agent, installs it into the Apptainer sandbox, and lets the model edit files and self-test with the in-container EDA tools.
  • Configs, tests, and README updated for both the non-agentic and agentic flows.
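As a rough illustration of the harness/verifier split described above, here is a minimal sketch. It assumes three things: the harness stages the RTL files into a host directory, it bind-mounts that directory into the EDA container with `apptainer exec --bind`, and it reports the test command's exit code. Under that split, the verifier only maps the harness result to a reward. `HarnessResult`, `run_harness`, `verify`, the `/code/rtl` mount point and the 0/1 reward mapping are all hypothetical and are not taken from the PR's code.

```python
import subprocess
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Sequence


@dataclass
class HarnessResult:
    """Hypothetical result type passed from the harness layer to the verifier."""
    returncode: int
    stdout: str


Runner = Callable[[Sequence[str]], HarnessResult]


def subprocess_runner(argv: Sequence[str]) -> HarnessResult:
    """Run a command on the host, capturing its exit code and stdout."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return HarnessResult(proc.returncode, proc.stdout)


def run_harness(image: str, rtl_files: dict[str, str], test_cmd: Sequence[str],
                runner: Runner = subprocess_runner) -> HarnessResult:
    """harness.py-style step: stage the files, then run the tests in the container."""
    with tempfile.TemporaryDirectory() as workdir:
        for rel_path, text in rtl_files.items():
            dest = Path(workdir) / rel_path
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_text(text)
        # Bind-mount the staged files at a hypothetical /code/rtl path.
        argv = ["apptainer", "exec", "--bind", f"{workdir}:/code/rtl", image, *test_cmd]
        return runner(argv)


def verify(result: HarnessResult) -> float:
    """app.py-style step: pure verifier logic with no container details."""
    return 1.0 if result.returncode == 0 else 0.0
```

Injecting `runner` keeps the verifier testable without Apptainer installed. In this sketch, only `run_harness` knows about containers.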

For n=1 on the agentic non-commercial subset, we see a 35.87% pass rate (compared to 40% with the original CVDP infra). On the non-agentic non-commercial subset, we see 41.72% (in line with the original CVDP infra).

arti4nvj and others added 10 commits June 24, 2026 00:04
Add an ApptainerProvider implementing the SandboxProvider protocol via the
local apptainer CLI: persistent instance lifecycle, exec with user/fakeroot
mapping, bind-mount file transfer, status, readiness probe, and teardown.
Register it under the name "apptainer" and add unit tests plus a README.

Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
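The provider lifecycle in the commit above (persistent instance, exec, readiness probe, teardown, registration under "apptainer") could look roughly like the sketch below. It assumes the provider shells out to the real `apptainer instance start`, `apptainer exec instance://<name>` and `apptainer instance stop` subcommands. The `SandboxProvider` method names shown here, the `PROVIDERS` registry, `register_provider` and `make_provider` are all hypothetical, and the PR's actual protocol may differ.

```python
import subprocess
from typing import Callable, Protocol, Sequence


class SandboxProvider(Protocol):
    # Hypothetical subset of the protocol; the PR's real interface may differ.
    def start(self) -> None: ...
    def exec(self, cmd: Sequence[str]) -> int: ...
    def is_ready(self) -> bool: ...
    def stop(self) -> None: ...


# Hypothetical name -> provider class registry.
PROVIDERS: dict[str, type] = {}


def register_provider(name: str):
    def decorator(cls: type) -> type:
        PROVIDERS[name] = cls
        return cls
    return decorator


def make_provider(name: str, **kwargs) -> SandboxProvider:
    return PROVIDERS[name](**kwargs)


def _run(argv: Sequence[str]) -> int:
    return subprocess.run(list(argv)).returncode


@register_provider("apptainer")
class ApptainerProvider:
    def __init__(self, image: str, instance: str, fakeroot: bool = False,
                 runner: Callable[[Sequence[str]], int] = _run) -> None:
        self.image = image
        self.instance = instance
        self.fakeroot = fakeroot
        self.runner = runner

    def start(self) -> None:
        # Persistent instance: later exec calls reuse the same running container.
        argv = ["apptainer", "instance", "start"]
        if self.fakeroot:
            argv.append("--fakeroot")
        argv += [self.image, self.instance]
        if self.runner(argv) != 0:
            raise RuntimeError(f"failed to start instance {self.instance}")

    def exec(self, cmd: Sequence[str]) -> int:
        return self.runner(["apptainer", "exec", f"instance://{self.instance}", *cmd])

    def is_ready(self) -> bool:
        # Readiness probe: a trivial command succeeds once the instance is up.
        return self.exec(["true"]) == 0

    def stop(self) -> None:
        self.runner(["apptainer", "instance", "stop", self.instance])
```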
Parse Claude Code's authoritative num_turns from the stream-json result
event and include it in the returned metadata.

Signed-off-by: Arti Jain <artij@nvidia.com>
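With `--output-format stream-json`, Claude Code emits one JSON object per line, and the run ends with an event of `"type": "result"`. The sketch below assumes that event carries an integer `num_turns` field and takes the value from the last such event. `parse_num_turns` is a hypothetical helper, not the agent's actual function.

```python
import json
from typing import Iterable, Optional


def parse_num_turns(lines: Iterable[str]) -> Optional[int]:
    """Return num_turns from the last 'result' event in a stream-json log."""
    num_turns = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise (e.g. stray stderr) in the stream
        if isinstance(event, dict) and event.get("type") == "result":
            value = event.get("num_turns")
            if isinstance(value, int):
                num_turns = value
    return num_turns
```

A caller might then record `metadata["num_turns"] = parse_num_turns(stdout.splitlines())`. The result is `None` when no result event was seen, for example after a timeout.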
Add the CVDP code-generation environment built on the Apptainer sandbox
provider: resources server with harness execution, non-agentic and
agentic cvdp_agent harnesses, configs, tests, and example dataset.

Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
…cvdp_resources_server

# Conflicts:
#	resources_servers/cvdp/README.md
@copy-pr-bot

copy-pr-bot Bot commented Jun 25, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@arti4nvj arti4nvj requested a review from hemildesai June 25, 2026 23:21

There are two ways to drive this resources server:

- **Non-agentic** (`cvdp_agent`, `responses_api_agents/cvdp_agent/app.py`, config `configs/cvdp_agent.yaml`): the model emits the RTL directly in its text response; the server parses it out and runs the harness.


It's a bit confusing to describe this path as non-agentic, considering cvdp_agent is itself an agent that we are using in the first scenario.

- **Agentic** (`cvdp_agent_agentic`, `responses_api_agents/cvdp_agent/agentic_app.py`, config `configs/cvdp_agent_agentic.yaml`): runs Claude Code **inside** the EDA sim container so it can edit files on disk and self-test with the in-container EDA tools, then reports the files it wrote back to the server as `rtl_files` for grading. See [`responses_api_agents/cvdp_agent/`](../../responses_api_agents/cvdp_agent/).
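In both flows, the server ends up with a mapping from file paths to RTL source. The agentic agent reports `rtl_files` directly, whereas the non-agentic path has to parse them out of the text response. The sketch below shows one way to do that parsing. It assumes a hypothetical convention: each RTL block is fenced as `verilog`/`systemverilog`/`sv` and starts with a `// file: <path>` comment. That convention and `extract_rtl_files` are illustrative only and are not the server's actual parsing rule.

```python
import re

# Hypothetical convention: a fenced verilog/systemverilog/sv block whose
# first line is a "// file: <relative path>" comment. `{3} matches the fence.
FENCE_RE = re.compile(r"`{3}(?:systemverilog|verilog|sv)[ \t]*\n(.*?)`{3}", re.DOTALL)
FILE_RE = re.compile(r"^//\s*file:\s*(\S+)\s*$")


def extract_rtl_files(response_text: str) -> dict[str, str]:
    """Map relative paths to RTL source parsed from a model's text response."""
    rtl_files: dict[str, str] = {}
    for block in FENCE_RE.findall(response_text):
        first, _, rest = block.partition("\n")
        match = FILE_RE.match(first.strip())
        if match:  # blocks without a file header are skipped
            rtl_files[match.group(1)] = rest
    return rtl_files
```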


I'd recommend we think of the harness as a first-class composable unit, and describe this as an illustration using Claude Code, where other harnesses could be swapped in as well.


What's the rationale for splitting the verifier into two files, app.py and harness.py, and for the name harness.py?

@cmunley1

Contributor

I would suggest considering an approach like this to reuse all agent harnesses with zero code rewriting: https://github.com/NVIDIA-NeMo/Gym/blob/main/responses_api_agents/anyterminal_agent/app.py#L190
