
evals: add 3 Python API evals for cuopt-numerical-optimization-api-python skill #1418

Open
rgsl888prabhu wants to merge 5 commits into main from add-python-api-evals-numerical-optimization


@rgsl888prabhu rgsl888prabhu commented Jun 10, 2026

Collaborator

Adds 3 new evals to skills/cuopt-numerical-optimization-api-python/evals/evals.json (was 1, now 4), all grounded in skill-specific Python API content from SKILL.md.

…thon skill

Adds numopt-py-eval-002, 003, 004 covering skill-specific Python API gotchas:
- eval-002: status case sensitivity bug — 'OPTIMAL' vs 'Optimal' silent failure
- eval-003: INTEGER vs CONTINUOUS for countable entities (nurses/workers)
- eval-004: QP maximize workaround — negate coefficients and MINIMIZE

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
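
The status gotcha in eval-002 can be illustrated without the solver. This is a minimal sketch, assuming the status arrives as a PascalCase string such as `Optimal` or `PrimalFeasible` (the names mentioned in this PR); the `ACCEPTED_STATUSES` set and the `has_usable_solution` helper are hypothetical names for illustration and are not part of the cuOpt API.

```python
# Hypothetical helper for illustration; not part of the cuOpt API.
# Status names are assumed to be PascalCase strings, as described in this PR.
ACCEPTED_STATUSES = {"Optimal", "PrimalFeasible"}


def has_usable_solution(status_name: str) -> bool:
    """Return True only on an exact, case-sensitive match."""
    return status_name in ACCEPTED_STATUSES


status_name = "Optimal"  # stand-in for the status string a solve would return

# The buggy comparison never matches, so the success branch is skipped silently.
buggy_check = status_name == "OPTIMAL"

# The exact-case comparison matches as intended.
correct_check = has_usable_solution(status_name)

print(buggy_check, correct_check)
```

Because the upper-case comparison raises no error, the only symptom is that the "solution found" branch never runs, which is why the eval treats it as a silent failure.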
@rgsl888prabhu rgsl888prabhu requested a review from a team as a code owner June 10, 2026 15:33
@rgsl888prabhu rgsl888prabhu requested a review from tmckayus June 10, 2026 15:33
@rgsl888prabhu rgsl888prabhu self-assigned this Jun 10, 2026
@rgsl888prabhu rgsl888prabhu added the non-breaking (Introduces a non-breaking change) and improvement (Improves an existing functionality) labels Jun 10, 2026
@rgsl888prabhu

Collaborator Author

/nvskills-ci

@coderabbitai

coderabbitai Bot commented Jun 10, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 324944bb-4265-4af9-afd8-5f5980cab292

📥 Commits

Reviewing files that changed from the base of the PR and between e616823 and e02d54f.

📒 Files selected for processing (3)
  • skills/cuopt-numerical-optimization-api-python/BENCHMARK.md
  • skills/cuopt-numerical-optimization-api-python/skill-card.md
  • skills/cuopt-numerical-optimization-api-python/skill.oms.sig
✅ Files skipped from review due to trivial changes (1)
  • skills/cuopt-numerical-optimization-api-python/skill.oms.sig

📝 Walkthrough

This PR updates evals.json (one prompt edit, three new evaluation cases), refreshes BENCHMARK.md and skill-card.md to reflect a 4-task run and updated results, and regenerates the sigstore signature bundle (skill.oms.sig).

Changes

Evaluation test cases for cuOpt Python API

| Layer / File(s) | Summary |
|---|---|
| Existing evaluation case refinement: `skills/cuopt-numerical-optimization-api-python/evals/evals.json` | The first evaluation case's prompt and ground-truth strings are updated with escaped punctuation while preserving the intended content. |
| New API edge-case evaluation coverage: `skills/cuopt-numerical-optimization-api-python/evals/evals.json` | Three new evaluation cases are added: LP status-name case sensitivity (Optimal/PrimalFeasible vs OPTIMAL), integer vs continuous variable type selection using the vtype constant, and QP objective maximization via MINIMIZE with negated quadratic terms and adjusted final objective reporting. |
| Benchmark report updates: `skills/cuopt-numerical-optimization-api-python/BENCHMARK.md` | Header run parameters updated (date, environment, dataset size to 4 tasks, attempts), test dataset description and Results table refreshed, Tier 1 validation findings updated, and the separate Publication Recommendation section removed. |
| Skill card metadata and eval summary: `skills/cuopt-numerical-optimization-api-python/skill-card.md` | License normalized to “Apache 2.0”, references reordered/expanded, Output Type/Format wording revised (remove “Analysis”, adjust output wording), evaluation agent listing reformatted, and evaluation tasks/results updated to Num = 4. |
| Signature bundle regeneration: `skills/cuopt-numerical-optimization-api-python/skill.oms.sig` | DSSE payload subject/resource digests and the DSSE signature value regenerated in the sigstore bundle JSON. |

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers:

  • Iroy30
  • tmckayus
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and specifically describes the main change: adding 3 new Python API evaluation cases to the cuopt skill's evals.json file. |
| Description check | ✅ Passed | The description clearly relates to the changeset, explaining that 3 new evals were added to the evals.json file, increasing from 1 to 4 total evaluations. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills/cuopt-numerical-optimization-api-python/evals/evals.json`:
- Line 53: Update the "ground_truth" string to use a concave maximization
example with negative quadratic coefficients so it aligns with the NSD
requirement: replace "to maximize 0.04*x1*x1 + 0.02*x2*x2, minimize -0.04*x1*x1
- 0.02*x2*x2" with an equivalent that maximizes a concave quadratic (e.g., "to
maximize -0.04*x1*x1 - 0.02*x2*x2, minimize 0.04*x1*x1 + 0.02*x2*x2") and ensure
the surrounding explanation still references the Q matrix being NSD (concave)
and the negation producing a PSD Q for the solver; edit the ground_truth string
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9c7ac972-8967-428f-81cf-22c4a56dae03

📥 Commits

Reviewing files that changed from the base of the PR and between c58e6fe and 2b4db09.

📒 Files selected for processing (1)
  • skills/cuopt-numerical-optimization-api-python/evals/evals.json

Comment thread skills/cuopt-numerical-optimization-api-python/evals/evals.json Outdated
@rgsl888prabhu

Collaborator Author

/nvskills-ci

Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rgsl888prabhu

Collaborator Author

/ok to test f87c77a

- eval-004: use concave quadratic (-0.04*x1² - 0.02*x2²) as the
  maximize example — maximizing a convex quadratic is unbounded;
  clarify NSD requirement must hold for a finite maximum (CodeRabbit)
- fix end-of-file newline (pre-commit)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
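
The reasoning behind the eval-004 fix can be checked numerically. This is a rough sketch in plain Python that does not call cuOpt: the integer grid on [1, 5] x [1, 5] is a hypothetical feasible region chosen only for illustration, and the function names are made up for this example. It shows that maximizing the concave objective and minimizing its negation select the same point, and that the reported objective is recovered by flipping the sign back.

```python
def concave_objective(x1: float, x2: float) -> float:
    """Objective to maximize: -0.04*x1^2 - 0.02*x2^2 (diagonal Q is NSD)."""
    return -0.04 * x1 * x1 - 0.02 * x2 * x2


def negated_objective(x1: float, x2: float) -> float:
    """Objective for a minimize-only solver: 0.04*x1^2 + 0.02*x2^2 (PSD)."""
    return -concave_objective(x1, x2)


# Hypothetical feasible region, for illustration only:
# integer points in [1, 5] x [1, 5].
grid = [(float(a), float(b)) for a in range(1, 6) for b in range(1, 6)]

# Maximizing the concave objective and minimizing its negation
# pick the same point.
best_max = max(grid, key=lambda p: concave_objective(*p))
best_min = min(grid, key=lambda p: negated_objective(*p))

# Flip the sign of the minimized value to report the original objective.
reported_objective = -negated_objective(*best_min)

# By contrast, the convex form keeps growing as x1 grows, so maximizing it
# has no finite answer unless the variables are bounded.
convex_values = [negated_objective(float(r), 0.0) for r in (10, 100, 1000)]

print(best_max, best_min, reported_objective, convex_values)
```

The last list is the reason the original ground truth was changed: with the convex coefficients, a maximize request only has a finite answer when bounds happen to cap it, whereas the concave form is well posed for the negate-and-MINIMIZE workaround.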
@rgsl888prabhu

Collaborator Author

/nvskills-ci

Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>
@rgsl888prabhu

Collaborator Author

/ok to test e02d54f


Labels

improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)


3 participants