ci: add 2027 eval workflow for per-PR CLI evaluation#1088
@claude review this PR please
Claude finished @runeb's task in 1m 35s

Review: 2027 eval workflow

Nice, focused change: a single new workflow, the polling logic is well commented, and failing loudly when the secret is missing is the right call. A few observations, ordered by importance.

1. Missing
Thanks for the thorough review! Pushed fc832e24 addressing 1, 2, 3, and 4:
Verified clean against actionlint locally.

🤖 Generated with Claude Code
Thanks @caffeinum. Question: is this a more accurate permission set?

```yaml
permissions:
  actions: read
  issues: write
  statuses: write
```
@runeb I think we don't need

Edit (22 May): We also need
7eb987e to d963ef2
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Description
Integrates 2027's agent experience evaluations, so that preview builds of the Sanity CLI are tested automatically on PRs labeled `trigger: preview`. Uses the existing `pkg-pr-new` action to provide the build preview URL to the 2027 runner (https://github.com/team2027/evals-action). Uses the prompt https://2027.dev/evals/sanity.io/prompts/1d8004c6-d00c-432e-998b-e868a957807c and triggers an eval, passing along the per-commit `@sanity/cli` build. `evals-action` posts a sticky comment with the eval status.

The intent is to catch CLI regressions that hurt agent workflows (Claude Code, Codex, Cursor, etc.) before they ship, measuring time, cost, error count, and score for an end-to-end "set up a Sanity project with the CLI" task.
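A "sticky" comment is usually a single PR comment that each run finds again (for example by a hidden marker in its body) and edits in place, instead of posting a new comment per run. The TypeScript below is a minimal sketch of that general pattern under stated assumptions, not `evals-action`'s actual implementation: `upsertStickyComment`, the `<!-- eval-status -->` marker, and the in-memory `CommentApi` are all hypothetical stand-ins for the real GitHub issue-comments API.

```typescript
// Only the fields of a PR comment this sketch needs.
interface PrComment {
  id: number;
  body: string;
}

// Hypothetical minimal comment API; a real one would call GitHub's
// issue-comments endpoints for the pull request.
interface CommentApi {
  list(): PrComment[];
  create(body: string): PrComment;
  update(id: number, body: string): PrComment;
}

// Hypothetical hidden HTML marker that identifies "our" comment across runs.
const STICKY_MARKER = "<!-- eval-status -->";

// Create the status comment on the first run, then edit that same comment
// on later runs instead of posting a new one each time.
function upsertStickyComment(api: CommentApi, status: string): PrComment {
  const body = `${STICKY_MARKER}\n${status}`;
  const existing = api.list().find((c) => c.body.startsWith(STICKY_MARKER));
  return existing ? api.update(existing.id, body) : api.create(body);
}

// In-memory stand-in for the comment API, for illustration and testing only.
function inMemoryCommentApi(): CommentApi {
  const comments: PrComment[] = [];
  let nextId = 1;
  return {
    list: () => comments.slice(),
    create: (body) => {
      const comment = { id: nextId++, body };
      comments.push(comment);
      return comment;
    },
    update: (id, body) => {
      const comment = comments.find((c) => c.id === id);
      if (!comment) throw new Error(`No comment with id ${id}`);
      comment.body = body;
      return comment;
    },
  };
}
```

Calling `upsertStickyComment(api, "Eval running")` and then `upsertStickyComment(api, "Eval finished")` leaves exactly one comment whose body ends with the latest status, which keeps the PR thread readable across repeated pushes.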
Dogfooded extensively in the team2027/sanity-cli fork. The pipeline was confirmed to exercise the per-PR build (team2027#3 (comment): an intentionally broken CLI scored differently from a working CLI).
Uses the 2027 API; see the full reference at https://2027.dev/evals/api/openapi
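The workflow waits for the `Publish Preview Packages` (`pkg-pr-new`) run on the PR's commit to finish (SHA polling) before it can hand the preview build to the 2027 runner. The real workflow is YAML and this is not its code; the TypeScript below is only a minimal sketch of the polling technique, assuming the field names of GitHub's "list workflow runs" response (`name`, `head_sha`, `status`, `conclusion`). `findCompletedRun`, `waitForWorkflow`, `FetchRuns`, and the interval/attempt defaults are hypothetical.

```typescript
// Only the fields of a GitHub "workflow run" object that this sketch reads.
interface WorkflowRun {
  name: string;
  head_sha: string;
  status: string; // e.g. "queued", "in_progress", "completed"
  conclusion: string | null; // e.g. "success" or "failure" once completed
}

// Hypothetical fetcher: returns the workflow runs recorded for one commit SHA.
type FetchRuns = (headSha: string) => Promise<WorkflowRun[]>;

// Returns the first completed run of `workflowName` for `headSha`, if any.
function findCompletedRun(
  runs: WorkflowRun[],
  workflowName: string,
  headSha: string,
): WorkflowRun | undefined {
  return runs.find(
    (run) =>
      run.name === workflowName &&
      run.head_sha === headSha &&
      run.status === "completed",
  );
}

// Polls until the named workflow has completed for the commit, failing loudly
// if it never finishes or finishes with anything other than "success".
async function waitForWorkflow(
  fetchRuns: FetchRuns,
  workflowName: string,
  headSha: string,
  intervalMs = 15000, // hypothetical default
  maxAttempts = 40, // hypothetical default (roughly 10 minutes at 15 s)
): Promise<WorkflowRun> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const run = findCompletedRun(await fetchRuns(headSha), workflowName, headSha);
    if (run) {
      if (run.conclusion !== "success") {
        throw new Error(`${workflowName} concluded with "${run.conclusion}"`);
      }
      return run;
    }
    // Sleep between attempts, but not after the last one.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Timed out waiting for ${workflowName} on ${headSha}`);
}
```

In practice `fetchRuns` could wrap `GET /repos/{owner}/{repo}/actions/runs?head_sha=<sha>` and return the response's `workflow_runs` array. Keying on the commit SHA rather than the branch ties the eval to the exact build that `pkg-pr-new` published for that commit.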
What to review
- `.github/workflows/2027-eval.yml`: single new file
- Waits for the `Publish Preview Packages` (`pkg-pr-new`) workflow's completion via SHA polling
- `team2027/evals-action@v0.5.0` (pinned to a tag; the action source is public)

Prerequisites for this to do anything:
- `EVALS_API_KEY` set in `sanity-io/cli` → Settings → Secrets and variables → Actions. Generate it at https://2027.dev/evals/sanity.io/settings. The `api-key` input is required, so there is no silent skip.
- The prompt (`1d8004c6-d00c-432e-998b-e868a957807c`) is owned by the `sanity.io` org domain on 2027 and is the "Getting Started: Staging CLI" prompt. Anyone with a `sanity.io` 2027 API key can run it. To use a different prompt, edit the `PROMPT_ID` env var.

Security considerations
- The prompt ID `1d8004c6-d00c-432e-998b-e868a957807c` is public in this repo. However, the 2027 API is authenticated via an org-level API token (stored in the `EVALS_API_KEY` repo secret), so no information is visible to public visitors except the eval results in the public GitHub comment.
- `pkg-pr-new` build SHA (which is public)

Testing
The action (`team2027/evals-action`) has its own test suite with contract tests for the API wire format.