skills: agent skills home (devops-bench-review + run-parallel-evals + run-eval) #132
Merged
pradeepvrd
added a commit
that referenced
this pull request
Jun 25, 2026
The run-parallel-evals skill now lives in the standalone agent-skills PR (#132) so skills evolve independently of this feature branch. The docs it references (docs/parallel-evals.md, docs/bastion.md) and scripts/bastion/* remain here.
pradeepvrd
added a commit
that referenced
this pull request
Jun 25, 2026
The run-parallel-evals skill (and its .claude/skills discovery symlink) now live in the standalone agent-skills PR (#132).
e3dd11b to
da32df0
Compare
da32df0 to
f1c4194
Compare
jessie1111101
added a commit
that referenced
this pull request
Jun 25, 2026
Stacked on #132 (skills/agent-skills). Each matrix combo provisions its own cluster; this makes every task collision-free under concurrent runs:
- 6 manifest-gen tasks -> deployer: noop (no cluster); legacy factory honors noop
- optimize-scale: new prebuilt/optimize-scale GKE stack + pre-seeded workload; matrix pins TARGET_DEPLOYMENT_NAME/NAMESPACE so both arms agree
- deploy-hello-app: run-unique Artifact Registry repo name
- per-run tofu stack-dir copy (both arms) removes the shared .terraform.lock race (resolves the 'Shared OpenTofu working directory' known-issue)
- import + parallel-fix the merged complex/GKE tasks (#64 migration, #87 opa, #93 multi-region, #86 postgres/unhealthy/gitops, #76 debug-crashloop): per-run GitOps repo paths, dropped shared-SA container.admin (BYO creds), region-prefixed cluster names (avoid node-SA substr collision), unique task_id
- cp-recovery documented as the kind-only exception (docs/bastion.md)
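The per-run isolation described in this commit message (a private copy of the stack directory and run-unique resource names) could look roughly like the sketch below. It is illustrative only: `make_run_id`, `copy_stack_for_run`, and `unique_name` are hypothetical helpers, and skipping a `.terraform` cache directory during the copy is an assumption, not something the PR states.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def make_run_id() -> str:
    """Short random id (hypothetical; the harness may derive ids differently)."""
    return uuid.uuid4().hex[:8]


def copy_stack_for_run(stack_dir: Path, work_root: Path, run_id: str) -> Path:
    """Copy a stack directory into a run-private location.

    Each run then works in its own directory, so concurrent runs do not
    contend for one shared lock file.
    """
    dest = work_root / f"{stack_dir.name}-{run_id}"
    # Assumption: a cached .terraform directory is not worth copying.
    shutil.copytree(stack_dir, dest, ignore=shutil.ignore_patterns(".terraform"))
    return dest


def unique_name(base: str, run_id: str) -> str:
    """Suffix a resource name so concurrent runs never share it."""
    return f"{base}-{run_id}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        stack = root / "optimize-scale"
        stack.mkdir()
        (stack / "main.tf").write_text("# placeholder stack\n")

        first = copy_stack_for_run(stack, root / "runs", make_run_id())
        second = copy_stack_for_run(stack, root / "runs", make_run_id())
        print(first.name, second.name)
        print(unique_name("hello-app-repo", make_run_id()))
```

The point of the design is that nothing on disk or in the cloud project is named by the task alone; every shared-looking name carries the run's id.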
e64394a to
7ac941f
Compare
pradeepvrd
added a commit
that referenced
this pull request
Jun 26, 2026
The run-parallel-evals skill now lives in the standalone agent-skills PR (#132) so skills evolve independently of this feature branch. The docs it references (docs/parallel-evals.md, docs/bastion.md) and scripts/bastion/* remain here.
pradeepvrd
added a commit
that referenced
this pull request
Jun 26, 2026
The run-parallel-evals skill (and its .claude/skills discovery symlink) now live in the standalone agent-skills PR (#132).
7588a3e to
ed54798
Compare
pradeepvrd
pushed a commit
that referenced
this pull request
Jun 26, 2026
Stacked on #132 (skills/agent-skills). Each matrix combo provisions its own cluster; this makes every task collision-free under concurrent runs:
- 6 manifest-gen tasks -> deployer: noop (no cluster); legacy factory honors noop
- optimize-scale: new prebuilt/optimize-scale GKE stack + pre-seeded workload; matrix pins TARGET_DEPLOYMENT_NAME/NAMESPACE so both arms agree
- deploy-hello-app: run-unique Artifact Registry repo name
- per-run tofu stack-dir copy (both arms) removes the shared .terraform.lock race (resolves the 'Shared OpenTofu working directory' known-issue)
- import + parallel-fix the merged complex/GKE tasks (#64 migration, #87 opa, #93 multi-region, #86 postgres/unhealthy/gitops, #76 debug-crashloop): per-run GitOps repo paths, dropped shared-SA container.admin (BYO creds), region-prefixed cluster names (avoid node-SA substr collision), unique task_id
- cp-recovery documented as the kind-only exception (docs/bastion.md)
6b2d4b4 to
6fef3ff
Compare
a1b6078 to
b313cdf
Compare
6fef3ff to
121e7fb
Compare
b313cdf to
e9c740d
Compare
121e7fb to
377a5aa
Compare
e9c740d to
460cc45
Compare
pradeepvrd
added a commit
that referenced
this pull request
Jun 27, 2026
Add docs/parallel-evals.md: the end-to-end parallel-evaluation runbook (matrix CUJs, parallel-safety rules, resume-after-drop, Vertex setup), known issues from review findings, and the local-default / BENCH_REMOTE execution note. Docs-only; the run-parallel-evals skill lives in the skills PR (#132).
… run-eval)
A dedicated home for agent skills so they evolve independently of feature PRs:
- devops-bench-review (new): review-only review across correctness, parallel-safety across the eval matrix axes (Task × Model × AgentConfig), task/stack conventions, and docs conventions; runs unit tests / ruff only, never evals or infra.
- run-parallel-evals: relocated here so all skills sit together; harness-agnostic with an Antigravity portability map and local/remote execution modes.
- run-eval (new): drive a single Task × Model × AgentConfig run end to end (a 1×1×1 matrix); reuses run-parallel-evals' wrappers and recovery/reference files.
Each skill is a source dir under .agents/skills/<name>/ plus a .claude/skills/ discovery symlink (force-added; .agents/.claude are git-excluded).
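To make the matrix wording above concrete, here is a minimal sketch of expanding the Task × Model × AgentConfig axes, in which a single run is just the 1×1×1 case. The `Combo` class, `expand_matrix`, the `run_key` format, and the model/config names are all hypothetical; only the task names come from this page.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Combo:
    task: str
    model: str
    agent_config: str

    @property
    def run_key(self) -> str:
        """A per-combo key that run-scoped resources could be named after."""
        return f"{self.task}--{self.model}--{self.agent_config}"


def expand_matrix(tasks, models, agent_configs):
    """Return every Task x Model x AgentConfig combination."""
    return [Combo(t, m, a) for t, m, a in product(tasks, models, agent_configs)]


if __name__ == "__main__":
    full = expand_matrix(
        ["deploy-hello-app", "optimize-scale"],
        ["model-a", "model-b"],  # placeholder model names
        ["default"],             # placeholder agent config
    )
    single = expand_matrix(["deploy-hello-app"], ["model-a"], ["default"])
    print(len(full), len(single))  # 4 1
    # Every combo has a distinct key, which is what a parallel-safety
    # review would want run-scoped names to be derived from.
    print(len({c.run_key for c in full}) == len(full))  # True
```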
377a5aa to
fdb29d1
Compare
Summary
A dedicated home for agent skills / guidelines so they can evolve independently of feature PRs.
Stacked on #131 (feat/bastion-matrix).

Three skills:
- devops-bench-review (new): a review-only pass over a PR or the current workspace through four lenses: correctness; parallel-safety across the eval matrix axes (Task × Model × AgentConfig), which is the emphasis, with a shared-state checklist and per-axis reasoning; task & stack conventions; and docs conventions. It analyzes statically and may run unit tests / ruff lint+format checks, but never runs benchmark evals or provisions infra.
- run-parallel-evals: drives the full parallel matrix; harness-agnostic, with an Antigravity portability map and local/remote execution modes. (Relocated here so all skills sit together.)
- run-eval (new): drives a single Task × Model × AgentConfig run end to end (a 1×1×1 matrix); reuses run-parallel-evals' wrappers and recovery/reference files.

Each skill is a source dir under .agents/skills/<name>/ plus a .claude/skills/ discovery symlink (both force-added, since .agents/.claude are in .git/info/exclude).

Stacking / dependencies
- #131 (feat/bastion-matrix): provides docs/bastion.md and scripts/bastion/* that run-parallel-evals / run-eval reference.
- run-parallel-evals also references docs/parallel-evals.md, which lands in #126, "docs(parallel-evals): parallel evaluation runbook + known issues" (feat/parallel-eval-runs); the skill is fully wired once #126 is in the merge path.
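The skill layout from the summary (a source dir under .agents/skills/<name>/ plus a discovery symlink under .claude/skills/) can be sketched as follows. This is an assumption-laden illustration: `link_skill` is a hypothetical helper, and the use of a relative symlink target is a guess rather than something the PR confirms.

```python
import os
import tempfile
from pathlib import Path


def link_skill(repo_root: Path, name: str) -> Path:
    """Create .claude/skills/<name> as a symlink to .agents/skills/<name>."""
    source = repo_root / ".agents" / "skills" / name
    source.mkdir(parents=True, exist_ok=True)
    link = repo_root / ".claude" / "skills" / name
    link.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: a relative target, so the link still resolves after the
    # repository is cloned to a different path.
    target = os.path.relpath(source, start=link.parent)
    if not link.is_symlink():
        link.symlink_to(target, target_is_directory=True)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for skill in ("devops-bench-review", "run-parallel-evals", "run-eval"):
            link = link_skill(root, skill)
            print(link.relative_to(root), "->", os.readlink(link))
```

Because both directories are listed in .git/info/exclude, a plain `git add` would skip them; staging them needs git's force option (`git add -f`), which matches the "force-added" note above. The exact commands used in the PR are not shown on this page.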