Add task-generation framework + first task (debug-crashloop) #76
Open
adrianchung wants to merge 1 commit into
A repeatable way to generate DevOps Bench tasks from an expert catalog and run them, plus the first validated task:
- docs/task-generation/: methodology (task.yaml schema, ID allocation, expected_output rules, task classes, cluster access via the GKE MCP server), the expert task catalog as source of truth with a generation tracker, and a run + leaderboard guide.
- AGENTS.md: vendor-neutral agent guidance pointing at the methodology.
- tasks/generic/debug-crashloop/: first generated task with a CrashLoopBackOff fixture whose root cause is a missing DATABASE_URL env var.
- pkg/agents/runner/api/mcp_client.py: forward the environment to the MCP server subprocess so it inherits KUBECONFIG and cloud credentials; without this the MCP server cannot resolve the target cluster's kubeconfig context.
3 tasks
pradeepvrd reviewed Jun 17, 2026
# otherwise launches the server with a stripped default environment, which
# leaves it unable to resolve the target cluster's kubeconfig context.
server_params = StdioServerParameters(
    command=self.server_path, env=os.environ.copy()
Collaborator
This can be dangerous, especially since the agent will then have access to the full environment. I recommend being selective about what is passed to the server.
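One way to act on this suggestion might be an allowlist, sketched below. This is not code from the PR: the helper name `build_server_env` and the contents of `ALLOWED_ENV_VARS` are assumptions for illustration, and the real list would depend on what the MCP server and its credential helpers actually read.

```python
import os
from typing import Iterable, Mapping, Optional

# Hypothetical allowlist (an assumption, not taken from the PR): only the
# variables the MCP server plausibly needs to locate the cluster and credentials.
ALLOWED_ENV_VARS = (
    "PATH",
    "HOME",
    "KUBECONFIG",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "CLOUDSDK_CONFIG",
)


def build_server_env(
    source: Optional[Mapping[str, str]] = None,
    allowed: Iterable[str] = ALLOWED_ENV_VARS,
) -> dict:
    """Return a copy of `source` restricted to allowlisted variables that are set."""
    if source is None:
        source = os.environ
    # Variables outside the allowlist (API keys, tokens, etc.) are never copied.
    return {name: source[name] for name in allowed if name in source}
```

With such a helper, the call in the diff could pass `env=build_server_env()` to `StdioServerParameters` instead of `env=os.environ.copy()`, so unrelated secrets in the runner's environment would not reach the server subprocess.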
jessie1111101 added a commit that referenced this pull request Jun 25, 2026
Stacked on #132 (skills/agent-skills). Each matrix combo provisions its own cluster; this makes every task collision-free under concurrent runs:
- 6 manifest-gen tasks -> deployer: noop (no cluster); legacy factory honors noop
- optimize-scale: new prebuilt/optimize-scale GKE stack + pre-seeded workload; matrix pins TARGET_DEPLOYMENT_NAME/NAMESPACE so both arms agree
- deploy-hello-app: run-unique Artifact Registry repo name
- per-run tofu stack-dir copy (both arms) removes the shared .terraform.lock race (resolves the 'Shared OpenTofu working directory' known-issue)
- import + parallel-fix the merged complex/GKE tasks (#64 migration, #87 opa, #93 multi-region, #86 postgres/unhealthy/gitops, #76 debug-crashloop): per-run GitOps repo paths, dropped shared-SA container.admin (BYO creds), region-prefixed cluster names (avoid node-SA substr collision), unique task_id
- cp-recovery documented as the kind-only exception (docs/bastion.md)
pradeepvrd pushed a commit that referenced this pull request Jun 26, 2026
Summary
A repeatable way to generate DevOps Bench tasks from an expert catalog and run them against the framework, plus the first validated task.
Scope
- docs/task-generation/: methodology (schema, ID allocation, expected_output rules, task classes, cluster access via the GKE MCP server), the expert task catalog (source of truth) + generation tracker, and a run + leaderboard guide
- AGENTS.md: vendor-neutral agent guidance pointing at the methodology
- tasks/generic/debug-crashloop/: first generated task + a CrashLoopBackOff fixture (root cause: missing DATABASE_URL)
- pkg/agents/runner/api/mcp_client.py: forward the environment to the MCP server subprocess so it inherits KUBECONFIG/cloud creds (otherwise the MCP server can't resolve the target cluster's kubeconfig context)
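To illustrate the failure mode the debug-crashloop fixture is built around, here is a minimal sketch of a container entrypoint that exits non-zero when DATABASE_URL is unset. It is not the fixture from this PR; the function names and the log message are hypothetical. A container whose process exits this way on every start is what Kubernetes reports as CrashLoopBackOff.

```python
import os
import sys
from typing import Mapping, Optional, Sequence


def check_required_env(
    env: Mapping[str, str], required: Sequence[str] = ("DATABASE_URL",)
) -> list:
    """Return the names of required variables that are missing or empty."""
    return [name for name in required if not env.get(name)]


def main(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical app startup: fail fast if required configuration is absent."""
    env = os.environ if env is None else env
    missing = check_required_env(env)
    if missing:
        # The non-zero return value is meant to become the process exit code,
        # which makes the kubelet restart the container with growing back-off.
        print(f"fatal: missing required env var(s): {', '.join(missing)}", file=sys.stderr)
        return 1
    print("configuration ok, starting app")
    return 0

# A real entrypoint would finish with: sys.exit(main())
```

Under these assumptions, an agent solving the task would find the error in the container logs and fix it by setting DATABASE_URL on the workload.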