feat: add GKE DevOps benchmark tasks for M1 Milestone (CUJ 1, 2 and 3) #86

Draft
jayantid wants to merge 7 commits into gke-labs:main from jayantid:feature/add-m1-tasks

jayantid commented Jun 17, 2026

Description

This PR adds three new benchmark tasks that evaluate GKE-focused operations agents against the M1 Milestone requirements. It also introduces two new prebuilt infrastructure stacks that set up the troubleshooting and GitOps scenarios.


New Tasks Added

1. Deploy Postgres-Backed Web App (CUJ 1: Deploy via NL)

  • Path: tasks/gcp/deploy-postgres-web-app/task.yaml
  • Goal: Evaluates whether the agent can deploy a secure, compliant two-tier application (Python + Postgres) in a dynamic namespace.
  • Key Checks:
    • Generates and applies manifests for both the web app and the database, using standard public images.
    • Configures the database connection through environment variables.
    • Enforces GKE security hardening guidelines (e.g., runAsNonRoot: true, allowPrivilegeEscalation: false).
    • Sets appropriate resource requests/limits.
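The hardening and resource checks above can be expressed as a small validator over a Deployment's pod template (`spec.template.spec`). This is a minimal sketch, not the task's actual grader. It assumes the generated manifests have already been parsed into plain dicts (for example with a YAML loader), and the helper name `hardening_violations` is hypothetical.

```python
from typing import List

# Hypothetical helper, not part of devops-bench: checks only the rules named in
# the task (runAsNonRoot, allowPrivilegeEscalation, resource requests/limits).
RESOURCE_NAMES = ("cpu", "memory")


def hardening_violations(pod_spec: dict) -> List[str]:
    """Return hardening violations for a PodSpec dict (spec.template.spec)."""
    violations = []
    pod_sc = pod_spec.get("securityContext") or {}
    # Only regular containers are checked here; initContainers are ignored.
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}

        # A container-level runAsNonRoot overrides the pod-level setting.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            violations.append(f"{name}: runAsNonRoot is not true")

        # allowPrivilegeEscalation exists only on the container securityContext.
        if sc.get("allowPrivilegeEscalation") is not False:
            violations.append(f"{name}: allowPrivilegeEscalation is not false")

        resources = container.get("resources") or {}
        for section in ("requests", "limits"):
            declared = resources.get(section) or {}
            for resource in RESOURCE_NAMES:
                if resource not in declared:
                    violations.append(f"{name}: missing {section}.{resource}")
    return violations


if __name__ == "__main__":
    web = {
        "securityContext": {"runAsNonRoot": True},
        "containers": [{
            "name": "web-app",
            "securityContext": {"allowPrivilegeEscalation": False},
            "resources": {
                "requests": {"cpu": "100m", "memory": "128Mi"},
                "limits": {"cpu": "500m", "memory": "256Mi"},
            },
        }],
    }
    print(hardening_violations(web))  # []
```

The sketch treats an absent field as a violation rather than relying on cluster defaults, because the checks ask for these values to be set explicitly.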

2. Troubleshoot Unhealthy Pod (CUJ 2: Pod RCA)

  • Path: tasks/gcp/troubleshoot-unhealthy-pod/task.yaml
  • Goal: Evaluates the agent's ability to perform Root Cause Analysis (RCA) on a failing deployment with a deterministic mitigation path.
  • Key Checks:
    • Identifies the failing state (CreateContainerConfigError) of the frontend service in the target namespace.
    • Inspects pod events to identify the missing ConfigMap (frontend-config) and missing key (api-url).
    • Suggests the exact mitigation: create the missing ConfigMap frontend-config with the api-url key.
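The RCA path above reduces to two lookups:

- The pod's Warning event names the ConfigMap that could not be found.
- The container's `configMapKeyRef` entries name the keys that ConfigMap must contain.

The sketch below combines them into a mitigation command. The event wording it matches is an assumption about kubelet output, the helper names are hypothetical, and `REPLACE_ME` is a placeholder because neither the event nor the pod spec reveals the key's value.

```python
import re
from typing import List, Optional

# Assumed wording of the kubelet event for a missing ConfigMap, e.g.
#   Error: configmap "frontend-config" not found
# Wording can differ across Kubernetes versions, so treat this as a heuristic.
MISSING_CONFIGMAP = re.compile(r'configmaps? "(?P<name>[^"]+)" not found')


def missing_configmap(event_message: str) -> Optional[str]:
    """Extract the missing ConfigMap name from a CreateContainerConfigError event."""
    match = MISSING_CONFIGMAP.search(event_message)
    return match.group("name") if match else None


def referenced_keys(container: dict, configmap: str) -> List[str]:
    """Keys the container reads from `configmap` via env[].valueFrom.configMapKeyRef."""
    keys = []
    for env in container.get("env", []):
        ref = (env.get("valueFrom") or {}).get("configMapKeyRef")
        if ref and ref.get("name") == configmap:
            keys.append(ref["key"])
    return keys


def suggest_fix(event_message: str, container: dict, namespace: str) -> Optional[str]:
    """Build a kubectl command that recreates the missing ConfigMap (values are placeholders)."""
    name = missing_configmap(event_message)
    if name is None:
        return None
    parts = ["kubectl", "create", "configmap", name]
    parts += [f"--from-literal={key}=REPLACE_ME" for key in referenced_keys(container, name)]
    parts += ["-n", namespace]
    return " ".join(parts)


if __name__ == "__main__":
    frontend = {
        "name": "frontend",
        "env": [{"name": "API_URL",
                 "valueFrom": {"configMapKeyRef": {"name": "frontend-config", "key": "api-url"}}}],
    }
    event = 'Error: configmap "frontend-config" not found'
    print(suggest_fix(event, frontend, "demo"))
    # kubectl create configmap frontend-config --from-literal=api-url=REPLACE_ME -n demo
```

This sketch only handles the missing-ConfigMap case from this task. If the ConfigMap existed but lacked the key, `kubectl create` would fail with AlreadyExists, and the fix would be to edit or patch the existing object instead.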

3. GitOps Auto-Revert (CUJ 3: Manage Cluster Config)

  • Path: tasks/gcp/gitops-auto-revert/task.yaml
  • Goal: Evaluates the agent's ability to detect a deployment failure caused by a bad config change, inspect git history, and propose a fix in the Git repository.
  • Key Checks:
    • Identifies that the hello-app deployment is failing with ImagePullBackOff (invalid image tag).
    • Inspects the git log at /app/results/gitops-repo to find the last working image tag (1.0).
    • Creates a new branch (e.g., fix-hello-app) and modifies the manifest to revert the image tag back to 1.0.
    • Commits the fix to the repository.
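The revert logic has two steps:

1. Read the most recent image change out of `git log -p`.
2. Rewrite the manifest back to the previous tag.

This is a minimal sketch of both steps. It assumes the agent has captured `git log -p -- <manifest>` output from /app/results/gitops-repo as text, and that images appear as unquoted `image: <ref>:<tag>` lines. The helper names and the sample image path in the test data are hypothetical. Creating the branch and committing are ordinary git steps (`git checkout -b fix-hello-app`, `git commit -am "..."`) and are omitted here.

```python
import re
from typing import Optional

# Matches a changed image line in a diff, e.g. "-        image: repo/hello-app:1.0".
# Tag grammar follows the Docker/OCI reference format: \w[\w.-]*.
DIFF_IMAGE_LINE = re.compile(
    r"^(?P<sign>[+-])\s*(?:-\s+)?image:\s*(?P<ref>\S+):(?P<tag>\w[\w.-]*)\s*$"
)


def _is_image(ref: str, image_name: str) -> bool:
    """True if ref names image_name, with or without a registry/path prefix."""
    return ref == image_name or ref.endswith("/" + image_name)


def previous_image_tag(git_log_patch: str, image_name: str) -> Optional[str]:
    """Return the tag replaced by the most recent commit that changed image_name.

    `git log -p` prints newest commits first, so the first removed ("-") image
    line for this image holds the tag that was in use before the latest change.
    """
    for line in git_log_patch.splitlines():
        m = DIFF_IMAGE_LINE.match(line)
        if m and m["sign"] == "-" and _is_image(m["ref"], image_name):
            return m["tag"]
    return None


def revert_image_tag(manifest: str, image_name: str, tag: str) -> str:
    """Rewrite every `image: .../<image_name>:<old>` line in manifest to use tag."""
    pattern = re.compile(
        rf"^(?P<head>[ \t]*(?:-[ \t]+)?image:[ \t]*(?:\S*/)?{re.escape(image_name)})"
        rf":\S+?(?P<tail>[ \t]*)$",
        re.MULTILINE,
    )
    # A replacement function avoids backslash-escaping issues in the new tag.
    return pattern.sub(lambda m: f"{m['head']}:{tag}{m['tail']}", manifest)
```

Anchoring on the removed diff line, rather than on "the oldest commit", keeps the sketch correct if the history later grows beyond the two commits seeded by the stack.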

Infrastructure Changes (tf/)

  • Added tf/prebuilt/unhealthy-pod stack:
    • Provisions a GKE cluster and deploys a frontend pod that references a missing ConfigMap, which triggers CreateContainerConfigError.
  • Added tf/prebuilt/gitops-revert stack:
    • Provisions a GKE cluster and deploys a hello-app pod with an invalid image tag.
    • Uses a null_resource to initialize a local Git repository at /app/results/gitops-repo with a mock commit history (v1.0 working -> v2.0-broken).

Both stacks support dynamic namespaces via OpenTofu variables.
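A dynamic namespace still has to be a valid Kubernetes name: an RFC 1123 label of at most 63 characters, made of lowercase alphanumerics or '-', and starting and ending with an alphanumeric. The sketch below generates a run-unique name and passes it to a stack with OpenTofu's standard `-chdir`, `-auto-approve` and `-var` options. The variable name `namespace` is an assumption, since the stacks' actual variable name is not given here, and both helpers are hypothetical.

```python
import re
import uuid
from typing import List, Optional

# RFC 1123 label: the rule Kubernetes applies to namespace names.
DNS1123_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")
MAX_LABEL_LEN = 63


def unique_namespace(prefix: str, suffix: Optional[str] = None) -> str:
    """Build '<prefix>-<suffix>' as a valid, run-unique namespace name."""
    suffix = suffix or uuid.uuid4().hex[:8]
    if not re.fullmatch(r"[a-z0-9]{1,16}", suffix):
        raise ValueError(f"suffix must be 1-16 lowercase alphanumerics: {suffix!r}")
    base = re.sub(r"[^a-z0-9-]+", "-", prefix.lower()).strip("-") or "ns"
    # Truncate the prefix (never the suffix) so the result fits in 63 characters.
    base = base[: MAX_LABEL_LEN - len(suffix) - 1].rstrip("-")
    name = f"{base}-{suffix}"
    if not DNS1123_LABEL.match(name):
        raise ValueError(f"not a valid namespace name: {name!r}")
    return name


def tofu_apply_cmd(stack_dir: str, namespace: str, var_name: str = "namespace") -> List[str]:
    """argv for applying a prebuilt stack with the namespace passed as a variable."""
    # var_name is an assumption; use whichever variable the stack actually declares.
    return ["tofu", f"-chdir={stack_dir}", "apply", "-auto-approve",
            "-var", f"{var_name}={namespace}"]
```

For example, `subprocess.run(tofu_apply_cmd("tf/prebuilt/unhealthy-pod", unique_namespace("unhealthy-pod")), check=True)` would apply the stack into a fresh namespace. Passing an argv list rather than a shell string avoids quoting problems.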


Testing Locally

You can run these tasks locally with the devops-bench runner by pointing BENCH_TASK_FILE at one task config per run:

# Select ONE task (a later export overrides an earlier one)

# Task 1 (CUJ 1)
export BENCH_TASK_FILE="tasks/gcp/deploy-postgres-web-app/task.yaml"

# Task 2 (CUJ 2)
export BENCH_TASK_FILE="tasks/gcp/troubleshoot-unhealthy-pod/task.yaml"

# Task 3 (CUJ 3)
export BENCH_TASK_FILE="tasks/gcp/gitops-auto-revert/task.yaml"

# Run the evaluator for the selected task
./scripts/entrypoint.sh
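Assuming BENCH_TASK_FILE takes a single path per run, as the commands above suggest, covering all three CUJs means one evaluator run per task. The sketch below assumes only what those commands show: the runner reads BENCH_TASK_FILE from the environment and is started with ./scripts/entrypoint.sh. `run_all` is a hypothetical helper.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, Sequence

TASK_FILES = [
    "tasks/gcp/deploy-postgres-web-app/task.yaml",     # CUJ 1
    "tasks/gcp/troubleshoot-unhealthy-pod/task.yaml",  # CUJ 2
    "tasks/gcp/gitops-auto-revert/task.yaml",          # CUJ 3
]


def run_all(task_files: Sequence[str] = TASK_FILES,
            cmd: Sequence[str] = ("./scripts/entrypoint.sh",)) -> Dict[str, int]:
    """Run the evaluator once per task file; return {task_file: exit code}."""
    results = {}
    for task_file in task_files:
        # Copy the current environment so credentials and other settings still
        # reach the runner; only BENCH_TASK_FILE changes between runs.
        env = dict(os.environ, BENCH_TASK_FILE=task_file)
        results[task_file] = subprocess.run(list(cmd), env=env).returncode
    return results


if __name__ == "__main__" and Path("scripts/entrypoint.sh").exists():
    for task, code in run_all().items():
        print(f"{task}: exit {code}")
```

Giving each run its own environment copy keeps the tasks independent, so one task's configuration cannot leak into the next.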

jayantid marked this pull request as draft on June 17, 2026 19:26
jayantid changed the title from "feat: add GKE DevOps benchmark tasks for M1 Milestone (CUJ 1 & 2)" to "feat: add GKE DevOps benchmark tasks for M1 Milestone (CUJ 1, 2 and 3)" on Jun 17, 2026
jessie1111101 added a commit that referenced this pull request Jun 25, 2026
Stacked on #132 (skills/agent-skills). Each matrix combo provisions its own
cluster; this makes every task collision-free under concurrent runs:

- 6 manifest-gen tasks -> deployer: noop (no cluster); legacy factory honors noop
- optimize-scale: new prebuilt/optimize-scale GKE stack + pre-seeded workload;
  matrix pins TARGET_DEPLOYMENT_NAME/NAMESPACE so both arms agree
- deploy-hello-app: run-unique Artifact Registry repo name
- per-run tofu stack-dir copy (both arms) removes the shared .terraform.lock race
  (resolves the 'Shared OpenTofu working directory' known-issue)
- import + parallel-fix the merged complex/GKE tasks (#64 migration, #87 opa,
  #93 multi-region, #86 postgres/unhealthy/gitops, #76 debug-crashloop):
  per-run GitOps repo paths, dropped shared-SA container.admin (BYO creds),
  region-prefixed cluster names (avoid node-SA substr collision), unique task_id
- cp-recovery documented as the kind-only exception (docs/bastion.md)
pradeepvrd pushed three commits that referenced this pull request Jun 26, 2026, each with the same commit message as above.