An AI-driven pipeline that designs, deploys, and validates production-ready GKE reference architectures on two paths (Terraform + Helm, and Config Connector) with every merge.
- Design: Use an AI agent (Gemini CLI + Claude) to author complete, enterprise-grade IaC templates from Google Cloud reference architectures, covering both Terraform/Helm and Config Connector deployment paths.
- Deploy & Test: Run every template through a full apply → verify → destroy cycle in a real GCP sandbox project before any PR is merged, and again after merge to confirm the published artifact works end-to-end.
- Consolidate: Act as a living, continuously validated library of GKE patterns drawn from Google Cloud's public reference repositories, so teams can adopt them with confidence.
The forge is powered by the operator stack from gke-labs/gemini-for-kubernetes-development, running on a GKE Standard control-plane cluster.
flowchart LR
DEV["👤 Developer\nopens issue"]
subgraph OPS ["gke-labs Operator Stack"]
OV["🎯 Overseer + Repo-Agent\ncreates branch & PR"]
AG["🤖 Agent Sandbox\nauthors Terraform + KCC templates"]
OV --> AG
end
subgraph GH ["GitHub: gcp-template-forge"]
PR["Pull Request\nlint · deploy-test-tf ∥ deploy-test-kcc"]
MAIN["✅ main\nvalidated template library"]
PR -->|approved + merged| MAIN
end
subgraph GCP ["GCP Sandbox"]
TF["🏗️ TF + Helm\ncluster"]
KCC["☸️ Config Connector\ncluster"]
end
DEV --> OV
AG -->|commits templates| PR
PR -->|deploy-test-tf| TF
PR -->|deploy-test-kcc| KCC
MAIN -->|validate-tf-helm ∥ validate-kcc\nthen publish-validated| TF
MAIN -->|validate-tf-helm ∥ validate-kcc\nthen publish-validated| KCC
| Component | Role | Repo |
|---|---|---|
| Overseer | Kubernetes operator that watches GitHub for new issues, coordinates the agent lifecycle, and manages PR state | gke-labs/gemini-for-kubernetes-development |
| Repo-Agent | Creates GitHub issues, branches, and PRs; posts status comments; triggers the agent sandbox | same |
| AgentSandboxes | Kubernetes Jobs that spin up an isolated Gemini CLI session per template; the agent authors all IaC files and commits them | same |
| CI Service Account | GCP service account used by GitHub Actions CI to authenticate and run Terraform/Helm/KCC against the sandbox project | agent-infra/ |
.github/
  workflows/
    sandbox-validation-*.yml   ← lint · deploy-test-tf ∥ deploy-test-kcc (PR) · validate-tf-helm ∥ validate-kcc · publish-validated (push)
  ISSUE_TEMPLATE/              ← template request form
agent-infra/
  terraform/                   ← control-plane GKE cluster + CI service account
  manifests/                   ← Overseer + Repo-Agent + AgentSandboxes deployments
templates/                     ← validated template library (see Templates section below)
GEMINI.md                      ← guardrails and instructions for the Gemini CLI agent
GUIDANCE.md                    ← manual setup steps (identity, Secret Manager)
flowchart LR
DC(["detect-changes\nDiff PR head SHA or push\nagainst base → outputs\nchanged template list"])
subgraph pr ["─── Pull Request ────────────────────────────────────────────────────────────"]
direction LR
L(["lint\nper template\ntf fmt · tf validate\nhelm lint · KCC YAML\nboth-paths check"])
DTTF(["🏗️ deploy-test-tf\nper template\nTF apply → verify → Helm deploy\n→ security scan → TF destroy\nPost PR summary comment"])
DTKCC(["☸️ deploy-test-kcc\nper template\nKCC apply → wait Ready\n→ delete\nPost PR summary comment"])
end
subgraph push ["─── Push to main ──────────────────────────────────────────────────────────"]
direction LR
VTH(["🏗️ validate-tf-helm\nper template · 120 min timeout\nTF apply → verify cluster\nRUNNING → TF destroy\nSaves results artifact"])
VKCC(["☸️ validate-kcc\nper template · 120 min timeout\nFresh runner: KCC cluster\ncreds from the start\nKCC apply → wait Ready\n→ delete\nSaves results artifact"])
PUB(["publish-validated\nper template\nAccumulate agent metrics\nUpdate README + .validated\n.agent-metrics-cumulative\ngit push [skip ci]"])
end
DC --> L
L --> DTTF
L --> DTKCC
DC --> VTH
DC --> VKCC
VTH --> PUB
VKCC --> PUB
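The `detect-changes` node above turns a diff into a list of changed templates. A minimal sketch of that mapping follows, assuming the diff is already available as repo-relative paths (for example from `git diff --name-only`) and that each template lives under `templates/<name>/`. The function name `changed_templates` is hypothetical and is not taken from the workflow files.

```python
from pathlib import PurePosixPath


def changed_templates(changed_paths):
    """Return the sorted, de-duplicated template names touched by a diff.

    Hypothetical helper: only paths shaped like templates/<name>/<file>
    are counted, so changes elsewhere in the repo select no template.
    """
    names = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        # Need at least templates/<name>/<file>; a file directly under
        # templates/ does not belong to any single template.
        if len(parts) >= 3 and parts[0] == "templates":
            names.add(parts[1])
    return sorted(names)


if __name__ == "__main__":
    diff = [
        "templates/enterprise-gke/terraform-helm/main.tf",
        "templates/enterprise-gke/config-connector/cluster.yaml",
        "templates/basic-gke-hello-world/README.md",
        "agent-infra/terraform/main.tf",
    ]
    print(changed_templates(diff))
```

Returning a sorted list keeps the per-template job matrix stable between runs, which makes CI logs easier to compare.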
`deploy-test-tf` and `deploy-test-kcc` run in parallel on separate runners after lint passes. Likewise, `validate-tf-helm` and `validate-kcc` run in parallel on push to main; GCP resource name collisions are avoided by the `-tf`/`-kcc` suffix convention on all resource names. `publish-validated` waits for both validate jobs to complete before updating the README and the `.validated` marker.
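The suffix convention is what lets both paths deploy into the same sandbox project at once. A rough sketch of the idea, assuming the suffix is appended to the end of a template-based name; the helper names and the `PATH_SUFFIXES` mapping are illustrative and do not come from the repository.

```python
# Hypothetical mapping from deployment path to its resource-name suffix.
PATH_SUFFIXES = {"terraform-helm": "tf", "config-connector": "kcc"}


def resource_name(base, path):
    """Append the per-path suffix to a template-based resource name."""
    return f"{base}-{PATH_SUFFIXES[path]}"


def missing_suffix(names, path):
    """Return the names that do not end with the suffix for this path."""
    suffix = "-" + PATH_SUFFIXES[path]
    return [name for name in names if not name.endswith(suffix)]


if __name__ == "__main__":
    print(resource_name("enterprise-gke-vpc", "terraform-helm"))
    print(resource_name("enterprise-gke-vpc", "config-connector"))
    # A name without the suffix could clash with the other path's resource.
    print(missing_suffix(["enterprise-gke-vpc-kcc", "enterprise-gke-vpc"],
                         "config-connector"))
```

Because the two suffixes differ, the same base name never yields the same GCP resource name on both paths, so the parallel jobs cannot create or destroy each other's resources.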
sequenceDiagram
actor Dev as Developer / Overseer
participant GH as GitHub
participant CI as GitHub Actions
participant GCP as GCP Sandbox
Dev->>GH: Open issue (template request)
GH->>CI: Overseer creates branch + triggers AgentSandbox
CI->>GH: Agent commits terraform-helm/ + config-connector/
Note over GH,CI: ── Pull Request CI Gate ──────────────────────────────
GH->>CI: PR opened → detect-changes (PR head SHA diff)
CI->>CI: lint: tf fmt/validate · helm lint · KCC YAML · both-paths check
par deploy-test-tf (parallel)
CI->>GCP: TF apply (VPC · cluster · node pool)
GCP-->>CI: cluster RUNNING ✓
CI->>GCP: helm upgrade --install
GCP-->>CI: workload ready ✓
CI->>GCP: TF destroy
and deploy-test-kcc (parallel)
CI->>GCP: KCC apply (Config Connector manifests)
GCP-->>CI: ContainerCluster Ready ✓
CI->>GCP: kubectl delete (KCC teardown)
end
CI->>GH: Post deploy summary comment to PR (both paths)
Dev->>GH: Review + merge PR
Note over GH,CI: ── Post-Merge CI Gate (4 independent jobs) ──────────
GH->>CI: push to main → detect-changes
par validate-tf-helm (parallel)
CI->>CI: skip-check (changed since last .validated?)
CI->>GCP: TF apply (VPC · cluster · node pool · Helm workload)
GCP-->>CI: cluster RUNNING → nodes ready ✓
CI->>GCP: TF destroy (full teardown)
and validate-kcc (parallel)
CI->>CI: skip-check (changed since last .validated?)
CI->>GCP: kubectl apply (KCC manifests)
GCP-->>CI: ContainerCluster Ready ✓
CI->>GCP: kubectl delete (KCC teardown)
end
CI->>CI: publish-validated starts (waits for both above)
CI->>CI: accumulate .agent-metrics across all sandbox sessions
CI->>GH: Commit README.md + .validated + .agent-metrics-cumulative
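The final steps of the sequence fold each sandbox session's `.agent-metrics` into `.agent-metrics-cumulative`. The on-disk format of those files is not described here, so the sketch below assumes a flat JSON object of counters; the field names (`tokens`, `tool_calls`, `sessions`, `model`) are invented for illustration.

```python
import json


def accumulate(cumulative, session):
    """Return a new running total with one session's counters added.

    Assumes both inputs are flat dicts; non-numeric fields in the
    session (for example a model name) are ignored.
    """
    total = dict(cumulative)
    for key, value in session.items():
        if key == "sessions":
            continue  # the session count is maintained below, not summed
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        total[key] = total.get(key, 0) + value
    total["sessions"] = total.get("sessions", 0) + 1
    return total


if __name__ == "__main__":
    running = {"sessions": 2, "tokens": 1500}
    latest = {"tokens": 400, "tool_calls": 7, "model": "example-model"}
    print(json.dumps(accumulate(running, latest), sort_keys=True))
```

Building a new dict rather than mutating the input keeps the previous total intact if the write of the updated file fails partway through.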
templates/<name>/
├── terraform-helm/              ← Terraform + Helm deployment path
│   ├── main.tf                  ← VPC · cluster · workload resources
│   ├── variables.tf
│   ├── versions.tf              ← pinned provider versions + GCS backend
│   ├── outputs.tf               ← cluster_name + cluster_location (required by CI)
│   └── workload/                ← Helm chart for the workload
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
├── config-connector/            ← Config Connector (KCC) deployment path
│   ├── network.yaml             ← ComputeNetwork + ComputeSubnetwork
│   ├── cluster.yaml             ← ContainerCluster (+ NodePool if standard)
│   └── workload/                ← Kubernetes manifests for the workload (required)
│       └── *.yaml               ← Deployment · Service · HPA · NetworkPolicy etc.
├── README.md                    ← auto-updated by CI with validation record
├── .validated                   ← CI marker: commit + status after successful deploy
├── .agent-metrics               ← written by agent sandbox (latest session)
└── .agent-metrics-cumulative    ← CI-maintained running total across all sessions
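The `.validated` marker records a commit and a status, which is enough for a post-merge job to decide whether a template needs to be deployed again. A sketch of that decision, assuming the marker is a small `key=value` text file with `commit` and `status` fields; that format and both function names are assumptions, not the repository's actual implementation.

```python
def parse_marker(text):
    """Parse 'key=value' lines into a dict, ignoring blanks and comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


def needs_validation(marker_text, latest_template_commit):
    """Decide whether the template must be re-validated.

    marker_text is the content of .validated, or None if the file is
    missing. latest_template_commit is the newest commit that touched
    the template's deployable files (how it is found is out of scope).
    """
    if marker_text is None:
        return True  # never validated
    fields = parse_marker(marker_text)
    if fields.get("status") != "success":
        return True  # last recorded run did not succeed
    return fields.get("commit") != latest_template_commit


if __name__ == "__main__":
    marker = "commit=abc1234\nstatus=success\n"
    print(needs_validation(marker, "abc1234"))
    print(needs_validation(marker, "def5678"))
    print(needs_validation(None, "abc1234"))
```

A real check would also need to ignore CI's own bookkeeping commits to `README.md` and `.validated`, otherwise every publish would trigger another validation.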
CI enforcement rules:
- Both `terraform-helm/` and `config-connector/` must exist (lint fails otherwise)
- `google_container_cluster` must have `deletion_protection = false`
- KCC manifests must not use `cnrm.cloud.google.com/deletion-policy: abandon`
- Resources must use template-based names (e.g., `enterprise-gke-vpc`), not issue numbers
- `validate-tf-helm` / `validate-kcc` re-run whenever the template has changed since the last `.validated` commit
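The first three rules above lend themselves to a simple file-level check. The sketch below is one way such a lint step could look, not the workflow's actual code: it uses plain regular expressions rather than a real HCL or YAML parser, so it checks whole files instead of individual resources, and the function name `lint_template` is hypothetical.

```python
import re
from pathlib import Path

REQUIRED_DIRS = ("terraform-helm", "config-connector")
CLUSTER_RESOURCE = re.compile(r'resource\s+"google_container_cluster"')
PROTECTION_OFF = re.compile(r"deletion_protection\s*=\s*false")
ABANDON = re.compile(r"cnrm\.cloud\.google\.com/deletion-policy:\s*[\"']?abandon")


def lint_template(root):
    """Return a list of rule violations for one templates/<name>/ directory."""
    root = Path(root)
    problems = []

    # Rule: both deployment paths must be present.
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")

    # Rule: any file declaring a GKE cluster must disable deletion protection.
    tf_dir = root / "terraform-helm"
    if tf_dir.is_dir():
        for tf_file in sorted(tf_dir.glob("*.tf")):
            text = tf_file.read_text()
            if CLUSTER_RESOURCE.search(text) and not PROTECTION_OFF.search(text):
                problems.append(f"{tf_file.name}: deletion_protection = false not set")

    # Rule: KCC manifests must not abandon resources on delete.
    kcc_dir = root / "config-connector"
    if kcc_dir.is_dir():
        for manifest in sorted(kcc_dir.rglob("*.yaml")):
            if ABANDON.search(manifest.read_text()):
                problems.append(f"{manifest.name}: deletion-policy abandon is not allowed")

    return problems
```

An empty list means the template passes these three checks; the naming rule and the re-validation rule are not covered by this sketch.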
| Template | TF+Helm | KCC | Validated |
|---|---|---|---|
| basic-gke-hello-world | GKE Standard + hello-world | GKE Standard + hello-world | β |
| enterprise-gke | GKE Standard + security stack + Helm workload | GKE Standard + security stack + KCC workload | β |
| latest-gke-features | GKE Standard + Gateway API + NAP + Native Sidecars | GKE Standard + Native Sidecars + Gateway API | β |
| gke-fqdn-egress-security | GKE Standard + FQDN Network Policies + AI Egress | GKE Standard + KCC Networking | β |
| gke-topology-aware-routing | GKE Standard + Topology-Aware Routing + Gateway API | GKE Standard + Topology-Aware Routing + Gateway API | β |
The forge validates patterns drawn from:
| Source | Focus |
|---|---|
| Cloud Foundation Toolkit | GCP security baselines |
| Cluster Toolkit | HPC + AI/ML clusters |
| Kubernetes Engine Samples | GKE workload patterns |
| Terraform GKE Modules | Reusable TF modules |
| GKE AI Labs | AI/ML on GKE |
| Gemini for Kubernetes Development | Operator stack powering this forge |
| Accelerated Platforms | GPU/TPU workloads |
| GKE Policy Automation | Policy as code |
| LLM-D | LLM inference on GKE |