codebot-sfle/gcp-template-forge-1


GCP Template Forge

An AI-driven pipeline that designs, deploys, and validates production-ready GKE reference architectures in two deployment paths (Terraform + Helm, and Config Connector) with every merge.

Objectives

  1. Design: Use an AI agent (Gemini CLI + Claude) to author complete, enterprise-grade IaC templates from Google Cloud reference architectures, covering both the Terraform/Helm and the Config Connector deployment paths.
  2. Deploy & Test: Run every template through a full apply → verify → destroy cycle in a real GCP sandbox project before any PR is merged, and again after merge to confirm that the published artifact works end-to-end.
  3. Consolidate: Maintain a living, continuously validated library of GKE patterns drawn from Google Cloud's public reference repositories, so teams can adopt them with confidence.

System Architecture

The forge is powered by the operator stack from gke-labs/gemini-for-kubernetes-development, running on a GKE Standard control-plane cluster.

flowchart LR
    DEV["👤 Developer\nopens issue"]

    subgraph OPS ["gke-labs Operator Stack"]
        OV["🎯 Overseer + Repo-Agent\ncreates branch & PR"]
        AG["🤖 Agent Sandbox\nauthors Terraform + KCC templates"]
        OV --> AG
    end

    subgraph GH ["GitHub: gcp-template-forge"]
        PR["🔀 Pull Request\nlint · deploy-test-tf ∥ deploy-test-kcc"]
        MAIN["✅ main\nvalidated template library"]
        PR -->|approved + merged| MAIN
    end

    subgraph GCP ["GCP Sandbox"]
        TF["🏗️ TF + Helm\ncluster"]
        KCC["☸️ Config Connector\ncluster"]
    end

    DEV --> OV
    AG -->|commits templates| PR
    PR -->|deploy-test-tf| TF
    PR -->|deploy-test-kcc| KCC
    MAIN -->|validate-tf-helm ∥ validate-kcc\nthen publish-validated| TF
    MAIN -->|validate-tf-helm ∥ validate-kcc\nthen publish-validated| KCC

Key Components

| Component | Role | Repo |
|---|---|---|
| Overseer | Kubernetes operator that watches GitHub for new issues, coordinates the agent lifecycle, and manages PR state | gke-labs/gemini-for-kubernetes-development |
| Repo-Agent | Creates GitHub issues, branches, and PRs; posts status comments; triggers the agent sandbox | gke-labs/gemini-for-kubernetes-development |
| AgentSandboxes | Kubernetes Jobs that each spin up an isolated Gemini CLI session per template; the agent authors all IaC files and commits them | gke-labs/gemini-for-kubernetes-development |
| CI Service Account | GCP service account that GitHub Actions CI uses to authenticate and run Terraform/Helm/KCC against the sandbox project | agent-infra/ (this repo) |

Repository Layout

.github/
  workflows/
    sandbox-validation-*.yml  ← lint · deploy-test-tf ∥ deploy-test-kcc (PR) · validate-tf-helm ∥ validate-kcc · publish-validated (push)
  ISSUE_TEMPLATE/           ← template request form
agent-infra/
  terraform/                ← control-plane GKE cluster + CI service account
  manifests/                ← Overseer + Repo-Agent + AgentSandboxes deployments
templates/                  ← validated template library (see Templates section below)
GEMINI.md                   ← guardrails and instructions for the Gemini CLI agent
GUIDANCE.md                 ← manual setup steps (identity, Secret Manager)

CI Pipeline

Job dependency graph

flowchart LR
    DC(["🔍 detect-changes\nDiff PR head SHA or push\nagainst base, outputs\nchanged template list"])

    subgraph pr ["━━━  Pull Request  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"]
        direction LR
        L(["🔎 lint\nper template\ntf fmt · tf validate\nhelm lint · KCC YAML\nboth-paths check"])
        DTTF(["🏗️ deploy-test-tf\nper template\nTF apply → verify → Helm deploy\n→ security scan → TF destroy\nPost PR summary comment"])
        DTKCC(["☸️ deploy-test-kcc\nper template\nKCC apply → wait Ready\n→ delete\nPost PR summary comment"])
    end

    subgraph push ["━━━  Push to main  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"]
        direction LR
        VTH(["🏗️ validate-tf-helm\nper template · 120 min timeout\nTF apply → verify cluster\nRUNNING → TF destroy\nSaves results artifact"])
        VKCC(["☸️ validate-kcc\nper template · 120 min timeout\nFresh runner with KCC cluster\ncreds from the start\nKCC apply → wait Ready\n→ delete\nSaves results artifact"])
        PUB(["📋 publish-validated\nper template\nAccumulate agent metrics\nUpdate README + .validated\n.agent-metrics-cumulative\ngit push [skip ci]"])
    end

    DC --> L
    L --> DTTF
    L --> DTKCC
    DC --> VTH
    DC --> VKCC
    VTH --> PUB
    VKCC --> PUB
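The detect-changes node in the graph comes down to mapping a list of changed paths onto template directories. The Python sketch below shows one way to do that, assuming the diff comes from plain `git diff --name-only`. The function names, and the rule that per-template files rewritten by CI (README.md, .validated, .agent-metrics-cumulative) do not count as changes, are illustrative assumptions rather than the repository's actual implementation. The same mapping can also drive the post-merge skip-check.

```python
import subprocess
from pathlib import PurePosixPath

# Per-template files that publish-validated rewrites in its own commit.
# Assumption: changes to these should not mark a template as changed.
CI_MAINTAINED = {"README.md", ".validated", ".agent-metrics-cumulative"}


def git_changed_paths(base: str, head: str) -> list[str]:
    """Paths changed between two commits, via `git diff --name-only`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def changed_templates(paths: list[str]) -> list[str]:
    """Map changed file paths to the sorted names of affected templates."""
    names = set()
    for raw in paths:
        parts = PurePosixPath(raw).parts
        # Only files under templates/<name>/ are relevant.
        if len(parts) < 3 or parts[0] != "templates":
            continue
        # Skip CI-maintained files at the top of the template directory.
        if len(parts) == 3 and parts[2] in CI_MAINTAINED:
            continue
        names.add(parts[1])
    return sorted(names)


def needs_validation(template: str, paths_since_validated: list[str]) -> bool:
    """Skip-check: re-run only if the template changed since its .validated commit."""
    return template in changed_templates(paths_since_validated)
```

For a PR, `base` would be the PR base and `head` the PR head SHA; for the skip-check, `base` would be the commit recorded in .validated. Ignoring the CI-maintained files matters there: the publish-validated commit touches README.md and .validated after the validated commit, and counting those would make every template look changed on the next run.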

deploy-test-tf and deploy-test-kcc run in parallel on separate runners once lint passes. Likewise, validate-tf-helm and validate-kcc run in parallel on every push to main; the -tf / -kcc suffix on all resource names keeps the two paths from colliding on GCP resource names. publish-validated waits for both validate jobs to finish before it updates the README and the .validated marker.
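This parallelism follows directly from the dependency edges. As a rough sketch, the Python snippet below copies the edges from the job dependency graph and uses the standard-library `graphlib.TopologicalSorter` to group each trigger's jobs into waves that can start together. The dictionary layout is only an illustration; a real GitHub Actions workflow expresses the same edges with `needs:` keys.

```python
from graphlib import TopologicalSorter

# Edges read off the job dependency graph: job -> jobs it waits for.
JOB_NEEDS = {
    "detect-changes": set(),
    "lint": {"detect-changes"},
    "deploy-test-tf": {"lint"},
    "deploy-test-kcc": {"lint"},
    "validate-tf-helm": {"detect-changes"},
    "validate-kcc": {"detect-changes"},
    "publish-validated": {"validate-tf-helm", "validate-kcc"},
}

# Jobs that run for each trigger.
PR_JOBS = {"detect-changes", "lint", "deploy-test-tf", "deploy-test-kcc"}
PUSH_JOBS = {"detect-changes", "validate-tf-helm", "validate-kcc", "publish-validated"}


def waves(jobs: set[str]) -> list[list[str]]:
    """Group jobs into waves; every job in a wave can start at the same time."""
    # Keep only edges between jobs that actually run for this trigger.
    graph = {job: JOB_NEEDS[job] & jobs for job in jobs}
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


for trigger, jobs in (("pull_request", PR_JOBS), ("push", PUSH_JOBS)):
    print(trigger, waves(jobs))
```

The last push wave contains only publish-validated, which is why it cannot start until both validate jobs have reported.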

End-to-end sequence

sequenceDiagram
    actor Dev as Developer / Overseer
    participant GH as GitHub
    participant CI as GitHub Actions
    participant GCP as GCP Sandbox

    Dev->>GH: Open issue (template request)
    GH->>CI: Overseer creates branch + triggers AgentSandbox
    CI->>GH: Agent commits terraform-helm/ + config-connector/

    Note over GH,CI: ── Pull Request CI Gate ──────────────────────────────
    GH->>CI: PR opened → detect-changes (PR head SHA diff)
    CI->>CI: lint: tf fmt/validate · helm lint · KCC YAML · both-paths check
    par deploy-test-tf (parallel)
        CI->>GCP: TF apply (VPC · cluster · node pool)
        GCP-->>CI: cluster RUNNING ✓
        CI->>GCP: helm upgrade --install
        GCP-->>CI: workload ready ✓
        CI->>GCP: TF destroy
    and deploy-test-kcc (parallel)
        CI->>GCP: KCC apply (Config Connector manifests)
        GCP-->>CI: ContainerCluster Ready ✓
        CI->>GCP: kubectl delete (KCC teardown)
    end
    CI->>GH: Post deploy summary comment to PR (both paths)
    Dev->>GH: Review + merge PR

    Note over GH,CI: ── Post-Merge CI Gate (4 jobs) ──────────────────────
    GH->>CI: push to main β†’ detect-changes
    par validate-tf-helm (parallel)
        CI->>CI: skip-check (changed since last .validated?)
        CI->>GCP: TF apply (VPC · cluster · node pool · Helm workload)
        GCP-->>CI: cluster RUNNING ✓  nodes ready ✓
        CI->>GCP: TF destroy (full teardown)
    and validate-kcc (parallel)
        CI->>CI: skip-check (changed since last .validated?)
        CI->>GCP: kubectl apply (KCC manifests)
        GCP-->>CI: ContainerCluster Ready ✓
        CI->>GCP: kubectl delete (KCC teardown)
    end
    CI->>CI: publish-validated starts (waits for both above)
    CI->>CI: accumulate .agent-metrics across all sandbox sessions
    CI->>GH: Commit README.md + .validated + .agent-metrics-cumulative
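Both deploy paths end with a readiness check: a RUNNING cluster for Terraform and a Ready ContainerCluster for KCC. A minimal Python sketch of those checks follows, assuming the CI reads cluster_name and cluster_location (the outputs.tf values it requires) with `terraform output -raw` and shells out to `gcloud` and `kubectl`. The helper names, poll interval, and timeout defaults are hypothetical. The command runner is injectable so the polling logic can be exercised without a real project.

```python
import subprocess
import time
from typing import Callable

Runner = Callable[[list[str]], str]


def run(cmd: list[str]) -> str:
    """Run a command and return its stripped stdout (raises on non-zero exit)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()


def tf_output(name: str, runner: Runner = run) -> str:
    """Read one Terraform output, e.g. cluster_name or cluster_location."""
    return runner(["terraform", "output", "-raw", name])


def wait_tf_cluster_running(name: str, location: str, runner: Runner = run,
                            timeout_s: int = 1800, poll_s: int = 30,
                            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the GKE cluster status until RUNNING; give up on ERROR or timeout."""
    # Older gcloud releases take --zone/--region instead of --location.
    cmd = ["gcloud", "container", "clusters", "describe", name,
           f"--location={location}", "--format=value(status)"]
    for _ in range(max(1, timeout_s // poll_s)):
        status = runner(cmd)
        if status == "RUNNING":
            return True
        if status == "ERROR":
            return False
        sleep(poll_s)
    return False


def kcc_wait_ready_cmd(name: str, namespace: str, timeout: str = "60m") -> list[str]:
    """kubectl command that blocks until the ContainerCluster reports Ready."""
    return ["kubectl", "wait", "--for=condition=Ready",
            f"containercluster/{name}", "-n", namespace, f"--timeout={timeout}"]
```

In a job, this would be called as `wait_tf_cluster_running(tf_output("cluster_name"), tf_output("cluster_location"))`; the KCC side can rely on `kubectl wait` blocking on its own instead of a Python loop.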

Template structure

templates/<name>/
├── terraform-helm/              ← Terraform + Helm deployment path
│   ├── main.tf                  ← VPC · cluster · workload resources
│   ├── variables.tf
│   ├── versions.tf              ← pinned provider versions + GCS backend
│   ├── outputs.tf               ← cluster_name + cluster_location (required by CI)
│   └── workload/                ← Helm chart for the workload
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
├── config-connector/            ← Config Connector (KCC) deployment path
│   ├── network.yaml             ← ComputeNetwork + ComputeSubnetwork
│   ├── cluster.yaml             ← ContainerCluster (+ ContainerNodePool for Standard clusters)
│   └── workload/                ← Kubernetes manifests for the workload (required)
│       └── *.yaml               ← Deployment · Service · HPA · NetworkPolicy, etc.
├── README.md                    ← auto-updated by CI with validation record
├── .validated                   ← CI marker: commit + status after successful deploy
├── .agent-metrics               ← written by agent sandbox (latest session)
└── .agent-metrics-cumulative    ← CI-maintained running total across all sessions
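The relationship between .agent-metrics and .agent-metrics-cumulative suggests a simple fold of each session's numbers into a running total. The format of those files is not documented here, so this Python sketch assumes both are flat JSON objects with numeric counters; the field names in the usage lines and the `sessions` counter are hypothetical.

```python
import json
from typing import Optional


def accumulate_metrics(cumulative_text: Optional[str], latest_text: str) -> dict:
    """Fold one session's .agent-metrics into the running totals.

    Assumes both files are flat JSON objects; only numeric fields are summed.
    """
    totals = json.loads(cumulative_text) if cumulative_text else {}
    latest = json.loads(latest_text)
    for key, value in latest.items():
        # bool is a subclass of int, so exclude it explicitly.
        if key == "sessions" or isinstance(value, bool):
            continue
        if isinstance(value, (int, float)):
            totals[key] = totals.get(key, 0) + value
    totals["sessions"] = totals.get("sessions", 0) + 1  # hypothetical counter
    return totals


totals = accumulate_metrics(None, '{"input_tokens": 100, "model": "gemini"}')
totals = accumulate_metrics(json.dumps(totals), '{"input_tokens": 50, "output_tokens": 7}')
print(json.dumps(totals, sort_keys=True))
# {"input_tokens": 150, "output_tokens": 7, "sessions": 2}
```

The fold is not idempotent, so publish-validated would need to apply it only when .agent-metrics actually changed; otherwise the same session is counted twice.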

CI enforcement rules:

  • Both terraform-helm/ and config-connector/ must exist (lint fails otherwise)
  • google_container_cluster must have deletion_protection = false
  • KCC manifests must not use cnrm.cloud.google.com/deletion-policy: abandon
  • Resources must use template-based names (e.g., enterprise-gke-vpc), not issue numbers
  • validate-tf-helm and validate-kcc re-run whenever the template has changed since its last .validated commit
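Several of these rules can be checked statically before anything is deployed. The Python sketch below covers three of them: both paths present, `deletion_protection = false` on every `google_container_cluster`, and no abandon deletion policy in KCC manifests. It takes the template as a hypothetical `{relative_path: content}` mapping and uses regexes plus naive brace matching instead of a real HCL parser, so treat it as a heuristic rather than the lint job's actual logic.

```python
import re

CLUSTER_BLOCK = re.compile(r'resource\s+"google_container_cluster"\s+"[^"]+"\s*\{')
DELETION_OFF = re.compile(r"^\s*deletion_protection\s*=\s*false\b", re.MULTILINE)
ABANDON = re.compile(r"cnrm\.cloud\.google\.com/deletion-policy[\"']?\s*:\s*[\"']?abandon\b")


def block_body(text: str, open_brace: int) -> str:
    """Text between a '{' and its matching '}' (naive: ignores braces in strings)."""
    depth = 0
    for i in range(open_brace, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[open_brace + 1:i]
    return text[open_brace + 1:]


def lint_template(files: dict[str, str]) -> list[str]:
    """Check a template given as {relative_path: content}; return rule violations."""
    errors = []
    # Rule: both deployment paths must exist.
    for required in ("terraform-helm/", "config-connector/"):
        if not any(path.startswith(required) for path in files):
            errors.append(f"missing {required} path")
    for path, text in files.items():
        if path.endswith(".tf"):
            # Rule: every cluster block must disable deletion protection.
            for match in CLUSTER_BLOCK.finditer(text):
                body = block_body(text, match.end() - 1)
                if not DELETION_OFF.search(body):
                    errors.append(f"{path}: google_container_cluster without deletion_protection = false")
        elif path.endswith((".yaml", ".yml")) and ABANDON.search(text):
            # Rule: KCC resources must be deleted, not abandoned, on teardown.
            errors.append(f"{path}: deletion-policy abandon is not allowed")
    return errors
```

A real implementation would more likely read the files from disk and parse HCL properly; the regex approach is chosen here only to keep the sketch self-contained.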

Templates

| Template | TF+Helm | KCC | Validated |
|---|---|---|---|
| basic-gke-hello-world | GKE Standard + hello-world | GKE Standard + hello-world | - |
| enterprise-gke | GKE Standard + security stack + Helm workload | GKE Standard + security stack + KCC workload | - |
| latest-gke-features | GKE Standard + Gateway API + NAP + Native Sidecars | GKE Standard + Native Sidecars + Gateway API | - |
| gke-fqdn-egress-security | GKE Standard + FQDN Network Policies + AI Egress | GKE Standard + KCC Networking | - |
| gke-topology-aware-routing | GKE Standard + Topology-Aware Routing + Gateway API | GKE Standard + Topology-Aware Routing + Gateway API | - |

Public Reference Sources

The forge validates patterns drawn from:

| Source | Focus |
|---|---|
| Cloud Foundation Toolkit | GCP security baselines |
| Cluster Toolkit | HPC + AI/ML clusters |
| Kubernetes Engine Samples | GKE workload patterns |
| Terraform GKE Modules | Reusable TF modules |
| GKE AI Labs | AI/ML on GKE |
| Gemini for Kubernetes Development | Operator stack powering this forge |
| Accelerated Platforms | GPU/TPU workloads |
| GKE Policy Automation | Policy as code |
| LLM-D | LLM inference on GKE |

Test Janitor
