
feat: publish Michelangelo control plane as a Helm chart for open source deployment #1136

@sallycr


Summary

Today, deploying Michelangelo requires cloning the repo, running `sandbox.py`, and understanding its internal sequencing logic. For open source partners who want to run Michelangelo on their own clusters — whether on-prem, EKS, GKE, or AKS — there is no standard, self-contained deployment artifact. A published Helm chart closes this gap.

Problem

Current state

`sandbox.py` orchestrates the control plane by applying raw YAML files via `kubectl apply` in a hardcoded sequence. All service addresses, credentials, and configuration are embedded directly in those YAMLs. Partners who want to deploy Michelangelo must:

  • Fork and modify the raw YAML files for their environment
  • Understand the internal ordering logic in `sandbox.py`
  • Manually reconcile changes every time the upstream repo updates
  • Write their own upgrade and rollback procedures

This is not a viable path for external operators — it creates a fork-and-diverge problem and blocks community adoption.

What open source partners need

| Need | Current state | With Helm chart |
|---|---|---|
| Install on any K8s cluster | Clone repo, modify raw YAML | `helm install michelangelo oci://...` |
| Configure for their infrastructure | Edit embedded YAML values | `--set metadataStorage.host=...` or `-f values.yaml` |
| Upgrade to a new release | Re-clone, re-apply, resolve conflicts | `helm upgrade michelangelo --reuse-values` |
| Roll back a bad deploy | Manual `kubectl apply` of previous YAMLs | `helm rollback michelangelo` |
| Disable components they don't need | Comment out YAML blocks | `--set ui.enabled=false` |
| Use their own workflow engine (Cadence vs Temporal) | Fork worker config | `--set workflow.engine=temporal` |
| Use their own object storage (S3, GCS, MinIO) | Edit embedded endpoint strings | `--set objectStorage.endpoint=...` |
| GitOps / ArgoCD / Flux integration | Not supported | Native via `helm template` or OCI chart |

Proposed Solution

Publish a single `michelangelo` Helm chart that installs the full control plane — apiserver, envoy, UI, worker, controllermgr, CRDs, and RBAC — against any Kubernetes cluster.
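
One possible layout for the proposed chart (everything below `helm/michelangelo/` is illustrative; only `values-k3d.yaml` and the five components are fixed by this proposal):

```
helm/michelangelo/
├── Chart.yaml            # chart name, version, appVersion
├── values.yaml           # defaults per the "Key values interface" section
├── values-k3d.yaml       # local sandbox overrides
├── crds/                 # CRDs, installed before templates render
└── templates/
    ├── apiserver/        # Deployment, Service, RBAC
    ├── envoy/
    ├── ui/
    ├── worker/
    └── controllermgr/
```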

Chart boundary

The chart owns only the control plane. It assumes infrastructure already exists and accepts connection values pointing at it — the same boundary KubeRay and Temporal draw.

In scope (chart installs):

  • `michelangelo-apiserver` (gRPC API server)
  • `michelangelo-envoy` (gRPC-Web proxy)
  • `michelangelo-ui` (React frontend)
  • `michelangelo-worker` (Cadence/Temporal workflow client)
  • `michelangelo-controllermgr` (Kubernetes controller manager)
  • All CRDs and RBAC

Out of scope (partner brings their own):

  • MySQL / PostgreSQL
  • S3 / GCS / MinIO
  • Cadence / Temporal
  • KubeRay, Spark Operator

Install experience

```shell
# Production install
helm install michelangelo oci://ghcr.io/michelangelo-ai/charts/michelangelo \
  --set metadataStorage.host=my-rds.example.com \
  --set objectStorage.endpoint=s3.amazonaws.com \
  --set workflow.endpoint=temporal-frontend:7233 \
  --set workflow.engine=temporal

# Local k3d (sandbox)
helm install michelangelo ./helm/michelangelo -f helm/michelangelo/values-k3d.yaml

# API-only (no UI)
helm install michelangelo ./helm/michelangelo \
  --set ui.enabled=false \
  --set envoy.enabled=false
```

Key values interface

```yaml
metadataStorage:
  driver: mysql          # "mysql" or "postgres"
  host: ""
  port: 3306
  database: michelangelo
  rootPassword: ""

objectStorage:
  endpoint: ""           # minio:9000 / s3.amazonaws.com / storage.googleapis.com
  secure: false

workflow:
  endpoint: ""           # cadence:7933 / temporal-frontend:7233
  engine: cadence        # "cadence" or "temporal"

# Per-component enable/disable (follows Temporal Helm chart pattern)
apiserver:     { enabled: true }
envoy:         { enabled: true }
ui:            { enabled: true }
worker:        { enabled: true }
controllermgr: { enabled: true }
```
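
As a sketch of how the toggles and engine value might be consumed in templates (the file path and environment variable names are illustrative, not part of this proposal):

```yaml
# templates/worker/deployment.yaml (illustrative)
{{- if .Values.worker.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: michelangelo-worker
spec:
  template:
    spec:
      containers:
        - name: worker
          env:
            # Selects the Cadence or Temporal client at startup
            - name: WORKFLOW_ENGINE
              value: {{ .Values.workflow.engine | quote }}
            - name: WORKFLOW_ENDPOINT
              value: {{ .Values.workflow.endpoint | quote }}
{{- end }}
```

Wrapping each component's manifests in an `if .Values.<component>.enabled` guard is what makes the API-only install above a one-flag operation.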

Benefits over raw YAML

| Benefit | Detail |
|---|---|
| Single install command | No Python, no clone-and-modify workflow |
| Environment parity | Same chart, same values interface for local k3d, staging, and production |
| Standard upgrade path | `helm upgrade --reuse-values` handles rolling restarts and config diffs |
| GitOps ready | OCI chart + `helm template` output works with ArgoCD, Flux, and any GitOps tool |
| Component toggles | Disable UI, envoy, or worker independently without touching templates |
| Workflow engine portability | Cadence and Temporal switchable via a single value |
| Credential safety | `helm.sh/resource-policy: keep` prevents upgrade from rotating externally injected credentials |
| Schema init container | Apiserver waits for DB schema readiness instead of relying on Python ordering |
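
The schema init container could be implemented along these lines; the probe command and the `schema_migrations` table name are placeholders, not the final implementation:

```yaml
# Sketch of the apiserver Pod's schema-readiness gate (MySQL case).
# DB_* variables would come from the metadataStorage values/Secret.
initContainers:
  - name: wait-for-schema
    image: mysql:8.0
    command:
      - sh
      - -c
      - |
        # Block until the metadata DB accepts connections and the
        # expected schema exists; Kubernetes restarts this container
        # on failure, so the loop plus restart policy gives retries.
        until mysql -h "$DB_HOST" -P "$DB_PORT" -u root -p"$DB_PASSWORD" \
          -e 'SELECT 1 FROM schema_migrations LIMIT 1' "$DB_DATABASE"; do
          echo "waiting for schema..."; sleep 5
        done
```

This replaces the implicit ordering guarantee that `sandbox.py` currently provides by applying YAMLs in sequence.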

Implementation Plan

Tracked in the internal design doc. High-level phases:

  • Phase 1: CI gate — `helm lint` + `helm template --debug` on every PR touching `helm/`
  • Phase 2: Create `helm/michelangelo/` chart; migrate all 5 control plane services from raw YAML to Helm templates; validate local k3d install
  • Phase 3: Confirm observability/experimental tier stays in `sandbox.py`; document chart boundary
  • Phase 4: Update `sandbox.py` to call `helm install / upgrade / uninstall` internally; publish chart to GHCR OCI registry
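
The Phase 1 gate might look like the following GitHub Actions workflow (file path and action versions are assumptions; any CI system running the same two commands would do):

```yaml
# .github/workflows/helm-ci.yaml (illustrative)
# Lint and render the chart on every PR that touches helm/.
name: helm-ci
on:
  pull_request:
    paths: ["helm/**"]
jobs:
  lint-template:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-helm@v4
      - run: helm lint helm/michelangelo
      - run: helm template michelangelo helm/michelangelo --debug > /dev/null
```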

Open Questions for Community Input

We'd welcome feedback from anyone planning to deploy Michelangelo externally:

  1. PostgreSQL support — is MySQL-only a blocker, or do you need Postgres from day one?
  2. Ingress — should the chart include an optional Ingress/Gateway resource, or do you prefer to manage that externally?
  3. Multi-namespace — do you need the controller manager to watch multiple namespaces, or is single-namespace sufficient?
  4. Secrets management — do you use external secrets operators (ESO, Vault Agent)? Should the chart support `secretRef` for credentials rather than inline values?
  5. OCI registry — is `ghcr.io` the right registry, or would you prefer an `index.yaml`-style Helm repo?
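
To make question 4 concrete, a `secretRef`-style interface might look like this (key names are illustrative and open to bikeshedding):

```yaml
metadataStorage:
  host: my-rds.example.com
  # Instead of rootPassword inline, reference an existing Secret
  # created out-of-band (ESO, Vault Agent, sealed-secrets, ...):
  existingSecret:
    name: michelangelo-db-credentials
    passwordKey: rootPassword
```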

Please comment below with your environment constraints — it will directly shape the values interface before Phase 2 lands.
