## Summary
Today, deploying Michelangelo requires cloning the repo, running `sandbox.py`, and understanding its internal sequencing logic. For open source partners who want to run Michelangelo on their own clusters — whether on-prem, EKS, GKE, or AKS — there is no standard, self-contained deployment artifact. A published Helm chart closes this gap.
## Problem

### Current state
`sandbox.py` orchestrates the control plane by applying raw YAML files via `kubectl apply` in a hardcoded sequence. All service addresses, credentials, and configuration are embedded directly in those YAMLs. Partners who want to deploy Michelangelo must:
- Fork and modify the raw YAML files for their environment
- Understand the internal ordering logic in `sandbox.py`
- Manually reconcile changes every time the upstream repo updates
- Write their own upgrade and rollback procedures
This is not a viable path for external operators — it creates a fork-and-diverge problem and blocks community adoption.
### What open source partners need
| Need | Current state | With Helm chart |
| --- | --- | --- |
| Install on any K8s cluster | Clone repo, modify raw YAML | `helm install michelangelo oci://...` |
| Configure for their infrastructure | Edit embedded YAML values | `--set metadataStorage.host=...` or `-f values.yaml` |
| Upgrade to a new release | Re-clone, re-apply, resolve conflicts | `helm upgrade michelangelo --reuse-values` |
| Roll back a bad deploy | Manual `kubectl apply` of previous YAMLs | `helm rollback michelangelo` |
| Disable components they don't need | Comment out YAML blocks | `--set ui.enabled=false` |
| Use their own workflow engine (Cadence vs Temporal) | Fork worker config | `--set workflow.engine=temporal` |
| Use their own object storage (S3, GCS, MinIO) | Edit embedded endpoint strings | `--set objectStorage.endpoint=...` |
| GitOps / ArgoCD / Flux integration | Not supported | Native via `helm template` or OCI chart |
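For the `-f values.yaml` path in the table above, a partner override file might look like the sketch below. This is illustrative only: the key names follow the values interface proposed later in this document, and the hostnames are placeholders.

```yaml
# my-values.yaml -- illustrative partner override file; key names assume the
# "Key values interface" proposed below, hostnames are placeholders
metadataStorage:
  host: my-rds.example.com
  port: 3306
  database: michelangelo

objectStorage:
  endpoint: s3.amazonaws.com
  secure: true

workflow:
  engine: temporal
  endpoint: temporal-frontend:7233

ui:
  enabled: false   # example of a per-component toggle
```

The partner would then install with `helm install michelangelo oci://ghcr.io/michelangelo-ai/charts/michelangelo -f my-values.yaml` and keep the file in version control.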
## Proposed Solution
Publish a single `michelangelo` Helm chart that installs the full control plane — apiserver, envoy, UI, worker, controllermgr, CRDs, and RBAC — against any Kubernetes cluster.
### Chart boundary
The chart owns only the control plane. It assumes infrastructure already exists and accepts connection values pointing at it — the same boundary the KubeRay and Temporal charts draw.
**In scope (chart installs):**
- `michelangelo-apiserver` (gRPC API server)
- `michelangelo-envoy` (gRPC-Web proxy)
- `michelangelo-ui` (React frontend)
- `michelangelo-worker` (Cadence/Temporal workflow client)
- `michelangelo-controllermgr` (Kubernetes controller manager)
- All CRDs and RBAC
**Out of scope (partner brings their own):**
- MySQL / PostgreSQL
- S3 / GCS / MinIO
- Cadence / Temporal
- KubeRay, Spark Operator
### Install experience
```bash
# Production install
helm install michelangelo oci://ghcr.io/michelangelo-ai/charts/michelangelo \
  --set metadataStorage.host=my-rds.example.com \
  --set objectStorage.endpoint=s3.amazonaws.com \
  --set workflow.endpoint=temporal-frontend:7233 \
  --set workflow.engine=temporal

# Local k3d (sandbox)
helm install michelangelo ./helm/michelangelo -f helm/michelangelo/values-k3d.yaml

# API-only (no UI)
helm install michelangelo ./helm/michelangelo \
  --set ui.enabled=false \
  --set envoy.enabled=false
```
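For reference, `values-k3d.yaml` for the local sandbox might contain something like the following. This is a sketch: the in-cluster hostnames are assumptions about the sandbox dependencies, and the ports match the comments in the values interface below.

```yaml
# helm/michelangelo/values-k3d.yaml -- illustrative local-sandbox overrides;
# the in-cluster hostnames below are assumptions, not the actual sandbox names
metadataStorage:
  driver: mysql
  host: mysql.michelangelo.svc.cluster.local
  port: 3306
  database: michelangelo

objectStorage:
  endpoint: minio:9000
  secure: false

workflow:
  engine: cadence
  endpoint: cadence:7933
```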
### Key values interface
```yaml
metadataStorage:
  driver: mysql          # "mysql" or "postgres"
  host: ""
  port: 3306
  database: michelangelo
  rootPassword: ""

objectStorage:
  endpoint: ""           # minio:9000 / s3.amazonaws.com / storage.googleapis.com
  secure: false

workflow:
  endpoint: ""           # cadence:7933 / temporal-frontend:7233
  engine: cadence        # "cadence" or "temporal"

# Per-component enable/disable (follows Temporal Helm chart pattern)
apiserver:     { enabled: true }
envoy:         { enabled: true }
ui:            { enabled: true }
worker:        { enabled: true }
controllermgr: { enabled: true }
```
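To show how these values would flow into the rendered manifests, an excerpt of an apiserver Deployment template might look like this. It is a minimal sketch: the environment variable names are assumptions, not the apiserver's actual configuration contract, and non-essential Deployment fields are elided.

```yaml
# templates/apiserver-deployment.yaml (excerpt) -- illustrative mapping from
# chart values to container environment; env var names are assumptions
{{- if .Values.apiserver.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: michelangelo-apiserver
spec:
  template:
    spec:
      containers:
        - name: apiserver
          env:
            - name: METADATA_STORAGE_DRIVER
              value: {{ .Values.metadataStorage.driver | quote }}
            - name: METADATA_STORAGE_HOST
              value: {{ .Values.metadataStorage.host | quote }}
            - name: OBJECT_STORAGE_ENDPOINT
              value: {{ .Values.objectStorage.endpoint | quote }}
            - name: WORKFLOW_ENGINE
              value: {{ .Values.workflow.engine | quote }}
            - name: WORKFLOW_ENDPOINT
              value: {{ .Values.workflow.endpoint | quote }}
{{- end }}
```

The same `{{- if .Values.<component>.enabled }}` guard is what backs the per-component toggles, so disabling the UI or envoy simply skips rendering those manifests.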
### Benefits over raw YAML
| Benefit | Detail |
| --- | --- |
| Single install command | No Python, no clone-and-modify workflow |
| Environment parity | Same chart, same values interface for local k3d, staging, and production |
| Standard upgrade path | `helm upgrade --reuse-values` handles rolling restarts and config diffs |
| GitOps ready | OCI chart + `helm template` output works with ArgoCD, Flux, and any GitOps tool |
| Component toggles | Disable UI, envoy, or worker independently without touching templates |
| Workflow engine portability | Cadence and Temporal switchable via a single value |
| Credential safety | `helm.sh/resource-policy: keep` prevents upgrade from rotating externally-injected credentials |
| Schema init container | Apiserver waits for DB schema readiness instead of relying on Python ordering |
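For the last two rows, the intended pattern is roughly the following sketch. It is illustrative only: the secret name, init image, and readiness check are assumptions, not the chart's final shape, and a real schema check would go beyond a TCP probe.

```yaml
# Credential safety: a Secret annotated so helm upgrade/uninstall does not
# delete or rotate externally-injected credentials (illustrative; assumed name)
apiVersion: v1
kind: Secret
metadata:
  name: michelangelo-metadata-storage
  annotations:
    helm.sh/resource-policy: keep
stringData:
  rootPassword: {{ .Values.metadataStorage.rootPassword | quote }}
---
# Schema readiness: an apiserver initContainer fragment that blocks startup
# until the metadata DB answers, replacing sandbox.py's Python-side ordering
initContainers:
  - name: wait-for-schema
    image: busybox:1.36
    command:
      - sh
      - -c
      - until nc -z {{ .Values.metadataStorage.host }} {{ .Values.metadataStorage.port }}; do sleep 2; done
```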
## Implementation Plan
Tracked in the internal design doc. High-level phases:
- Phase 1: CI gate — `helm lint` + `helm template --debug` on every PR touching `helm/` (see the workflow sketch after this list)
- Phase 2: Create `helm/michelangelo/` chart; migrate all 5 control plane services from raw YAML to Helm templates; validate local k3d install
- Phase 3: Confirm observability/experimental tier stays in `sandbox.py`; document chart boundary
- Phase 4: Update `sandbox.py` to call `helm install / upgrade / uninstall` internally; publish chart to GHCR OCI registry
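For illustration, the Phase 1 gate could be a small workflow along these lines. This is a sketch: the file path and trigger filter are assumptions, and it relies on `helm` being available on the runner.

```yaml
# .github/workflows/helm-lint.yaml -- illustrative Phase 1 CI gate
name: helm-lint
on:
  pull_request:
    paths:
      - "helm/**"
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint chart
        run: helm lint helm/michelangelo
      - name: Render templates with default values
        run: helm template michelangelo helm/michelangelo --debug
```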
## Open Questions for Community Input
We'd welcome feedback from anyone planning to deploy Michelangelo externally:
- PostgreSQL support — is MySQL-only a blocker, or do you need Postgres from day one?
- Ingress — should the chart include an optional Ingress/Gateway resource, or do you prefer to manage that externally?
- Multi-namespace — do you need the controller manager to watch multiple namespaces, or is single-namespace sufficient?
- Secrets management — do you use external secrets operators (ESO, Vault Agent)? Should the chart support `secretRef` for credentials rather than inline values?
- OCI registry — is `ghcr.io` the right registry, or would you prefer an `index.yaml`-style Helm repo?
Please comment below with your environment constraints — it will directly shape the values interface before Phase 2 lands.