
feat: publish Michelangelo control plane as a Helm chart for open source deployment #1136

@sallycr


Summary

Today, deploying Michelangelo requires cloning the repo, running `sandbox.py`, and understanding its internal sequencing logic. For open source partners who want to run Michelangelo on their own clusters — whether on-prem, EKS, GKE, or AKS — there is no standard, self-contained deployment artifact. A published Helm chart closes this gap.

Problem

Current state

`sandbox.py` orchestrates the control plane by applying raw YAML files via `kubectl apply` in a hardcoded sequence. All service addresses, credentials, and configuration are embedded directly in those YAMLs. Partners who want to deploy Michelangelo must:

  • Fork and modify the raw YAML files for their environment
  • Understand the internal ordering logic in `sandbox.py`
  • Manually reconcile changes every time the upstream repo updates
  • Write their own upgrade and rollback procedures

This is not a viable path for external operators — it creates a fork-and-diverge problem and blocks community adoption.

What open source partners need

| Need | Current state | With Helm chart |
|---|---|---|
| Install on any K8s cluster | Clone repo, modify raw YAML | `helm install michelangelo oci://...` |
| Configure for their infrastructure | Edit embedded YAML values | `--set metadataStorage.host=...` or `-f values.yaml` |
| Upgrade to a new release | Re-clone, re-apply, resolve conflicts | `helm upgrade michelangelo --reuse-values` |
| Roll back a bad deploy | Manual `kubectl apply` of previous YAMLs | `helm rollback michelangelo` |
| Disable components they don't need | Comment out YAML blocks | `--set ui.enabled=false` |
| Use their own workflow engine (Cadence vs Temporal) | Fork worker config | `--set workflow.engine=temporal` |
| Use their own object storage (S3, GCS, MinIO) | Edit embedded endpoint strings | `--set objectStorage.endpoint=...` |
| GitOps / ArgoCD / Flux integration | Not supported | Native via `helm template` or OCI chart |

Proposed Solution

Publish a single `michelangelo` Helm chart that installs the full control plane — apiserver, envoy, UI, worker, controllermgr, CRDs, and RBAC — against any Kubernetes cluster.
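
One possible layout for the proposed chart (everything below `helm/michelangelo/` is illustrative; only `values-k3d.yaml` and the five components are fixed by this proposal):

```
helm/michelangelo/
├── Chart.yaml            # chart name, version, appVersion
├── values.yaml           # defaults per the "Key values interface" section
├── values-k3d.yaml       # local sandbox overrides
├── crds/                 # CRDs, installed before templates render
└── templates/
    ├── apiserver/        # Deployment, Service, RBAC
    ├── envoy/
    ├── ui/
    ├── worker/
    └── controllermgr/
```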

Chart boundary

The chart owns only the control plane. It assumes infrastructure already exists and accepts connection values pointing at it — the same boundary KubeRay and Temporal draw.

In scope (chart installs):

  • `michelangelo-apiserver` (gRPC API server)
  • `michelangelo-envoy` (gRPC-Web proxy)
  • `michelangelo-ui` (React frontend)
  • `michelangelo-worker` (Cadence/Temporal workflow client)
  • `michelangelo-controllermgr` (Kubernetes controller manager)
  • All CRDs and RBAC

Out of scope (partner brings their own):

  • MySQL / PostgreSQL
  • S3 / GCS / MinIO
  • Cadence / Temporal
  • KubeRay, Spark Operator

Install experience

```shell
# Production install
helm install michelangelo oci://ghcr.io/michelangelo-ai/charts/michelangelo \
  --set metadataStorage.host=my-rds.example.com \
  --set objectStorage.endpoint=s3.amazonaws.com \
  --set workflow.endpoint=temporal-frontend:7233 \
  --set workflow.engine=temporal

# Local k3d (sandbox)
helm install michelangelo ./helm/michelangelo -f helm/michelangelo/values-k3d.yaml

# API-only (no UI)
helm install michelangelo ./helm/michelangelo \
  --set ui.enabled=false \
  --set envoy.enabled=false
```

Key values interface

```yaml
metadataStorage:
  driver: mysql          # "mysql" or "postgres"
  host: ""
  port: 3306
  database: michelangelo
  rootPassword: ""

objectStorage:
  endpoint: ""           # minio:9000 / s3.amazonaws.com / storage.googleapis.com
  secure: false

workflow:
  endpoint: ""           # cadence:7933 / temporal-frontend:7233
  engine: cadence        # "cadence" or "temporal"

# Per-component enable/disable (follows Temporal Helm chart pattern)
apiserver:     { enabled: true }
envoy:         { enabled: true }
ui:            { enabled: true }
worker:        { enabled: true }
controllermgr: { enabled: true }
```
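
As a sketch of how the toggles and engine value might be consumed in templates (the file path and environment variable names are illustrative, not part of this proposal):

```yaml
# templates/worker/deployment.yaml (illustrative)
{{- if .Values.worker.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: michelangelo-worker
spec:
  template:
    spec:
      containers:
        - name: worker
          env:
            # Selects the Cadence or Temporal client at startup
            - name: WORKFLOW_ENGINE
              value: {{ .Values.workflow.engine | quote }}
            - name: WORKFLOW_ENDPOINT
              value: {{ .Values.workflow.endpoint | quote }}
{{- end }}
```

Wrapping each component's manifests in an `if .Values.<component>.enabled` guard is what makes the API-only install above a one-flag operation.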

Benefits over raw YAML

| Benefit | Detail |
|---|---|
| Single install command | No Python, no clone-and-modify workflow |
| Environment parity | Same chart, same values interface for local k3d, staging, and production |
| Standard upgrade path | `helm upgrade --reuse-values` handles rolling restarts and config diffs |
| GitOps ready | OCI chart + `helm template` output works with ArgoCD, Flux, and any GitOps tool |
| Component toggles | Disable UI, envoy, or worker independently without touching templates |
| Workflow engine portability | Cadence and Temporal switchable via a single value |
| Credential safety | `helm.sh/resource-policy: keep` prevents upgrade from rotating externally injected credentials |
| Schema init container | Apiserver waits for DB schema readiness instead of relying on Python ordering |
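
The schema init container could be implemented along these lines; the probe command and the `schema_migrations` table name are placeholders, not the final implementation:

```yaml
# Sketch of the apiserver Pod's schema-readiness gate (MySQL case).
# DB_* variables would come from the metadataStorage values/Secret.
initContainers:
  - name: wait-for-schema
    image: mysql:8.0
    command:
      - sh
      - -c
      - |
        # Block until the metadata DB accepts connections and the
        # expected schema exists; Kubernetes restarts this container
        # on failure, so the loop plus restart policy gives retries.
        until mysql -h "$DB_HOST" -P "$DB_PORT" -u root -p"$DB_PASSWORD" \
          -e 'SELECT 1 FROM schema_migrations LIMIT 1' "$DB_DATABASE"; do
          echo "waiting for schema..."; sleep 5
        done
```

This replaces the implicit ordering guarantee that `sandbox.py` currently provides by applying YAMLs in sequence.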

Implementation Plan

Tracked in the internal design doc. High-level phases:

  • Phase 1: CI gate — `helm lint` + `helm template --debug` on every PR touching `helm/`
  • Phase 2: Create `helm/michelangelo/` chart; migrate all 5 control plane services from raw YAML to Helm templates; validate local k3d install
  • Phase 3: Confirm observability/experimental tier stays in `sandbox.py`; document chart boundary
  • Phase 4: Update `sandbox.py` to call `helm install / upgrade / uninstall` internally; publish chart to GHCR OCI registry
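
The Phase 1 gate might look like the following GitHub Actions workflow (file path and action versions are assumptions; any CI system running the same two commands would do):

```yaml
# .github/workflows/helm-ci.yaml (illustrative)
# Lint and render the chart on every PR that touches helm/.
name: helm-ci
on:
  pull_request:
    paths: ["helm/**"]
jobs:
  lint-template:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-helm@v4
      - run: helm lint helm/michelangelo
      - run: helm template michelangelo helm/michelangelo --debug > /dev/null
```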

Open Questions for Community Input

We'd welcome feedback from anyone planning to deploy Michelangelo externally:

  1. PostgreSQL support — is MySQL-only a blocker, or do you need Postgres from day one?
  2. Ingress — should the chart include an optional Ingress/Gateway resource, or do you prefer to manage that externally?
  3. Multi-namespace — do you need the controller manager to watch multiple namespaces, or is single-namespace sufficient?
  4. Secrets management — do you use external secrets operators (ESO, Vault Agent)? Should the chart support `secretRef` for credentials rather than inline values?
  5. OCI registry — is `ghcr.io` the right registry, or would you prefer an `index.yaml`-style Helm repo?
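
To make question 4 concrete, a `secretRef`-style interface might look like this (key names are illustrative and open to bikeshedding):

```yaml
metadataStorage:
  host: my-rds.example.com
  # Instead of rootPassword inline, reference an existing Secret
  # created out-of-band (ESO, Vault Agent, sealed-secrets, ...):
  existingSecret:
    name: michelangelo-db-credentials
    passwordKey: rootPassword
```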

Please comment below with your environment constraints — it will directly shape the values interface before Phase 2 lands.
