| 1 | +# Authentication & Identity |
| 2 | + |
| 3 | +Authentication in Michelangelo operates at two levels: |
| 4 | + |
| 5 | +- **User authentication** — end users authenticate to the Michelangelo API and UI via an identity provider (IdP) |
| 6 | +- **Service authentication** — internal services (worker, controller manager) authenticate to each other using Kubernetes service account tokens |
| 7 | + |
| 8 | +This guide covers configuring both, plus RBAC authorization and multi-tenant isolation. |
| 9 | + |
| 10 | +## Enabling RBAC |
| 11 | + |
| 12 | +RBAC is disabled by default. Enable it in the API server ConfigMap overlay before connecting an identity provider: |
| 13 | + |
| 14 | +```yaml |
| 15 | +apiserver: |
| 16 | +  auth: |
| 17 | +    rbacEnabled: true |
| 18 | +``` |
| 19 | + |
| 20 | +After applying the overlay, restart the API server so it picks up the change: |
| 21 | + |
| 22 | +```bash |
| 23 | +kubectl rollout restart deployment/michelangelo-apiserver -n ma-system |
| 24 | +``` |
| 25 | + |
| 26 | +Once RBAC is enabled, any user without a matching `RoleBinding` or `ClusterRoleBinding` is denied access to all resources. |
| 27 | + |
| 28 | +## Connecting an Identity Provider (OIDC) |
| 29 | + |
| 30 | +Michelangelo supports any OIDC-compliant identity provider. Configure it in the API server ConfigMap: |
| 31 | + |
| 32 | +```yaml |
| 33 | +apiserver: |
| 34 | +  auth: |
| 35 | +    rbacEnabled: true |
| 36 | +    oidc: |
| 37 | +      issuerUrl: https://accounts.your-idp.com |
| 38 | +      clientId: michelangelo |
| 39 | +      usernameClaim: email # JWT claim used as the Michelangelo username |
| 40 | +      groupsClaim: groups # JWT claim used for group-based RBAC |
| 41 | +``` |
| 42 | + |
| 43 | +### Okta |
| 44 | + |
| 45 | +1. In the Okta admin console, create an application of type **Web** |
| 46 | +2. Set the **Sign-in redirect URI** to `https://michelangelo-envoy.your-domain/callback` |
| 47 | +3. Copy the **Client ID** and **Okta domain** into the config: |
| 48 | +   ```yaml |
| 49 | +   oidc: |
| 50 | +     issuerUrl: https://your-org.okta.com |
| 51 | +     clientId: <client-id-from-okta> |
| 52 | +   ``` |
| 53 | + |
| 54 | +### Google Workspace |
| 55 | + |
| 56 | +1. In Google Cloud Console, create an **OAuth 2.0 Client ID** of type Web application |
| 57 | +2. Add your Michelangelo Envoy callback URL (for example, `https://michelangelo-envoy.your-domain/callback`) as an authorized redirect URI |
| 58 | +3. Set the issuer URL: |
| 59 | +   ```yaml |
| 60 | +   oidc: |
| 61 | +     issuerUrl: https://accounts.google.com |
| 62 | +     clientId: <client-id>.apps.googleusercontent.com |
| 63 | +     usernameClaim: email |
| 64 | +     groupsClaim: hd # Google Workspace hosted domain: one value per user, not per-group membership |
| 65 | +   ``` |
| 66 | + |
| 67 | +### Azure Active Directory (Microsoft Entra ID) |
| 68 | + |
| 69 | +1. Register a new application in the Azure portal |
| 70 | +2. Set the redirect URI to your Michelangelo Envoy callback URL |
| 71 | +3. Note the **Application (client) ID** and **Directory (tenant) ID**: |
| 72 | +   ```yaml |
| 73 | +   oidc: |
| 74 | +     issuerUrl: https://login.microsoftonline.com/<tenant-id>/v2.0 |
| 75 | +     clientId: <application-client-id> |
| 76 | +     usernameClaim: upn # User Principal Name (email format); an optional claim in v2.0 tokens |
| 77 | +     groupsClaim: groups |
| 78 | +   ``` |
| 79 | + |
| 80 | +### Keycloak |
| 81 | + |
| 82 | +1. Create a realm and a Client with Client Protocol `openid-connect` |
| 83 | +2. Set the redirect URI and note the client ID: |
| 84 | +   ```yaml |
| 85 | +   oidc: |
| 86 | +     issuerUrl: https://keycloak.your-domain.com/realms/<realm-name> |
| 87 | +     clientId: michelangelo |
| 88 | +   ``` |
| 89 | + |
| 90 | +## Session Token Configuration |
| 91 | + |
| 92 | +Control how long a user's session remains valid: |
| 93 | + |
| 94 | +```yaml |
| 95 | +apiserver: |
| 96 | +  auth: |
| 97 | +    sessionTokenExpiry: 8h # Valid time units: h, m, s |
| 98 | +``` |
| 99 | + |
| 100 | +Eight hours is a reasonable default for a standard workday. A shorter expiry improves security but requires more frequent re-authentication. |
| 101 | + |
| 102 | +## Multi-Factor Authentication |
| 103 | + |
| 104 | +MFA is enforced at the IdP level, not within Michelangelo. Configure MFA policies in your identity provider's admin console. Michelangelo requires users to complete the full IdP authentication flow — including MFA — before issuing a session token. |
| 105 | + |
| 106 | +## Granting Access with RBAC |
| 107 | + |
| 108 | +After RBAC is enabled, users need a `RoleBinding` or `ClusterRoleBinding` to access Michelangelo resources. |
| 109 | + |
| 110 | +### Grant a user read access to a project namespace |
| 111 | + |
| 112 | +```yaml |
| 113 | +apiVersion: rbac.authorization.k8s.io/v1 |
| 114 | +kind: RoleBinding |
| 115 | +metadata: |
| 116 | +  name: alice-reader |
| 117 | +  namespace: ml-team-project |
| 118 | +subjects: |
| 119 | +- kind: User |
| 120 | +  name: alice@your-company.com # Must match the value of the JWT claim named by usernameClaim |
| 121 | +  apiGroup: rbac.authorization.k8s.io |
| 122 | +roleRef: |
| 123 | +  kind: ClusterRole |
| 124 | +  name: viewer |
| 125 | +  apiGroup: rbac.authorization.k8s.io |
| 126 | +``` |
| 127 | + |
| 128 | +### Grant a team admin access via group membership |
| 129 | + |
| 130 | +```yaml |
| 131 | +apiVersion: rbac.authorization.k8s.io/v1 |
| 132 | +kind: RoleBinding |
| 133 | +metadata: |
| 134 | +  name: ml-team-admins |
| 135 | +  namespace: ml-team-project |
| 136 | +subjects: |
| 137 | +- kind: Group |
| 138 | +  name: ml-team # Must match one of the values in the JWT claim named by groupsClaim |
| 139 | +  apiGroup: rbac.authorization.k8s.io |
| 140 | +roleRef: |
| 141 | +  kind: ClusterRole |
| 142 | +  name: editor |
| 143 | +  apiGroup: rbac.authorization.k8s.io |
| 144 | +``` |
| 145 | + |
| 146 | +Use `RoleBinding` to scope access to a specific namespace. Use `ClusterRoleBinding` only for platform administrators who need cross-namespace access. |
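|  | + |
|  | +As a minimal sketch of the cluster-wide case, a `ClusterRoleBinding` for a platform administrators group could look like the following. The binding name and the group name `platform-admins` are hypothetical, and reusing the `editor` ClusterRole from the example above is an assumption; substitute whichever role your deployment defines for administrators. |
|  | + |
|  | +```yaml |
|  | +apiVersion: rbac.authorization.k8s.io/v1 |
|  | +kind: ClusterRoleBinding # Cluster-scoped, so there is no metadata.namespace |
|  | +metadata: |
|  | +  name: platform-admins-binding # Hypothetical name |
|  | +subjects: |
|  | +- kind: Group |
|  | +  name: platform-admins # Hypothetical group; must appear in the JWT claim named by groupsClaim |
|  | +  apiGroup: rbac.authorization.k8s.io |
|  | +roleRef: |
|  | +  kind: ClusterRole |
|  | +  name: editor # Assumption: replace with your administrator role |
|  | +  apiGroup: rbac.authorization.k8s.io |
|  | +``` |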
| 147 | + |
| 148 | +## Multi-Tenant Namespace Isolation |
| 149 | + |
| 150 | +Each team or project should have its own Kubernetes namespace. Use `NetworkPolicy` resources to prevent cross-namespace access to ML workloads: |
| 151 | + |
| 152 | +```yaml |
| 153 | +apiVersion: networking.k8s.io/v1 |
| 154 | +kind: NetworkPolicy |
| 155 | +metadata: |
| 156 | +  name: deny-cross-namespace |
| 157 | +  namespace: ml-team-a |
| 158 | +spec: |
| 159 | +  podSelector: {} |
| 160 | +  policyTypes: |
| 161 | +  - Ingress |
| 162 | +  ingress: |
| 163 | +  - from: |
| 164 | +    - namespaceSelector: |
| 165 | +        matchLabels: |
| 166 | +          kubernetes.io/metadata.name: ml-team-a |
| 167 | +    - namespaceSelector: |
| 168 | +        matchLabels: |
| 169 | +          kubernetes.io/metadata.name: ma-system # Control plane needs access |
| 170 | +``` |
| 171 | + |
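| 172 | +This allows ingress traffic from within the team's namespace and from the Michelangelo control plane, and blocks ingress from all other namespaces. The policy takes effect only if the cluster's network plugin enforces `NetworkPolicy`. |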
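|  | + |
|  | +The policy above selects namespaces by the `kubernetes.io/metadata.name` label, which Kubernetes sets automatically on every namespace (stable since v1.22), so a team namespace should need no extra labels for the selector to match. A minimal sketch of the namespace itself, using the `ml-team-a` name from the policy: |
|  | + |
|  | +```yaml |
|  | +apiVersion: v1 |
|  | +kind: Namespace |
|  | +metadata: |
|  | +  name: ml-team-a # The control plane adds kubernetes.io/metadata.name: ml-team-a automatically |
|  | +``` |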
| 173 | + |
| 174 | +## Service Authentication (Internal) |
| 175 | + |
| 176 | +Michelangelo services authenticate to each other using Kubernetes service account tokens. |
| 177 | + |
| 178 | +**Worker → API server**: Set `worker.useTLS: true` in the worker ConfigMap. The worker authenticates with its pod's Kubernetes service account token. Do not set `useTLS: false` in production. |
| 179 | + |
| 180 | +```yaml |
| 181 | +worker: |
| 182 | +  address: michelangelo-apiserver.ma-system.svc.cluster.local:15566 |
| 183 | +  maApiServiceName: ma-apiserver |
| 184 | +  useTLS: true |
| 185 | +``` |
| 186 | + |
| 187 | +**Controller manager → compute cluster**: Uses the `ray-manager` service account token stored as a Secret in the control plane namespace. See [Register a Compute Cluster](jobs/register-a-compute-cluster-to-michelangelo-control-plane.md) for the full setup including token rotation guidance. |
| 188 | + |
| 189 | +## Disabling Direct Storage Access |
| 190 | + |
| 191 | +Do not allow users or services to directly access etcd or object storage (S3/MinIO) in ways that bypass the Michelangelo API. For S3 access: |
| 192 | + |
| 193 | +- Set `useIam: true` in the controller manager ConfigMap — this uses IAM roles attached to pods via ServiceAccount annotations, not hardcoded credentials |
| 194 | +- Do not grant `s3:*` to individual users; use IAM policies scoped to specific buckets and prefixes |
| 195 | +- Audit S3 bucket policies regularly to ensure no public or cross-account access is inadvertently granted |
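|  | + |
|  | +On Amazon EKS, one common way to attach an IAM role to pods through a ServiceAccount annotation is IAM Roles for Service Accounts (IRSA). The sketch below assumes that mechanism. The ServiceAccount name, account ID, and role name are hypothetical placeholders, since this guide does not state which ServiceAccount the controller manager runs as. |
|  | + |
|  | +```yaml |
|  | +apiVersion: v1 |
|  | +kind: ServiceAccount |
|  | +metadata: |
|  | +  name: michelangelo-controller-manager # Hypothetical name |
|  | +  namespace: ma-system |
|  | +  annotations: |
|  | +    # IRSA annotation; the role should be scoped to specific buckets and prefixes |
|  | +    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/<role-name> |
|  | +``` |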