- Remove MLFLOW_TRACKING_URI from michelangelo-config ConfigMap — tracking
URI is user-space config, not a system-level operator concern
- Step 2 now shows two user-owned approaches: set_tracking_uri() in code,
or pass via --env at pipeline submission time
- Auth (Step 3) updated to use --env flags instead of ConfigMap patches
- Verification replaced with temporary curl pod — kubectl exec into task
pods is not practical
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs/operator-guides/integrations/mlflow.md
46 additions & 64 deletions
@@ -12,26 +12,25 @@ Michelangelo does not bundle an MLflow server. This guide assumes you are runnin
 ┌─────────────────────────────────────────────┐
 │ Operator Responsibility │
 │ ├─ Deploy or point to an MLflow server │
-│ ├─ Ensure network reachability from pods │
-│ └─ Inject MLFLOW_TRACKING_URI via ConfigMap │
+│ └─ Ensure network reachability from pods │
 └─────────────────────────────────────────────┘
 ↓
 ┌─────────────────────────────────────────────┐
 │ User Responsibility (task code) │
+│ ├─ Set MLFLOW_TRACKING_URI in workflow code │
 │ ├─ Import mlflow inside @uniflow.task() │
-│ ├─ Read URI from environment variable │
 │ └─ Log runs, params, metrics, artifacts │
 └─────────────────────────────────────────────┘
 ```

-Michelangelo does not intercept or wrap MLflow calls. Users call the MLflow client directly inside `@uniflow.task()` functions; Michelangelo provides the environment variable injection and network access.
+Michelangelo does not intercept or wrap MLflow calls. Users call the MLflow client directly inside `@uniflow.task()` functions and configure the tracking URI themselves. The operator's job is to ensure the MLflow server is reachable from task pods.

 ---

 ## Prerequisites

 - A running MLflow Tracking Server accessible from your Kubernetes cluster. Replace `http://mlflow.example.com:5000` in the examples below with your actual server address.
-- Sufficient RBAC to create ConfigMaps and patch namespace-scoped resources in the compute cluster namespace.
+- Sufficient RBAC to create NetworkPolicy resources in the compute cluster namespace if egress rules are needed.
 - The `mlflow` Python package available in the task's Docker image (users add this to their `requirements.txt`).

 ---
@@ -85,87 +84,73 @@ Replace `<your-pod-selector-label>` with labels that match your task pods. Check

 ---

-## Step 2: Inject the Tracking URI into the ConfigMap
+## Step 2: Configure the Tracking URI

-Michelangelo injects the `michelangelo-config` ConfigMap as an `envFrom` source into every task pod. Adding a key here makes it available as an environment variable in all Ray and Spark pods dispatched by Michelangelo.
+`MLFLOW_TRACKING_URI` is a user-space configuration — it belongs in workflow code or the Ray job pod environment, not in the Michelangelo system ConfigMap. Users should set it themselves using one of these approaches.
-New pods pick up the change automatically. Already-running pods will not see the update until they are replaced.
+### Option B: Set via pipeline environment

-:::tip
-`MLFLOW_TRACKING_URI` is the environment variable that MLflow's Python client reads natively — no extra configuration is needed in user task code.
-:::
+Users can pass `MLFLOW_TRACKING_URI` as an environment variable when submitting a pipeline run, keeping the URI out of source code:
+
+```bash
+ma pipeline dev-run -f pipeline.yaml --env MLFLOW_TRACKING_URI=http://mlflow.example.com:5000
+```
+
+In task code, MLflow reads `MLFLOW_TRACKING_URI` from the environment automatically — no explicit `set_tracking_uri()` call is needed when the variable is set.
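
Because the variable is now supplied by the user rather than injected by the operator, a task may want to fail early when it is missing. The following is a minimal sketch of such a guard using only the Python standard library. The helper name `require_tracking_uri` is hypothetical and is not part of MLflow or Michelangelo.

```python
import os


def require_tracking_uri(environ=None):
    """Return MLFLOW_TRACKING_URI, or raise if it is unset or empty.

    Hypothetical helper for illustration; not an MLflow or Michelangelo API.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if environ is None else environ
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    if not uri:
        raise RuntimeError(
            "MLFLOW_TRACKING_URI is not set; pass it with --env at submission time "
            "or call mlflow.set_tracking_uri() in task code"
        )
    return uri
```

Calling this at the top of a task, before the first MLflow call, would surface a missing `--env` flag as a clear error at the start of the task.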

 ---

 ## Step 3: Handle Authentication

 ### Self-hosted MLflow with basic auth

-If your MLflow server requires HTTP basic authentication, add the credentials to the ConfigMap:
+If your MLflow server requires HTTP basic authentication, pass the credentials as pipeline environment variables:
 MLflow's client reads `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` natively.

 :::warning
-`michelangelo-config` is a ConfigMap, not a Secret — values are stored in plaintext in etcd. For production environments, consider using [workload identity](https://kubernetes.io/docs/concepts/security/service-accounts/) (IRSA on AWS, Workload Identity on GKE) so that task pods authenticate to MLflow via IAM roles rather than static credentials.
+Avoid hardcoding credentials in source code or pipeline YAML files committed to version control. Pass them at runtime via `--env` or a secrets manager integrated with your CI/CD system.
 :::
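
Since credentials now arrive only at submission time, a task can check that they are present without ever printing their values. The sketch below assumes the two basic-auth variables named above. The helper `missing_basic_auth_vars` is hypothetical and not part of MLflow.

```python
import os

# Variables the MLflow client reads for HTTP basic auth (named in this guide).
BASIC_AUTH_VARS = ("MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD")


def missing_basic_auth_vars(environ=None):
    """Return the names (never the values) of basic-auth variables that are unset or empty.

    Hypothetical helper for illustration; not an MLflow API.
    """
    # Accept a plain dict so the check can be exercised without touching os.environ.
    env = os.environ if environ is None else environ
    return [name for name in BASIC_AUTH_VARS if not env.get(name)]
```

A task could log the returned names when the list is non-empty, which helps diagnose a `PERMISSION_DENIED` error without leaking secrets into pod logs.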

 ### Databricks Managed MLflow

-If you are using Databricks Managed MLflow, set the following keys:
+If you are using Databricks Managed MLflow, pass the following environment variables at pipeline submission time:
-Once the operator has completed the steps above, users can use MLflow from any `@uniflow.task()` function without any extra configuration — the MLflow client reads `MLFLOW_TRACKING_URI` from the environment automatically.
+Once the operator has confirmed network reachability (Step 1), users configure their MLflow tracking URI and log experiments from any `@uniflow.task()` function.

 ```python
 import mlflow
@@ -216,30 +201,27 @@ MLflow includes its own model registry. Michelangelo also has a built-in model r

 ## Verification

-After applying the configuration, confirm the environment variable is visible inside a task pod:
-A `200 OK` response confirms both the environment variable injection and network reachability are working correctly.
+A `200 OK` response confirms task pods in that namespace can reach the MLflow server. The pod is automatically deleted after the check (`--rm`).
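
The same reachability check can also be run from Python, for example inside a task while debugging. This is a rough sketch that assumes the server exposes MLflow's `/health` endpoint at the base URL. The function names `health_url` and `is_reachable` are hypothetical.

```python
import urllib.request


def health_url(base_url):
    """Build the health-check URL from a tracking server base URL."""
    return base_url.rstrip("/") + "/health"


def is_reachable(base_url, timeout=5.0):
    """Return True if the health endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

For instance, `is_reachable("http://mlflow.example.com:5000")` returning `False` would point to the NetworkPolicy or firewall checks from Step 1.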

 ---

 ## Troubleshooting

 | Symptom | Likely cause | Resolution |
 |---|---|---|
-| `MLFLOW_TRACKING_URI` not set in pod | ConfigMap patch not applied, or pod predates the patch | Verify with `kubectl get configmap michelangelo-config -n <compute-namespace> -o yaml`; restart pods if needed |
 | `ConnectionRefusedError` or `requests.exceptions.ConnectionError` | MLflow server unreachable from pod | Re-run the connectivity test from Step 1; check NetworkPolicy and firewall rules |
-| `RestException: PERMISSION_DENIED` | Credentials missing or incorrect | Verify `MLFLOW_TRACKING_USERNAME` / `MLFLOW_TRACKING_PASSWORD` are set; check MLflow server auth config |
+| `RestException: PERMISSION_DENIED` | Credentials missing or incorrect | Verify `MLFLOW_TRACKING_USERNAME` / `MLFLOW_TRACKING_PASSWORD` are set at pipeline submission time |
 | `mlflow: command not found` / `ModuleNotFoundError` | `mlflow` not in task's Docker image | Add `mlflow` to `requirements.txt` or the project Dockerfile |
 | MLflow run logged but artifacts missing | Artifact store (S3/GCS) unreachable from pod | Confirm task pod has access to the artifact store configured in the MLflow server |
 | `INVALID_PARAMETER_VALUE` on `log_model` | Client/server version mismatch | Pin `mlflow` to the same major version as the server |