
Commit 8658f91

zhoward-1 and claude committed
docs: rework MLflow guide per reviewer feedback
- Remove MLFLOW_TRACKING_URI from michelangelo-config ConfigMap — tracking URI is user-space config, not a system-level operator concern
- Step 2 now shows two user-owned approaches: set_tracking_uri() in code, or pass via --env at pipeline submission time
- Auth (Step 3) updated to use --env flags instead of ConfigMap patches
- Verification replaced with temporary curl pod — kubectl exec into task pods is not practical

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent cbecfa2 commit 8658f91

1 file changed: docs/operator-guides/integrations/mlflow.md (46 additions, 64 deletions)
@@ -12,26 +12,25 @@ Michelangelo does not bundle an MLflow server. This guide assumes you are runnin
 ┌─────────────────────────────────────────────┐
 │ Operator Responsibility                     │
 │ ├─ Deploy or point to an MLflow server      │
-│ ├─ Ensure network reachability from pods    │
-│ └─ Inject MLFLOW_TRACKING_URI via ConfigMap │
+│ └─ Ensure network reachability from pods    │
 └─────────────────────────────────────────────┘
 
 ┌─────────────────────────────────────────────┐
 │ User Responsibility (task code)             │
+│ ├─ Set MLFLOW_TRACKING_URI in workflow code │
 │ ├─ Import mlflow inside @uniflow.task()     │
-│ ├─ Read URI from environment variable       │
 │ └─ Log runs, params, metrics, artifacts     │
 └─────────────────────────────────────────────┘
 ```
 
-Michelangelo does not intercept or wrap MLflow calls. Users call the MLflow client directly inside `@uniflow.task()` functions; Michelangelo provides the environment variable injection and network access.
+Michelangelo does not intercept or wrap MLflow calls. Users call the MLflow client directly inside `@uniflow.task()` functions and configure the tracking URI themselves. The operator's job is to ensure the MLflow server is reachable from task pods.
 
 ---
 
 ## Prerequisites
 
 - A running MLflow Tracking Server accessible from your Kubernetes cluster. Replace `http://mlflow.example.com:5000` in the examples below with your actual server address.
-- Sufficient RBAC to create ConfigMaps and patch namespace-scoped resources in the compute cluster namespace.
+- Sufficient RBAC to create NetworkPolicy resources in the compute cluster namespace if egress rules are needed.
- The `mlflow` Python package available in the task's Docker image (users add this to their `requirements.txt`).
 
 ---
@@ -85,87 +84,73 @@ Replace `<your-pod-selector-label>` with labels that match your task pods. Check
 
 ---
 
-## Step 2: Inject the Tracking URI into the ConfigMap
+## Step 2: Configure the Tracking URI
 
-Michelangelo injects the `michelangelo-config` ConfigMap as an `envFrom` source into every task pod. Adding a key here makes it available as an environment variable in all Ray and Spark pods dispatched by Michelangelo.
+`MLFLOW_TRACKING_URI` is a user-space configuration — it belongs in workflow code or the Ray job pod environment, not in the Michelangelo system ConfigMap. Users should set it themselves using one of these approaches.
 
-```bash
-kubectl patch configmap michelangelo-config \
-  --namespace=<compute-namespace> \
-  --type=merge \
-  -p '{"data":{"MLFLOW_TRACKING_URI":"http://mlflow.example.com:5000"}}'
-```
+### Option A: Set in workflow code
 
-Or add it to your existing declarative ConfigMap manifest:
+The simplest approach is to call `mlflow.set_tracking_uri()` directly in the task or at the top of the workflow module:
 
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: michelangelo-config
-  namespace: <compute-namespace>
-data:
-  # Existing keys
-  MA_FILE_SYSTEM: s3://default
-  MA_FILE_SYSTEM_S3_SCHEME: http
-  AWS_ACCESS_KEY_ID: <your-access-key-id>
-  AWS_SECRET_ACCESS_KEY: <your-secret-access-key>
-  AWS_ENDPOINT_URL: <your-storage-endpoint>
-  # MLflow
-  MLFLOW_TRACKING_URI: "http://mlflow.example.com:5000"
+```python
+import mlflow
+import michelangelo.uniflow.core as uniflow
+from michelangelo.uniflow.plugins.ray import RayTask
+
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
+def train_model(train_data, config: dict):
+    mlflow.set_tracking_uri("http://mlflow.example.com:5000")
+    mlflow.set_experiment("fraud-detection")
+    ...
 ```
 
-New pods pick up the change automatically. Already-running pods will not see the update until they are replaced.
+### Option B: Set via pipeline environment
 
-:::tip
-`MLFLOW_TRACKING_URI` is the environment variable that MLflow's Python client reads natively — no extra configuration is needed in user task code.
-:::
+Users can pass `MLFLOW_TRACKING_URI` as an environment variable when submitting a pipeline run, keeping the URI out of source code:
+
+```bash
+ma pipeline dev-run -f pipeline.yaml --env MLFLOW_TRACKING_URI=http://mlflow.example.com:5000
+```
+
+In task code, MLflow reads `MLFLOW_TRACKING_URI` from the environment automatically — no explicit `set_tracking_uri()` call is needed when the variable is set.
 
 ---
 
 ## Step 3: Handle Authentication
 
 ### Self-hosted MLflow with basic auth
 
-If your MLflow server requires HTTP basic authentication, add the credentials to the ConfigMap:
+If your MLflow server requires HTTP basic authentication, pass the credentials as pipeline environment variables:
 
 ```bash
-kubectl patch configmap michelangelo-config \
-  --namespace=<compute-namespace> \
-  --type=merge \
-  -p '{"data":{
-    "MLFLOW_TRACKING_URI":"http://mlflow.example.com:5000",
-    "MLFLOW_TRACKING_USERNAME":"<username>",
-    "MLFLOW_TRACKING_PASSWORD":"<password>"
-  }}'
+ma pipeline dev-run -f pipeline.yaml \
+  --env MLFLOW_TRACKING_URI=http://mlflow.example.com:5000 \
+  --env MLFLOW_TRACKING_USERNAME=<username> \
+  --env MLFLOW_TRACKING_PASSWORD=<password>
 ```
 
 MLflow's client reads `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` natively.
 
 :::warning
-`michelangelo-config` is a ConfigMap, not a Secret — values are stored in plaintext in etcd. For production environments, consider using [workload identity](https://kubernetes.io/docs/concepts/security/service-accounts/) (IRSA on AWS, Workload Identity on GKE) so that task pods authenticate to MLflow via IAM roles rather than static credentials.
+Avoid hardcoding credentials in source code or pipeline YAML files committed to version control. Pass them at runtime via `--env` or a secrets manager integrated with your CI/CD system.
 :::
 
 ### Databricks Managed MLflow
 
-If you are using Databricks Managed MLflow, set the following keys:
+If you are using Databricks Managed MLflow, pass the following environment variables at pipeline submission time:
 
 ```bash
-kubectl patch configmap michelangelo-config \
-  --namespace=<compute-namespace> \
-  --type=merge \
-  -p '{"data":{
-    "MLFLOW_TRACKING_URI":"databricks",
-    "DATABRICKS_HOST":"https://<your-workspace>.azuredatabricks.net",
-    "DATABRICKS_TOKEN":"<your-personal-access-token>"
-  }}'
+ma pipeline dev-run -f pipeline.yaml \
+  --env MLFLOW_TRACKING_URI=databricks \
+  --env DATABRICKS_HOST=https://<your-workspace>.azuredatabricks.net \
+  --env DATABRICKS_TOKEN=<your-personal-access-token>
 ```
 
 ---
 
 ## What Users Do (Task Code)
 
-Once the operator has completed the steps above, users can use MLflow from any `@uniflow.task()` function without any extra configuration — the MLflow client reads `MLFLOW_TRACKING_URI` from the environment automatically.
+Once the operator has confirmed network reachability (Step 1), users configure their MLflow tracking URI and log experiments from any `@uniflow.task()` function.
 
 ```python
 import mlflow
@@ -216,30 +201,27 @@ MLflow includes its own model registry. Michelangelo also has a built-in model r
 
 ## Verification
 
-After applying the configuration, confirm the environment variable is visible inside a task pod:
-
-```bash
-kubectl exec -it <task-pod-name> -n <compute-namespace> -- env | grep MLFLOW
-```
-
-You can also verify end-to-end reachability from a task pod by running a connectivity check against the MLflow health endpoint:
+Verify network reachability from within the compute namespace using a temporary curl pod — the same approach as Step 1:
 
 ```bash
-kubectl exec -it <task-pod-name> -n <compute-namespace> -- \
+kubectl run mlflow-verify \
+  --image=curlimages/curl \
+  --namespace=<compute-namespace> \
+  --restart=Never \
+  --rm -it -- \
   curl -sv http://mlflow.example.com:5000/health
 ```
 
-A `200 OK` response confirms both the environment variable injection and network reachability are working correctly.
+A `200 OK` response confirms task pods in that namespace can reach the MLflow server. The pod is automatically deleted after the check (`--rm`).
 
 ---
 
 ## Troubleshooting
 
 | Symptom | Likely cause | Resolution |
 |---|---|---|
-| `MLFLOW_TRACKING_URI` not set in pod | ConfigMap patch not applied, or pod predates the patch | Verify with `kubectl get configmap michelangelo-config -n <compute-namespace> -o yaml`; restart pods if needed |
 | `ConnectionRefusedError` or `requests.exceptions.ConnectionError` | MLflow server unreachable from pod | Re-run the connectivity test from Step 1; check NetworkPolicy and firewall rules |
-| `RestException: PERMISSION_DENIED` | Credentials missing or incorrect | Verify `MLFLOW_TRACKING_USERNAME` / `MLFLOW_TRACKING_PASSWORD` are set; check MLflow server auth config |
+| `RestException: PERMISSION_DENIED` | Credentials missing or incorrect | Verify `MLFLOW_TRACKING_USERNAME` / `MLFLOW_TRACKING_PASSWORD` are set at pipeline submission time |
 | `mlflow: command not found` / `ModuleNotFoundError` | `mlflow` not in task's Docker image | Add `mlflow` to `requirements.txt` or the project Dockerfile |
 | MLflow run logged but artifacts missing | Artifact store (S3/GCS) unreachable from pod | Confirm task pod has access to the artifact store configured in the MLflow server |
 | `INVALID_PARAMETER_VALUE` on `log_model` | Client/server version mismatch | Pin `mlflow` to the same major version as the server |