Commit 83d66c1

zhoward-1 and claude authored
fix: correct broken code examples and remove internal references from docs (#1085)
## Summary

Fixes 10 broken/incorrect code examples and removes internal Uber references identified during a documentation audit.

### Critical code fixes

- **Wrong `DatasetVariable` import** (`michelangelo.sdk.workflow.variables` → `michelangelo.workflow.variables`) — package doesn't exist
- **Wrong `DatasetVariable` constructor** (`DatasetVariable(value=...)` → `DatasetVariable.create(...)`) — all usages
- **Wrong `LightningTrainer` imports** (`michelangelo.sdk.trainer` + `michelangelo.maf` → `michelangelo.lib.trainer`) — packages don't exist
- **`@uniflow.task()` missing required `config=`** — `config: TaskConfig` has no default; bare calls crash at runtime. Added `config=RayTask(...)` and `RayTask` imports to all 6 affected locations
- **`@uniflow.workflow` missing parens** → `@uniflow.workflow()` (4 occurrences)

### Major corrections

- `apiVersion: michelangelo.uber.com/v2beta1` → `michelangelo.api/v2` in `pipeline-management.md`
- Removed nonexistent `ma trigger_run list` command from CLI reference table
- `ma pipeline dev_run` → `ma pipeline dev-run` (kebab-case)
- Notification YAML updated from numeric enum IDs to proto string names (`NOTIFICATION_TYPE_EMAIL`, `EVENT_TYPE_PIPELINE_RUN_STATE_FAILED`, `RESOURCE_TYPE_TRIGGER_RUN`, etc.)

### Public-readiness cleanup (P0)

- Rewrote `api-framework.md` intro: removed Uber infrastructure references (Kafka, Flink, Cassandra/Redis, internal tooling), fixed "Michleangelo" typo
- Removed UberEats-specific examples from `core-concepts-and-key-terms.md`
- Fixed bare `@uniflow.task()` in `overview.md` FAQ code example

## Files changed

- `docs/user-guides/prepare-your-data.md`
- `docs/user-guides/train-and-register-a-model.md`
- `docs/getting-started/core-concepts-and-key-terms.md`
- `docs/user-guides/model-registry-guide.md`
- `docs/user-guides/set-up-triggers.md`
- `docs/user-guides/ml-pipelines/pipeline-management.md`
- `docs/user-guides/ml-pipelines/file-sync-testing-flow-runbook.md`
- `docs/operator-guides/api-framework.md`
- `docs/getting-started/overview.md`

## Test plan

- [ ] `cd website && bun run build` passes (verified locally)
- [ ] Code examples import correctly against Python source
- [ ] No Uber/UberEats/internal references remain in changed files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
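The "bare calls crash at runtime" claim above can be illustrated with a self-contained stand-in. This is plain Python, not the real Michelangelo API: `task` and `RayTask` here are hypothetical sketches that only mirror the stated shape of the real decorator (a factory whose `config` parameter has no default), so calling the factory with no arguments fails before any task runs.

```python
from dataclasses import dataclass


@dataclass
class RayTask:
    # Hypothetical stand-in for michelangelo.uniflow.plugins.ray.RayTask
    head_cpu: int
    head_memory: str


def task(config):
    """Stand-in for uniflow.task: `config` has no default, mirroring
    the `config: TaskConfig` signature described in the commit message."""
    def decorator(fn):
        fn.config = config  # attach the resource config to the task function
        return fn
    return decorator


try:
    @task()  # the broken pattern the old docs showed: no config argument
    def train():
        print("training")
except TypeError as e:
    print("bare @task() fails:", e)


@task(config=RayTask(head_cpu=2, head_memory="4Gi"))  # the documented fix
def train():
    print("training")
```

With a required positional parameter, `@task()` raises `TypeError` at import time, which is why every affected example needed an explicit `config=RayTask(...)`.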
1 parent 6702a3a commit 83d66c1

9 files changed

Lines changed: 72 additions & 61 deletions


docs/getting-started/core-concepts-and-key-terms.md

Lines changed: 9 additions & 8 deletions
@@ -61,8 +61,9 @@ A **task** is the fundamental unit of computation in Uniflow. Tasks are modular
 
 ```python
 import michelangelo.uniflow.core as uniflow
+from michelangelo.uniflow.plugins.ray import RayTask
 
-@uniflow.task()
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def train():
     print("training")
 ```
@@ -73,7 +74,7 @@ def train():
 A **workflow** orchestrates multiple tasks, managing dependencies and result passing.
 
 ```python
-@uniflow.workflow
+@uniflow.workflow()
 def train_workflow(dataset_id: str):
     train_data, valid_data, test_data = load_dataset(dataset_id)
     model = train(train_data, valid_data, test_data)
@@ -116,8 +117,8 @@ A business use case with a set of continuously trackable metrics.
 **Examples**:
 - Predicting customer churn for a subscription service
 - Fraud detection for financial transactions
-- Ranking restaurants on the UberEats home feed
-- Predicting cancellation rate for ride dispatch
+- Recommending products on an e-commerce homepage
+- Predicting delivery time estimates for a logistics platform
 
 ### Model Family
 
@@ -132,7 +133,7 @@ A Model Family is a group of related ML models within a project that address dif
 
 **Examples**:
 - Model excellence scores track the quality of each model family
-- UberEats home feed ranking uses different model families optimizing for conversion rate, net inflow, service quality, and fairness
+- A home feed ranking system uses different model families optimizing for conversion rate, content quality, and fairness
 
 ### Dataset
 
@@ -286,7 +287,7 @@ See [Appendix: Data Type Examples](#appendix-uniflow-data-type-examples) for det
 ## Example: Build a Pipeline
 
 ```python
-@uniflow.workflow
+@uniflow.workflow()
 def train_workflow(dataset_id: str):
     train_data, valid_data, test_data = load_dataset(dataset_id)
     model = train(train_data, valid_data, test_data)
@@ -360,12 +361,12 @@ Train Model (select XGBoost) → Evaluate → Deploy
 
 **Uniflow (Code) Path**:
 ```python
-@uniflow.task()
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def train_model(dataset):
     # Your training code
     return model
 
-@uniflow.workflow
+@uniflow.workflow()
 def training_pipeline(dataset_id: str):
     data = load_dataset(dataset_id)
     model = train_model(data)
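The `@uniflow.workflow` → `@uniflow.workflow()` fixes in the hunks above are the classic decorator-factory pitfall. A minimal generic sketch (not the real Uniflow code; it only assumes the decorator is a factory taking keyword options) shows why the parentheses matter:

```python
def workflow(**options):
    """Decorator factory: it must be CALLED to produce the actual decorator."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapper.options = options  # record the factory options on the wrapper
        return wrapper
    return decorator


@workflow()  # correct: the factory call returns the real decorator
def train_workflow(dataset_id):
    return f"trained on {dataset_id}"


try:
    @workflow  # wrong: the function is passed positionally to the factory itself
    def broken(dataset_id):
        return dataset_id
except TypeError as e:
    print("missing parentheses:", e)
```

Without the parentheses, the decorated function is handed to the factory instead of to the decorator it produces, so the definition either raises immediately (as here) or silently yields an unusable object.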

docs/getting-started/overview.md

Lines changed: 3 additions & 1 deletion
@@ -141,7 +141,9 @@ A: No. If you're using the UI, it's entirely point-and-click. If you're coding,
 **Q: Can I use my existing Python ML code?**
 A: Yes! Wrap your training functions with `@uniflow.task()` decorator and you're ready to go. Example:
 ```python
-@uniflow.task()
+from michelangelo.uniflow.plugins.ray import RayTask
+
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def train_model(data_path: str):
     # Your existing training code here
     model = train_my_model(data_path)

docs/operator-guides/api-framework.md

Lines changed: 4 additions & 5 deletions
@@ -1,13 +1,12 @@
 # Michelangelo API Framework
-Michelangelo is an end-to-end ML platform that democratizes machine learning and makes scaling AI to meet the needs of the business as easy as requesting a ride. Michelangelo enables ML practitioners to seamlessly build, deploy, and operate machine learning solutions at Uber’s scale. It is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions. Michelangelo has been serving production use cases at Uber since 2016 and has become the de-facto system for machine learning for our engineers and data scientists.
 
-Michelangelo consists of a mix of open-source systems and components built in-house. We generally prefer to use mature open-source options where possible and will fork, customize, and contribute back as needed, though we sometimes build systems ourselves when open-source solutions are not ideal for our use case.
+Michelangelo is an end-to-end ML platform designed to democratize machine learning and make scaling AI accessible across organizations. It enables ML practitioners to seamlessly build, deploy, and operate machine learning solutions at scale. Michelangelo is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions.
 
-Michelangelo is built on top of Uber’s data and compute infrastructure, providing a data lake that stores all of Uber’s transactional and logged data, Kafka brokers that aggregate logged messages from all Uber’s services, a Flink streaming compute engine, managed Cassandra/Redis clusters, and Uber’s in-house service provisioning and deployment tools.
+Michelangelo consists of a mix of open-source systems and components built in-house. We generally prefer to use mature open-source options where possible and will fork, customize, and contribute back as needed, though we sometimes build systems ourselves when open-source solutions are not ideal for a given use case.
 
-An important piece of the system is Michelangelo API. This is the brain of the system. It consists of a management application that serves the web UI and network API and integrations with Uber’s system monitoring and alerting infrastructure. Currently, there is no industry-wide API standard for ML platforms and tooling, nor an end-to-end implementation reference available, and there’s no open-source initiative to tackle this problem. Teams and organizations tend to build their own APIs with no industry-wide agreed-upon standards, resulting in duplication of effort and incompatibility among ML products built by different teams.
+An important piece of the system is Michelangelo API. This is the brain of the system. It consists of a management application that serves the web UI and network API. Currently, there is no industry-wide API standard for ML platforms and tooling, nor an end-to-end implementation reference available, and there’s no open-source initiative to tackle this problem. Teams and organizations tend to build their own APIs with no industry-wide agreed-upon standards, resulting in duplication of effort and incompatibility among ML products built by different teams.
 
-Michleangelo has been field tested with highly complex real-world ML use cases at Uber’s scale, Michelangelo API Framework can help close this gap. We’d like to open-source the API framework which we’ve been building and improving in the past seven years, and to share our years of learning and experience building a highly scalable and reliable end-to-end ML platform with the ML community.
+Michelangelo has been field tested with highly complex real-world ML use cases at scale. The Michelangelo API Framework can help close this gap. We’d like to share our years of learning and experience building a highly scalable and reliable end-to-end ML platform with the ML community.
 
 <!--
 ## Getting Started

docs/user-guides/ml-pipelines/file-sync-testing-flow-runbook.md

Lines changed: 2 additions & 2 deletions
@@ -35,10 +35,10 @@ python workflow.py remote-run \
   --file-sync
 ```
 
-`ma pipeline dev_run` support: add `--file-sync` flag to your ma pipeline dev-run command
+`ma pipeline dev-run` support: add `--file-sync` flag to your ma pipeline dev-run command
 
 ```shell
-ma pipeline dev_run --file-sync --file <path_to_pipeline.yaml>
+ma pipeline dev-run --file-sync --file <path_to_pipeline.yaml>
 ```
 
 ### Requirements

docs/user-guides/ml-pipelines/pipeline-management.md

Lines changed: 2 additions & 2 deletions
@@ -62,7 +62,7 @@ To create a pipeline, we must create a directory under the project folder with t
 The **pipeline.yaml** file defines the metadata for the pipeline. This file is required to register the pipeline with MA Studio. The format of the **pipeline.yaml** file conforms to this protobuf.
 
 ```yaml
-apiVersion: michelangelo.uber.com/v2beta1
+apiVersion: michelangelo.api/v2
 kind: Pipeline
 metadata:
   namespace: my-project # The name of the project
@@ -196,7 +196,7 @@ The **pipeline.yaml** file defines the metadata for the pipeline. This file is r
 Example:
 
 ```yaml
-apiVersion: michelangelo.uber.com/v2beta1
+apiVersion: michelangelo.api/v2
 kind: Pipeline
 metadata:
   namespace: my-project # The name of the project

docs/user-guides/model-registry-guide.md

Lines changed: 3 additions & 2 deletions
@@ -371,9 +371,10 @@ Package model registration as a task in your ML pipeline:
 import michelangelo.uniflow.core as uniflow
 from michelangelo.lib.model_manager.packager.custom_triton import CustomTritonPackager
 from michelangelo.lib.model_manager.schema import DataType, ModelSchema, ModelSchemaItem
+from michelangelo.uniflow.plugins.ray import RayTask
 
 
-@uniflow.task()
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def package_model(model_path: str, model_class: str):
     """Package a trained model for deployment."""
     packager = CustomTritonPackager()
@@ -399,7 +400,7 @@ def package_model(model_path: str, model_class: str):
 This task can be chained after a training task in a workflow:
 
 ```py
-@uniflow.workflow
+@uniflow.workflow()
 def train_and_package(dataset_id: str):
     model_path = train_model(dataset_id)
     package_path = package_model(model_path, "myproject.models.MyModel")

docs/user-guides/prepare-your-data.md

Lines changed: 24 additions & 15 deletions
@@ -14,16 +14,16 @@ Learn how to prepare data in Uniflow for the ML pipeline on Michelangelo using R
 
 ```py
 import ray.data as rd
-from michelangelo.sdk.workflow.variables import DatasetVariable
+from michelangelo.workflow.variables import DatasetVariable
 
 dataset = rd.read_parquet("s3://bucket/data.parquet") \
     .map_batches(clean_missing_values, batch_size=1000) \
     .map_batches(normalize_features) \
     .map_batches(encode_categories)
 
 train_ds, val_ds = dataset.train_test_split(test_size=0.2)
-train_dv = DatasetVariable(value=train_ds)
-val_dv = DatasetVariable(value=val_ds)
+train_dv = DatasetVariable.create(train_ds)
+val_dv = DatasetVariable.create(val_ds)
 ```
 
 ### Common Preprocessing Functions
@@ -60,9 +60,9 @@ Michelangelo provides `DatasetVariable` to handle datasets across different fram
 
 | Framework | Usage | Load Method |
 | ----- | ----- | ----- |
-| Ray Datasets | `DatasetVariable(value=ray_dataset)` | `load_ray_dataset()` |
-| Pandas DataFrames | `DatasetVariable(value=pandas_df)` | `load_pandas_dataframe()` |
-| Spark DataFrames | `DatasetVariable(value=spark_df)` | `load_spark_dataframe()` |
+| Ray Datasets | `DatasetVariable.create(ray_dataset)` | `load_ray_dataset()` |
+| Pandas DataFrames | `DatasetVariable.create(pandas_df)` | `load_pandas_dataframe()` |
+| Spark DataFrames | `DatasetVariable.create(spark_df)` | `load_spark_dataframe()` |
 
 ### Direct Dataset Usage
 
@@ -73,7 +73,11 @@ Michelangelo provides `DatasetVariable` to handle datasets across different fram
 | Spark DataFrames | `spark.read.parquet(...)` | Large-scale processing |
 
 ```py
-@uniflow.task()
+import michelangelo.uniflow.core as uniflow
+import ray.data as rd
+from michelangelo.uniflow.plugins.ray import RayTask
+
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def process_data_directly(data_path: str):
     dataset = rd.read_parquet(data_path) \
         .map_batches(preprocessing_function) \
@@ -85,29 +89,34 @@ def process_data_directly(data_path: str):
 
 ```py
 import ray.data as rd
-from michelangelo.sdk.workflow.variables import DatasetVariable
+from michelangelo.workflow.variables import DatasetVariable
 
 ray_dataset = rd.read_parquet("s3://bucket/data.parquet")
-dataset_var = DatasetVariable(value=ray_dataset)
+dataset_var = DatasetVariable.create(ray_dataset)
 
 import pandas as pd
 pandas_df = pd.read_csv("local_file.csv")
-dataset_var = DatasetVariable(value=pandas_df)
+dataset_var = DatasetVariable.create(pandas_df)
 
 spark_df = spark.read.parquet("s3://bucket/data.parquet")
-dataset_var = DatasetVariable(value=spark_df)
+dataset_var = DatasetVariable.create(spark_df)
 ```
 
 ## Automatic Storage in Uniflow Tasks
 
 ```py
-@uniflow.task()
+import michelangelo.uniflow.core as uniflow
+import ray.data as rd
+from michelangelo.workflow.variables import DatasetVariable
+from michelangelo.uniflow.plugins.ray import RayTask
+
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def prepare_training_data(data_path: str):
     dataset = rd.read_parquet(data_path).map_batches(clean_and_normalize)
     train_ds, val_ds = dataset.train_test_split(test_size=0.2)
-    train_dv = DatasetVariable(value=train_ds)
+    train_dv = DatasetVariable.create(train_ds)
     train_dv.save_ray_dataset()
-    val_dv = DatasetVariable(value=val_ds)
+    val_dv = DatasetVariable.create(val_ds)
     val_dv.save_ray_dataset()
     return {
         "train": train_dv,
@@ -116,7 +125,7 @@ def prepare_training_data(data_path: str):
 ```
 
 ```py
-@uniflow.task()
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def use_prepared_data(datasets: dict):
     datasets["train"].load_ray_dataset()
     datasets["validation"].load_ray_dataset()
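The `DatasetVariable(value=...)` → `DatasetVariable.create(...)` changes above replace keyword construction with a factory classmethod. A rough, purely illustrative sketch of that pattern (this toy class is not the real `DatasetVariable`; the commit does not show its internals):

```python
class DatasetVariable:
    """Toy model of a wrapper whose instances come from a classmethod factory."""

    def __init__(self, value, kind):
        self._value = value
        self._kind = kind

    @classmethod
    def create(cls, value):
        # A factory can inspect the payload and choose the right handling
        # (Ray vs pandas vs Spark in the real docs), which a plain
        # DatasetVariable(value=...) constructor cannot do as cleanly.
        return cls(value, kind=type(value).__name__)


dv = DatasetVariable.create([1, 2, 3])
print(dv._kind)  # list
```

This is a common reason a library exposes `create(...)` instead of its constructor: the factory becomes the single dispatch point for framework-specific behavior.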

docs/user-guides/set-up-triggers.md

Lines changed: 20 additions & 21 deletions
@@ -98,7 +98,6 @@ Here are the commands for managing your trigger runs:
 | **Register pipeline with triggers** | `ma pipeline apply --file=<path_to_pipeline.yaml>` |
 | **Create trigger run** | `ma trigger_run create --namespace=<ns> --pipeline=<name> --trigger-name=<trigger>` |
 | **Check trigger status** | `ma trigger_run get --namespace=<ns> --name=<name>` |
-| **List all triggers** | `ma trigger_run list --namespace=<ns>` |
 | **Delete a trigger** | `ma trigger_run delete --namespace=<ns> --name=<name>` |
 | **Kill a running trigger** | `ma trigger_run kill --namespace=<ns> --name=<name>` |
 
@@ -271,17 +270,17 @@ Here's an example that sends an email when a pipeline run fails, and a Slack mes
 spec:
   notifications:
     # Email alert on pipeline run failure
-    - notification_type: 1 # 1 = Email
-      event_types: [3] # 3 = Pipeline run failed
-      resource_type: 2 # 2 = TriggerRun
+    - notification_type: NOTIFICATION_TYPE_EMAIL
+      event_types: [EVENT_TYPE_PIPELINE_RUN_STATE_FAILED]
+      resource_type: RESOURCE_TYPE_TRIGGER_RUN
       emails:
         - "team-alerts@example.com"
         - "your-email@example.com"
 
     # Slack message on trigger success
-    - notification_type: 2 # 2 = Slack
-      event_types: [7] # 7 = Trigger run succeeded
-      resource_type: 2 # 2 = TriggerRun
+    - notification_type: NOTIFICATION_TYPE_SLACK
+      event_types: [EVENT_TYPE_TRIGGER_RUN_STATE_SUCCEEDED]
+      resource_type: RESOURCE_TYPE_TRIGGER_RUN
       slack_destinations:
         - "#ml-pipeline-alerts"
 ```
@@ -294,25 +293,25 @@ You can notify on any combination of these events:
 
 | Event | ID | Description |
 | :---- | :---- | :---- |
-| Pipeline run succeeded | `1` | A pipeline run completed successfully |
-| Pipeline run killed | `2` | A pipeline run was manually terminated |
-| Pipeline run failed | `3` | A pipeline run encountered an error |
-| Pipeline run skipped | `4` | A pipeline run was skipped |
-| Trigger run killed | `5` | The trigger itself was terminated |
-| Trigger run failed | `6` | The trigger encountered an error |
-| Trigger run succeeded | `7` | The trigger completed all scheduled runs |
-| Pipeline state ready | `8` | The pipeline is in a ready state |
-| Pipeline state error | `9` | The pipeline has entered an error state |
+| Pipeline run succeeded | `EVENT_TYPE_PIPELINE_RUN_STATE_SUCCEEDED` | A pipeline run completed successfully |
+| Pipeline run killed | `EVENT_TYPE_PIPELINE_RUN_STATE_KILLED` | A pipeline run was manually terminated |
+| Pipeline run failed | `EVENT_TYPE_PIPELINE_RUN_STATE_FAILED` | A pipeline run encountered an error |
+| Pipeline run skipped | `EVENT_TYPE_PIPELINE_RUN_STATE_SKIPPED` | A pipeline run was skipped |
+| Trigger run killed | `EVENT_TYPE_TRIGGER_RUN_STATE_KILLED` | The trigger itself was terminated |
+| Trigger run failed | `EVENT_TYPE_TRIGGER_RUN_STATE_FAILED` | The trigger encountered an error |
+| Trigger run succeeded | `EVENT_TYPE_TRIGGER_RUN_STATE_SUCCEEDED` | The trigger completed all scheduled runs |
+| Pipeline state ready | `EVENT_TYPE_PIPELINE_STATE_READY` | The pipeline is in a ready state |
+| Pipeline state error | `EVENT_TYPE_PIPELINE_STATE_ERROR` | The pipeline has entered an error state |
 
 #### Notification and Resource Types
 
 | Field | Value | Meaning |
 | :---- | :---- | :---- |
-| `notification_type` | `1` | Email |
-| `notification_type` | `2` | Slack |
-| `resource_type` | `1` | PipelineRun |
-| `resource_type` | `2` | TriggerRun |
-| `resource_type` | `3` | Pipeline |
+| `notification_type` | `NOTIFICATION_TYPE_EMAIL` | Email |
+| `notification_type` | `NOTIFICATION_TYPE_SLACK` | Slack |
+| `resource_type` | `RESOURCE_TYPE_PIPELINE_RUN` | PipelineRun |
+| `resource_type` | `RESOURCE_TYPE_TRIGGER_RUN` | TriggerRun |
+| `resource_type` | `RESOURCE_TYPE_PIPELINE` | Pipeline |
 
 > **Tip:** A common setup is to notify on failures via email (for immediate attention) and on successes via Slack (for team visibility). You can list multiple `event_types` in a single notification entry to consolidate alerts.

docs/user-guides/train-and-register-a-model.md

Lines changed: 5 additions & 5 deletions
@@ -43,9 +43,10 @@ For basic (scikit-learn, lightweight PyTorch) training, load your dataset direct
 
 ```py
 import michelangelo.uniflow.core as uniflow
-from michelangelo.sdk.workflow.variables import DatasetVariable
+from michelangelo.workflow.variables import DatasetVariable
+from michelangelo.uniflow.plugins.ray import RayTask
 
-@uniflow.task()
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="8Gi"))
 def train_model(train_dv: DatasetVariable, val_dv: DatasetVariable):
     """Simple training with scikit-learn"""
 
@@ -73,11 +74,10 @@ To scale training across CPUs/GPUs, wrap your training task using **RayTask**.
 ## Example: Distributed Deep Learning with Ray Workers
 
 ```py
-from michelangelo.sdk.trainer.torch.pytorch_lightning.lightning_trainer import (
-    LightningTrainer, LightningTrainerParam
+from michelangelo.lib.trainer.torch.pytorch_lightning.lightning_trainer import (
+    LightningTrainer, LightningTrainerParam, create_run_config, create_scaling_config
 )
 from michelangelo.uniflow.plugins.ray import RayTask
-from michelangelo.maf.ray.train import create_run_config, create_scaling_config
 from ray.train import CheckpointConfig
 
 @uniflow.task(
