Commit 83d66c1

zhoward-1 and claude authored
fix: correct broken code examples and remove internal references from docs (#1085)
## Summary

Fixes 10 broken/incorrect code examples and removes internal Uber references identified during a documentation audit.

### Critical code fixes

- **Wrong `DatasetVariable` import** (`michelangelo.sdk.workflow.variables` → `michelangelo.workflow.variables`) — package doesn't exist
- **Wrong `DatasetVariable` constructor** (`DatasetVariable(value=...)` → `DatasetVariable.create(...)`) — all usages
- **Wrong `LightningTrainer` imports** (`michelangelo.sdk.trainer` + `michelangelo.maf` → `michelangelo.lib.trainer`) — packages don't exist
- **`@uniflow.task()` missing required `config=`** — `config: TaskConfig` has no default; bare calls crash at runtime. Added `config=RayTask(...)` and `RayTask` imports to all 6 affected locations
- **`@uniflow.workflow` missing parens** → `@uniflow.workflow()` (4 occurrences)

### Major corrections

- `apiVersion: michelangelo.uber.com/v2beta1` → `michelangelo.api/v2` in `pipeline-management.md`
- Removed nonexistent `ma trigger_run list` command from CLI reference table
- `ma pipeline dev_run` → `ma pipeline dev-run` (kebab-case)
- Notification YAML updated from numeric enum IDs to proto string names (`NOTIFICATION_TYPE_EMAIL`, `EVENT_TYPE_PIPELINE_RUN_STATE_FAILED`, `RESOURCE_TYPE_TRIGGER_RUN`, etc.)

### Public-readiness cleanup (P0)

- Rewrote `api-framework.md` intro: removed Uber infrastructure references (Kafka, Flink, Cassandra/Redis, internal tooling), fixed "Michleangelo" typo
- Removed UberEats-specific examples from `core-concepts-and-key-terms.md`
- Fixed bare `@uniflow.task()` in `overview.md` FAQ code example

## Files changed

- `docs/user-guides/prepare-your-data.md`
- `docs/user-guides/train-and-register-a-model.md`
- `docs/getting-started/core-concepts-and-key-terms.md`
- `docs/user-guides/model-registry-guide.md`
- `docs/user-guides/set-up-triggers.md`
- `docs/user-guides/ml-pipelines/pipeline-management.md`
- `docs/user-guides/ml-pipelines/file-sync-testing-flow-runbook.md`
- `docs/operator-guides/api-framework.md`
- `docs/getting-started/overview.md`

## Test plan

- [ ] `cd website && bun run build` passes (verified locally)
- [ ] Code examples import correctly against Python source
- [ ] No Uber/UberEats/internal references remain in changed files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
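The "bare calls crash at runtime" claim above can be illustrated with a self-contained stand-in. This is plain Python, not the real Michelangelo API: `task` and `RayTask` here are hypothetical sketches that only mirror the stated shape of the real decorator (a factory whose `config` parameter has no default), so calling the factory with no arguments fails before any task runs.

```python
from dataclasses import dataclass


@dataclass
class RayTask:
    # Hypothetical stand-in for michelangelo.uniflow.plugins.ray.RayTask
    head_cpu: int
    head_memory: str


def task(config):
    """Stand-in for uniflow.task: `config` has no default, mirroring
    the `config: TaskConfig` signature described in the commit message."""
    def decorator(fn):
        fn.config = config  # attach the resource config to the task function
        return fn
    return decorator


try:
    @task()  # the broken pattern the old docs showed: no config argument
    def train():
        print("training")
except TypeError as e:
    print("bare @task() fails:", e)


@task(config=RayTask(head_cpu=2, head_memory="4Gi"))  # the documented fix
def train():
    print("training")
```

With a required positional parameter, `@task()` raises `TypeError` at import time, which is why every affected example needed an explicit `config=RayTask(...)`.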
1 parent 6702a3a commit 83d66c1

9 files changed

Lines changed: 72 additions & 61 deletions


docs/getting-started/core-concepts-and-key-terms.md

Lines changed: 9 additions & 8 deletions
@@ -61,8 +61,9 @@ A **task** is the fundamental unit of computation in Uniflow. Tasks are modular
 
 ```python
 import michelangelo.uniflow.core as uniflow
+from michelangelo.uniflow.plugins.ray import RayTask
 
-@uniflow.task()
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def train():
     print("training")
 ```
@@ -73,7 +74,7 @@ def train():
 A **workflow** orchestrates multiple tasks, managing dependencies and result passing.
 
 ```python
-@uniflow.workflow
+@uniflow.workflow()
 def train_workflow(dataset_id: str):
     train_data, valid_data, test_data = load_dataset(dataset_id)
     model = train(train_data, valid_data, test_data)
@@ -116,8 +117,8 @@ A business use case with a set of continuously trackable metrics.
 **Examples**:
 - Predicting customer churn for a subscription service
 - Fraud detection for financial transactions
-- Ranking restaurants on the UberEats home feed
-- Predicting cancellation rate for ride dispatch
+- Recommending products on an e-commerce homepage
+- Predicting delivery time estimates for a logistics platform
 
 ### Model Family
 
@@ -132,7 +133,7 @@ A Model Family is a group of related ML models within a project that address dif
 
 **Examples**:
 - Model excellence scores track the quality of each model family
-- UberEats home feed ranking uses different model families optimizing for conversion rate, net inflow, service quality, and fairness
+- A home feed ranking system uses different model families optimizing for conversion rate, content quality, and fairness
 
 ### Dataset
 
@@ -286,7 +287,7 @@ See [Appendix: Data Type Examples](#appendix-uniflow-data-type-examples) for det
 ## Example: Build a Pipeline
 
 ```python
-@uniflow.workflow
+@uniflow.workflow()
 def train_workflow(dataset_id: str):
     train_data, valid_data, test_data = load_dataset(dataset_id)
     model = train(train_data, valid_data, test_data)
@@ -360,12 +361,12 @@ Train Model (select XGBoost) → Evaluate → Deploy
 
 **Uniflow (Code) Path**:
 ```python
-@uniflow.task()
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def train_model(dataset):
     # Your training code
     return model
 
-@uniflow.workflow
+@uniflow.workflow()
 def training_pipeline(dataset_id: str):
     data = load_dataset(dataset_id)
     model = train_model(data)
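The `@uniflow.workflow` → `@uniflow.workflow()` fixes in the hunks above are the classic decorator-factory pitfall. A minimal generic sketch (not the real Uniflow code; it only assumes the decorator is a factory taking keyword options) shows why the parentheses matter:

```python
def workflow(**options):
    """Decorator factory: it must be CALLED to produce the actual decorator."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapper.options = options  # record the factory options on the wrapper
        return wrapper
    return decorator


@workflow()  # correct: the factory call returns the real decorator
def train_workflow(dataset_id):
    return f"trained on {dataset_id}"


try:
    @workflow  # wrong: the function is passed positionally to the factory itself
    def broken(dataset_id):
        return dataset_id
except TypeError as e:
    print("missing parentheses:", e)
```

Without the parentheses, the decorated function is handed to the factory instead of to the decorator it produces, so the definition either raises immediately (as here) or silently yields an unusable object.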

docs/getting-started/overview.md

Lines changed: 3 additions & 1 deletion
@@ -141,7 +141,9 @@ A: No. If you're using the UI, it's entirely point-and-click. If you're coding,
 **Q: Can I use my existing Python ML code?**
 A: Yes! Wrap your training functions with `@uniflow.task()` decorator and you're ready to go. Example:
 ```python
-@uniflow.task()
+from michelangelo.uniflow.plugins.ray import RayTask
+
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def train_model(data_path: str):
     # Your existing training code here
     model = train_my_model(data_path)

docs/operator-guides/api-framework.md

Lines changed: 4 additions & 5 deletions
@@ -1,13 +1,12 @@
 # Michelangelo API Framework
-Michelangelo is an end-to-end ML platform that democratizes machine learning and makes scaling AI to meet the needs of the business as easy as requesting a ride. Michelangelo enables ML practitioners to seamlessly build, deploy, and operate machine learning solutions at Uber’s scale. It is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions. Michelangelo has been serving production use cases at Uber since 2016 and has become the de-facto system for machine learning for our engineers and data scientists.
 
-Michelangelo consists of a mix of open-source systems and components built in-house. We generally prefer to use mature open-source options where possible and will fork, customize, and contribute back as needed, though we sometimes build systems ourselves when open-source solutions are not ideal for our use case.
+Michelangelo is an end-to-end ML platform designed to democratize machine learning and make scaling AI accessible across organizations. It enables ML practitioners to seamlessly build, deploy, and operate machine learning solutions at scale. Michelangelo is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions.
 
-Michelangelo is built on top of Uber’s data and compute infrastructure, providing a data lake that stores all of Uber’s transactional and logged data, Kafka brokers that aggregate logged messages from all Uber’s services, a Flink streaming compute engine, managed Cassandra/Redis clusters, and Uber’s in-house service provisioning and deployment tools.
+Michelangelo consists of a mix of open-source systems and components built in-house. We generally prefer to use mature open-source options where possible and will fork, customize, and contribute back as needed, though we sometimes build systems ourselves when open-source solutions are not ideal for a given use case.
 
-An important piece of the system is Michelangelo API. This is the brain of the system. It consists of a management application that serves the web UI and network API and integrations with Uber’s system monitoring and alerting infrastructure. Currently, there is no industry-wide API standard for ML platforms and tooling, nor an end-to-end implementation reference available, and there’s no open-source initiative to tackle this problem. Teams and organizations tend to build their own APIs with no industry-wide agreed-upon standards, resulting in duplication of effort and incompatibility among ML products built by different teams.
+An important piece of the system is Michelangelo API. This is the brain of the system. It consists of a management application that serves the web UI and network API. Currently, there is no industry-wide API standard for ML platforms and tooling, nor an end-to-end implementation reference available, and there’s no open-source initiative to tackle this problem. Teams and organizations tend to build their own APIs with no industry-wide agreed-upon standards, resulting in duplication of effort and incompatibility among ML products built by different teams.
 
-Michleangelo has been field tested with highly complex real-world ML use cases at Uber’s scale, Michelangelo API Framework can help close this gap. We’d like to open-source the API framework which we’ve been building and improving in the past seven years, and to share our years of learning and experience building a highly scalable and reliable end-to-end ML platform with the ML community.
+Michelangelo has been field tested with highly complex real-world ML use cases at scale. The Michelangelo API Framework can help close this gap. We’d like to share our years of learning and experience building a highly scalable and reliable end-to-end ML platform with the ML community.
 
 <!--
 ## Getting Started

docs/user-guides/ml-pipelines/file-sync-testing-flow-runbook.md

Lines changed: 2 additions & 2 deletions
@@ -35,10 +35,10 @@ python workflow.py remote-run \
   --file-sync
 ```
 
-`ma pipeline dev_run` support: add `--file-sync` flag to your ma pipeline dev-run command
+`ma pipeline dev-run` support: add `--file-sync` flag to your ma pipeline dev-run command
 
 ```shell
-ma pipeline dev_run --file-sync --file <path_to_pipeline.yaml>
+ma pipeline dev-run --file-sync --file <path_to_pipeline.yaml>
 ```
 
 ### Requirements

docs/user-guides/ml-pipelines/pipeline-management.md

Lines changed: 2 additions & 2 deletions
@@ -62,7 +62,7 @@ To create a pipeline, we must create a directory under the project folder with t
 The **pipeline.yaml** file defines the metadata for the pipeline. This file is required to register the pipeline with MA Studio. The format of the **pipeline.yaml** file conforms to this protobuf.
 
 ```yaml
-apiVersion: michelangelo.uber.com/v2beta1
+apiVersion: michelangelo.api/v2
 kind: Pipeline
 metadata:
   namespace: my-project # The name of the project
@@ -196,7 +196,7 @@ The **pipeline.yaml** file defines the metadata for the pipeline. This file is r
 Example:
 
 ```yaml
-apiVersion: michelangelo.uber.com/v2beta1
+apiVersion: michelangelo.api/v2
 kind: Pipeline
 metadata:
   namespace: my-project # The name of the project

docs/user-guides/model-registry-guide.md

Lines changed: 3 additions & 2 deletions
@@ -371,9 +371,10 @@ Package model registration as a task in your ML pipeline:
 import michelangelo.uniflow.core as uniflow
 from michelangelo.lib.model_manager.packager.custom_triton import CustomTritonPackager
 from michelangelo.lib.model_manager.schema import DataType, ModelSchema, ModelSchemaItem
+from michelangelo.uniflow.plugins.ray import RayTask
 
 
-@uniflow.task()
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def package_model(model_path: str, model_class: str):
     """Package a trained model for deployment."""
     packager = CustomTritonPackager()
@@ -399,7 +400,7 @@ def package_model(model_path: str, model_class: str):
 This task can be chained after a training task in a workflow:
 
 ```py
-@uniflow.workflow
+@uniflow.workflow()
 def train_and_package(dataset_id: str):
     model_path = train_model(dataset_id)
     package_path = package_model(model_path, "myproject.models.MyModel")

docs/user-guides/prepare-your-data.md

Lines changed: 24 additions & 15 deletions
@@ -14,16 +14,16 @@ Learn how to prepare data in Uniflow for the ML pipeline on Michelangelo using R
 
 ```py
 import ray.data as rd
-from michelangelo.sdk.workflow.variables import DatasetVariable
+from michelangelo.workflow.variables import DatasetVariable
 
 dataset = rd.read_parquet("s3://bucket/data.parquet") \
     .map_batches(clean_missing_values, batch_size=1000) \
     .map_batches(normalize_features) \
     .map_batches(encode_categories)
 
 train_ds, val_ds = dataset.train_test_split(test_size=0.2)
-train_dv = DatasetVariable(value=train_ds)
-val_dv = DatasetVariable(value=val_ds)
+train_dv = DatasetVariable.create(train_ds)
+val_dv = DatasetVariable.create(val_ds)
 ```
 
 ### Common Preprocessing Functions
@@ -60,9 +60,9 @@ Michelangelo provides `DatasetVariable` to handle datasets across different fram
 
 | Framework | Usage | Load Method |
 | ----- | ----- | ----- |
-| Ray Datasets | `DatasetVariable(value=ray_dataset)` | `load_ray_dataset()` |
-| Pandas DataFrames | `DatasetVariable(value=pandas_df)` | `load_pandas_dataframe()` |
-| Spark DataFrames | `DatasetVariable(value=spark_df)` | `load_spark_dataframe()` |
+| Ray Datasets | `DatasetVariable.create(ray_dataset)` | `load_ray_dataset()` |
+| Pandas DataFrames | `DatasetVariable.create(pandas_df)` | `load_pandas_dataframe()` |
+| Spark DataFrames | `DatasetVariable.create(spark_df)` | `load_spark_dataframe()` |
 
 ### Direct Dataset Usage
 
@@ -73,7 +73,11 @@ Michelangelo provides `DatasetVariable` to handle datasets across different fram
 | Spark DataFrames | `spark.read.parquet(...)` | Large-scale processing |
 
 ```py
-@uniflow.task()
+import michelangelo.uniflow.core as uniflow
+import ray.data as rd
+from michelangelo.uniflow.plugins.ray import RayTask
+
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def process_data_directly(data_path: str):
     dataset = rd.read_parquet(data_path) \
         .map_batches(preprocessing_function) \
@@ -85,29 +89,34 @@ def process_data_directly(data_path: str):
 
 ```py
 import ray.data as rd
-from michelangelo.sdk.workflow.variables import DatasetVariable
+from michelangelo.workflow.variables import DatasetVariable
 
 ray_dataset = rd.read_parquet("s3://bucket/data.parquet")
-dataset_var = DatasetVariable(value=ray_dataset)
+dataset_var = DatasetVariable.create(ray_dataset)
 
 import pandas as pd
 pandas_df = pd.read_csv("local_file.csv")
-dataset_var = DatasetVariable(value=pandas_df)
+dataset_var = DatasetVariable.create(pandas_df)
 
 spark_df = spark.read.parquet("s3://bucket/data.parquet")
-dataset_var = DatasetVariable(value=spark_df)
+dataset_var = DatasetVariable.create(spark_df)
 ```
 
 ## Automatic Storage in Uniflow Tasks
 
 ```py
-@uniflow.task()
+import michelangelo.uniflow.core as uniflow
+import ray.data as rd
+from michelangelo.workflow.variables import DatasetVariable
+from michelangelo.uniflow.plugins.ray import RayTask
+
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def prepare_training_data(data_path: str):
     dataset = rd.read_parquet(data_path).map_batches(clean_and_normalize)
     train_ds, val_ds = dataset.train_test_split(test_size=0.2)
-    train_dv = DatasetVariable(value=train_ds)
+    train_dv = DatasetVariable.create(train_ds)
     train_dv.save_ray_dataset()
-    val_dv = DatasetVariable(value=val_ds)
+    val_dv = DatasetVariable.create(val_ds)
     val_dv.save_ray_dataset()
     return {
         "train": train_dv,
@@ -116,7 +125,7 @@ def prepare_training_data(data_path: str):
 ```
 
 ```py
-@uniflow.task()
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
 def use_prepared_data(datasets: dict):
     datasets["train"].load_ray_dataset()
     datasets["validation"].load_ray_dataset()
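The `DatasetVariable(value=...)` → `DatasetVariable.create(...)` changes above replace keyword construction with a factory classmethod. A rough, purely illustrative sketch of that pattern (this toy class is not the real `DatasetVariable`; the commit does not show its internals):

```python
class DatasetVariable:
    """Toy model of a wrapper whose instances come from a classmethod factory."""

    def __init__(self, value, kind):
        self._value = value
        self._kind = kind

    @classmethod
    def create(cls, value):
        # A factory can inspect the payload and choose the right handling
        # (Ray vs pandas vs Spark in the real docs), which a plain
        # DatasetVariable(value=...) constructor cannot do as cleanly.
        return cls(value, kind=type(value).__name__)


dv = DatasetVariable.create([1, 2, 3])
print(dv._kind)  # list
```

This is a common reason a library exposes `create(...)` instead of its constructor: the factory becomes the single dispatch point for framework-specific behavior.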

docs/user-guides/set-up-triggers.md

Lines changed: 20 additions & 21 deletions
@@ -98,7 +98,6 @@ Here are the commands for managing your trigger runs:
 | **Register pipeline with triggers** | `ma pipeline apply --file=<path_to_pipeline.yaml>` |
 | **Create trigger run** | `ma trigger_run create --namespace=<ns> --pipeline=<name> --trigger-name=<trigger>` |
 | **Check trigger status** | `ma trigger_run get --namespace=<ns> --name=<name>` |
-| **List all triggers** | `ma trigger_run list --namespace=<ns>` |
 | **Delete a trigger** | `ma trigger_run delete --namespace=<ns> --name=<name>` |
 | **Kill a running trigger** | `ma trigger_run kill --namespace=<ns> --name=<name>` |
 
@@ -271,17 +270,17 @@ Here's an example that sends an email when a pipeline run fails, and a Slack mes
 spec:
   notifications:
     # Email alert on pipeline run failure
-    - notification_type: 1 # 1 = Email
-      event_types: [3] # 3 = Pipeline run failed
-      resource_type: 2 # 2 = TriggerRun
+    - notification_type: NOTIFICATION_TYPE_EMAIL
+      event_types: [EVENT_TYPE_PIPELINE_RUN_STATE_FAILED]
+      resource_type: RESOURCE_TYPE_TRIGGER_RUN
       emails:
         - "team-alerts@example.com"
         - "your-email@example.com"
 
     # Slack message on trigger success
-    - notification_type: 2 # 2 = Slack
-      event_types: [7] # 7 = Trigger run succeeded
-      resource_type: 2 # 2 = TriggerRun
+    - notification_type: NOTIFICATION_TYPE_SLACK
+      event_types: [EVENT_TYPE_TRIGGER_RUN_STATE_SUCCEEDED]
+      resource_type: RESOURCE_TYPE_TRIGGER_RUN
       slack_destinations:
         - "#ml-pipeline-alerts"
 ```
@@ -294,25 +293,25 @@ You can notify on any combination of these events:
 
 | Event | ID | Description |
 | :---- | :---- | :---- |
-| Pipeline run succeeded | `1` | A pipeline run completed successfully |
-| Pipeline run killed | `2` | A pipeline run was manually terminated |
-| Pipeline run failed | `3` | A pipeline run encountered an error |
-| Pipeline run skipped | `4` | A pipeline run was skipped |
-| Trigger run killed | `5` | The trigger itself was terminated |
-| Trigger run failed | `6` | The trigger encountered an error |
-| Trigger run succeeded | `7` | The trigger completed all scheduled runs |
-| Pipeline state ready | `8` | The pipeline is in a ready state |
-| Pipeline state error | `9` | The pipeline has entered an error state |
+| Pipeline run succeeded | `EVENT_TYPE_PIPELINE_RUN_STATE_SUCCEEDED` | A pipeline run completed successfully |
+| Pipeline run killed | `EVENT_TYPE_PIPELINE_RUN_STATE_KILLED` | A pipeline run was manually terminated |
+| Pipeline run failed | `EVENT_TYPE_PIPELINE_RUN_STATE_FAILED` | A pipeline run encountered an error |
+| Pipeline run skipped | `EVENT_TYPE_PIPELINE_RUN_STATE_SKIPPED` | A pipeline run was skipped |
+| Trigger run killed | `EVENT_TYPE_TRIGGER_RUN_STATE_KILLED` | The trigger itself was terminated |
+| Trigger run failed | `EVENT_TYPE_TRIGGER_RUN_STATE_FAILED` | The trigger encountered an error |
+| Trigger run succeeded | `EVENT_TYPE_TRIGGER_RUN_STATE_SUCCEEDED` | The trigger completed all scheduled runs |
+| Pipeline state ready | `EVENT_TYPE_PIPELINE_STATE_READY` | The pipeline is in a ready state |
+| Pipeline state error | `EVENT_TYPE_PIPELINE_STATE_ERROR` | The pipeline has entered an error state |
 
 #### Notification and Resource Types
 
 | Field | Value | Meaning |
 | :---- | :---- | :---- |
-| `notification_type` | `1` | Email |
-| `notification_type` | `2` | Slack |
-| `resource_type` | `1` | PipelineRun |
-| `resource_type` | `2` | TriggerRun |
-| `resource_type` | `3` | Pipeline |
+| `notification_type` | `NOTIFICATION_TYPE_EMAIL` | Email |
+| `notification_type` | `NOTIFICATION_TYPE_SLACK` | Slack |
+| `resource_type` | `RESOURCE_TYPE_PIPELINE_RUN` | PipelineRun |
+| `resource_type` | `RESOURCE_TYPE_TRIGGER_RUN` | TriggerRun |
+| `resource_type` | `RESOURCE_TYPE_PIPELINE` | Pipeline |
 
 > **Tip:** A common setup is to notify on failures via email (for immediate attention) and on successes via Slack (for team visibility). You can list multiple `event_types` in a single notification entry to consolidate alerts.

docs/user-guides/train-and-register-a-model.md

Lines changed: 5 additions & 5 deletions
@@ -43,9 +43,10 @@ For basic (scikit-learn, lightweight PyTorch) training, load your dataset direct
 
 ```py
 import michelangelo.uniflow.core as uniflow
-from michelangelo.sdk.workflow.variables import DatasetVariable
+from michelangelo.workflow.variables import DatasetVariable
+from michelangelo.uniflow.plugins.ray import RayTask
 
-@uniflow.task()
+@uniflow.task(config=RayTask(head_cpu=2, head_memory="8Gi"))
 def train_model(train_dv: DatasetVariable, val_dv: DatasetVariable):
     """Simple training with scikit-learn"""
 
@@ -73,11 +74,10 @@ To scale training across CPUs/GPUs, wrap your training task using **RayTask**.
 ## Example: Distributed Deep Learning with Ray Workers
 
 ```py
-from michelangelo.sdk.trainer.torch.pytorch_lightning.lightning_trainer import (
-    LightningTrainer, LightningTrainerParam
+from michelangelo.lib.trainer.torch.pytorch_lightning.lightning_trainer import (
+    LightningTrainer, LightningTrainerParam, create_run_config, create_scaling_config
 )
 from michelangelo.uniflow.plugins.ray import RayTask
-from michelangelo.maf.ray.train import create_run_config, create_scaling_config
 from ray.train import CheckpointConfig
 
 @uniflow.task(
