
Commit 40e7e82

zhoward-1 and claude authored
docs: add missing prerequisites, next steps, and troubleshooting to pipeline guides (#1086)
## Summary

Fills in missing structural sections across four pipeline guides that were flagged as incomplete.

- **`running-uniflow.md`** — Added Prerequisites (sandbox, Poetry, Docker, workflow definition) and Next Steps (file sync, caching, triggers, model registry)
- **`cache-and-pipelinerun-resume-form.md`** — Added Prerequisites (remote execution required, Ray/Spark only) and Next Steps (triggers, file sync, MA Studio monitoring)
- **`file-sync-testing-flow-runbook.md`** — Added Prerequisites (sandbox, existing Docker image, Git repo, cloud storage creds); expanded Troubleshooting from a single raw log snippet into three structured entries (missing credentials, unexpected files, changes not picked up); added Next Steps
- **`train-and-register-a-model.md`** — Added Prerequisites (sandbox, prepared dataset, SDK install, Docker for distributed runs)

## Test plan

- [ ] `cd website && bun run build` passes (verified locally)
- [ ] Prerequisites link to correct pages
- [ ] Next Steps links resolve correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 83d66c1 · commit 40e7e82

4 files changed

Lines changed: 66 additions & 6 deletions


docs/user-guides/ml-pipelines/cache-and-pipelinerun-resume-form.md

Lines changed: 11 additions & 0 deletions
````diff
@@ -6,6 +6,11 @@
 * How cache keys are determined
 * How to resume a pipeline run from a specific step
 
+## Prerequisites
+
+- **A working remote execution setup** — Caching and resume only apply to remote runs. See [Running Uniflow Pipelines](./running-uniflow.md) to get remote execution working first.
+- **Ray or Spark tasks** — Only Ray and Spark tasks support caching. Local execution does not cache results.
+
 ## Task caching
 
 For each task in a Uniflow Remote Run, we cache and index the task results after execution. Next time you execute the task, you have the option to skip execution by reusing the cached results.
@@ -55,3 +60,9 @@ ma pipeline run -n <namespace> --revision <pipeline-revision-name> --resume_from
 ```
 
 **Important:** To skip a step during resume, Uniflow requires that the input of the step has not changed.
+
+## Next Steps
+
+- **Run pipelines on a schedule** — See [Set Up Triggers](../set-up-triggers.md) to automate pipeline execution with cron triggers
+- **Test changes without rebuilding** — Use [file sync](./file-sync-testing-flow-runbook.md) to iterate faster during development
+- **Monitor pipeline runs** — Open MA Studio at `http://localhost:8090/<your-project>` to view run history, step status, and cached results
````
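
For reference, a minimal sketch of how the resume flag from the hunk header above might be invoked. The namespace, revision name, and step name are hypothetical placeholders, and the exact argument form of `--resume_from` is truncated in this diff, so treat it as an assumption.

```bash
# Hypothetical resume invocation based on the truncated command in the hunk
# header above. All values (namespace, revision, step name) are illustrative
# placeholders, and the --resume_from argument form is assumed.
ma pipeline run -n my-namespace \
  --revision my-pipeline-revision-1 \
  --resume_from train_step
# Per the note in the doc: a step is only skipped on resume if its inputs
# have not changed.
```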

docs/user-guides/ml-pipelines/file-sync-testing-flow-runbook.md

Lines changed: 31 additions & 3 deletions
````diff
@@ -6,6 +6,13 @@
 * When to use (and not use) file sync
 * What files get synced and the typical development flow
 
+## Prerequisites
+
+- **A running sandbox** — File sync pushes code to a remote cluster. Follow the [Sandbox Setup](../../getting-started/sandbox-setup.md) guide first.
+- **A Docker image already built for your workflow** — File sync patches an existing image; it does not build one. See [Running Uniflow Pipelines](./running-uniflow.md) for image build steps.
+- **A Git repository** — File sync uses Git metadata to determine which files changed since the image was built.
+- **Cloud storage credentials** — S3/MinIO access is required to upload the sync tarball (see Troubleshooting below).
+
 ## What is file sync?
 
 File Sync lets you test your local code changes on remote infrastructure **without rebuilding Docker images**. Instead of waiting 20+ minutes for image builds per task, you can sync your changes in 2-5 minutes.
@@ -89,11 +96,32 @@ ma pipeline dev-run --file-sync --file <path_to_pipeline.yaml>
 4. **Commit and rebuild image** only when ready for production
 
 ## Troubleshooting
-1. No fsspec credentials, once kicking off remote run, it failed with below error:
-```2026-03-23 09:14:44,722 | ERROR | michelangelo.uniflow.core.file_sync | Failed to upload tarball: Unable to locate credentials```
-setup credentials before starting remote run workflow
+
+### Missing cloud storage credentials
+
+If the sync fails with:
 ```
+Failed to upload tarball: Unable to locate credentials
+```
+
+Set credentials before running:
+
+```bash
 export AWS_ACCESS_KEY_ID=minioadmin
 export AWS_SECRET_ACCESS_KEY=minioadmin
 export AWS_ENDPOINT_URL=http://localhost:9091
 ```
+
+### Unexpected files being synced
+
+If more files are synced than expected, your Docker image may not have Git metadata. In that case, file sync sends all uncommitted changes rather than just the diff since the image was built. Ensure the image is built from your current branch with Git history included.
+
+### File sync not picking up changes
+
+File sync only includes files tracked by Git. If you added new files, make sure they are staged (`git add`) so Git is aware of them.
+
+## Next Steps
+
+- **Cache results between runs** — See [Uniflow caching and pipeline run resume](./cache-and-pipelinerun-resume-form.md) to skip unchanged steps and resume failed runs
+- **Run on a schedule** — See [Set Up Triggers](../set-up-triggers.md) to automate pipeline execution with cron triggers
+- **Build and register your model** — Once your code is validated, follow the [Model Registry Guide](../model-registry-guide.md) to package and version your model
````
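
The new Troubleshooting entries pair naturally with the dev-run command in the hunk header. Below is a sketch of the full iteration loop they imply, assuming the sandbox MinIO defaults shown in the diff; the file paths are illustrative placeholders.

```bash
# File-sync iteration loop assembled from commands shown in this diff.
# The MinIO credentials are the sandbox defaults from the Troubleshooting
# section; the file paths are illustrative placeholders.
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_ENDPOINT_URL=http://localhost:9091

# Stage new files so file sync (which follows Git) picks them up
git add src/my_new_module.py

# Sync local changes onto the existing image and run remotely
ma pipeline dev-run --file-sync --file pipelines/my_pipeline.yaml
```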

docs/user-guides/ml-pipelines/running-uniflow.md

Lines changed: 14 additions & 0 deletions
````diff
@@ -8,6 +8,13 @@ This guide covers how to run Uniflow pipelines locally and remotely.
 * The differences between local and remote execution modes
 * How to debug workflows and container issues
 
+## Prerequisites
+
+- **A running sandbox environment** — Remote execution requires a local Kubernetes cluster. Follow the [Sandbox Setup](../../getting-started/sandbox-setup.md) guide if you haven't done this yet.
+- **Python 3.11+ and Poetry installed** — See the [Sandbox Setup prerequisites](../../getting-started/sandbox-setup.md#prerequisites).
+- **A Uniflow workflow defined** — See [Getting Started with ML Pipelines](./getting-started.md) for a walkthrough of defining tasks and workflows.
+- **Docker** — Required for building images used in remote execution.
+
 ## Environment setup
 
 Create Python virtual environment and install packages:
@@ -161,3 +168,10 @@ docker pull ghcr.io/michelangelo-ai/worker:latest
 docker images
 docker exec -it k3d-michelangelo-sandbox-server-0 crictl images
 ```
+
+## Next Steps
+
+- **Speed up iteration** — Use [file sync](./file-sync-testing-flow-runbook.md) to test local code changes on remote infrastructure without rebuilding Docker images
+- **Cache task results** — Learn how [Uniflow caching and pipeline run resume](./cache-and-pipelinerun-resume-form.md) can speed up repeated runs
+- **Run on a schedule** — See [Set Up Triggers](../set-up-triggers.md) to run your pipeline automatically on a cron schedule
+- **Register your model** — After a successful training run, follow the [Model Registry Guide](../model-registry-guide.md) to package and version your model
````
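
The debugging hunk above checks worker images both on the host and inside the k3d node. The same commands chain into a quick pre-run sanity check; only the grep filters are an illustrative addition.

```bash
# Pre-run image sanity check, using the commands from the hunk above.
# Only the grep filters are added for readability.
docker pull ghcr.io/michelangelo-ai/worker:latest
docker images | grep worker
# Confirm the image is also visible to the k3d sandbox node
docker exec -it k3d-michelangelo-sandbox-server-0 crictl images | grep worker
```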

docs/user-guides/train-and-register-a-model.md

Lines changed: 10 additions & 3 deletions
````diff
@@ -6,11 +6,18 @@ The focus is simplicity: **you control your training logic**, Michelangelo provi
 
 ## What You'll Learn
 
-* How datasets are passed to training tasks
-* How to load Ray, Pandas, or Spark datasets
-* How to scale training with Ray workers
+* How datasets are passed to training tasks
+* How to load Ray, Pandas, or Spark datasets
+* How to scale training with Ray workers
 * How to use the Lightning Trainer SDK for deep learning
 
+## Prerequisites
+
+- **A running sandbox** — Remote training runs require a local Kubernetes cluster. Follow the [Sandbox Setup](../getting-started/sandbox-setup.md) guide if you haven't done this yet.
+- **A prepared dataset** — Training tasks expect datasets passed as `DatasetVariable`. See [Data Preparation](./prepare-your-data.md) for how to produce them.
+- **Python 3.11+, Poetry, and the Michelangelo SDK installed** — Run `cd python && poetry install` from the repo root.
+- **For distributed training:** A Docker image with your workflow code. See [Running Uniflow Pipelines](./ml-pipelines/running-uniflow.md) for image build steps.
+
 ## Understanding Training Inputs
 
 Michelangelo workflows pass datasets using **DatasetVariable**.
````
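
The SDK-install prerequisite added here compresses several steps into one bullet. A sketch of the sequence it implies, with version checks added as illustrative assumptions:

```bash
# Environment setup implied by the new Prerequisites bullet. The version
# checks are illustrative additions; the install command is from the diff.
python --version    # expect 3.11 or newer
poetry --version
cd python && poetry install
```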
