diff --git a/examples/README.md b/examples/README.md
index 128b1562d4..9ed2afa438 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -8,7 +8,7 @@ These examples provide concrete examples to leverage slime in your own RL workfl
 - **[fully_async](./fully_async)**: Demonstrates fully asynchronous rollout generation for higher efficiency.
 - **[geo3k_vlm](./geo3k_vlm)**: Training VLMs on a single-turn reasoning task using GRPO on the GEO3K dataset.
 - **[geo3k_vlm_multi_turn](./geo3k_vlm_multi_turn)**: VLM multi-turn training on Geo3k dataset.
-- **[low_precision](./low_precision)**: Examples of FP8 training and inference for improved throughput and stability.
+- **[low_precision](../scripts/low_precision)**: Examples of FP8 training and inference for improved throughput and stability.
 - **[multi_agent](./multi_agent)**: Example of running multi-agent RL with `slime`.
 - **[on_policy_distillation](./on_policy_distillation)**: Example implementation for on-policy distillation, extending the reinforcement learning pipeline to support teacher–student distillation directly within on-policy training.
 - **[delta_weight_sync](./delta_weight_sync)**: Non-colocated weight sync that ships only changed positions + values over disk (training/inference disaggregation) or NCCL.