examples/README.md (4 changes: 2 additions & 2 deletions)
@@ -8,11 +8,11 @@ These examples provide concrete examples to leverage slime in your own RL workfl
 - **[fully_async](./fully_async)**: Demonstrates fully asynchronous rollout generation for higher efficiency.
 - **[geo3k_vlm](./geo3k_vlm)**: Training VLMs on a single-turn reasoning task using GRPO on the GEO3K dataset.
 - **[geo3k_vlm_multi_turn](./geo3k_vlm_multi_turn)**: VLM multi-turn training on Geo3k dataset.
-- **[low_precision](./low_precision)**: Examples of FP8 training and inference for improved throughput and stability.
+- **[low_precision](../docs/en/advanced/low-precision.md)**: Examples of FP8 training and inference for improved throughput and stability.
 - **[multi_agent](./multi_agent)**: Example of running multi-agent RL with `slime`.
 - **[on_policy_distillation](./on_policy_distillation)**: Example implementation for on-policy distillation, extending the reinforcement learning pipeline to support teacher–student distillation directly within on-policy training.
 - **[delta_weight_sync](./delta_weight_sync)**: Non-colocated weight sync that ships only changed positions + values over disk (training/inference disaggregation) or NCCL.
-- **[reproducibility](./reproducibility)**: Guides on achieving bitwise experiment reproduction using deterministic modes.
+- **[reproducibility](../docs/en/advanced/reproducibility.md)**: Guides on achieving bitwise experiment reproduction using deterministic modes.
 - **[retool](./retool)**: Demonstrates the retool functionality for tool-enabled language model generation.
 - **[search-r1](./search-r1)**: A minimal reproduction of Search-R1, featuring multi-turn conversation and tool-calling.
 - **[strands_sglang](./strands_sglang)**: Integration example with the Strands-Agents scaffolding framework.
examples/delta_weight_sync/README.md (15 changes: 5 additions & 10 deletions)
@@ -48,20 +48,15 @@ See [docs/en/advanced/delta-weight-sync.md](../../docs/en/advanced/delta-weight-

 ## Results

-W&B traces comparing delta sync against the full-sync baseline on GLM-4.7-355B-A32B / DAPO-Math-17k.
+W&B traces comparing delta sync against the full-sync baseline on GLM-4.7-355B-A32B / DAPO-Math-17k track:

-![Raw reward](./raw_reward.png)
-
-![Train/rollout logprob abs diff](./train_rollout_logprob_abs_diff.png)
-
-![Update weights time](./update_weights_time.png)
+- `raw_reward` — training reward curve vs full-sync baseline
+- `train/train_rollout_logprob_abs_diff` — token-level logprob mismatch between train and rollout
+- `perf/update_weights_time` — wall time per weight sync
+- `perf/update_weights_density` — fraction of weight positions that moved between consecutive syncs (sync 0 omitted: snapshot-seeding pass with density = 1.0)

 > **Note on the small curve-to-curve gap.** RL training is inherently non-deterministic (cuBLAS reductions, FlashAttention split-K, NCCL all-reduce ordering, dynamic-batch token assignment). Two identically-configured *full*-sync runs would diverge the same way. Delta sync's selective overwrite is bit-exact with full sync per step (no arithmetic, no drift); the trajectory matches, the bits don't.

-![Update weights density](./update_weights_density.png)
-
-*Per-sync change density (`perf/update_weights_density`) — fraction of weight positions that moved between consecutive syncs. Sync 0 is omitted: it's the snapshot-seeding pass with density = 1.0, which would compress the y-axis.*
-
 ## Why these encoding defaults

 Per-sync change density during RL fine-tuning at conservative LRs sits around **2-3%** ([arXiv:2602.03839](https://arxiv.org/pdf/2602.03839) reports ~1% on a related setup; we measured ~2-3% on this run). Below the 3.125% break-even point, gap-encoded positions are smaller than absolute indices — the disk default `deltas_zstd` adds zstd L1 on top to squeeze the gap byte stream further (~35-40%), which is the right tradeoff when shared-FS bandwidth is ≤ 300 MB/s. Intra-datacenter NCCL has no bandwidth pressure, so `indices` (lowest compute, biggest payload) is the cleaner default there.
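The size arithmetic behind these defaults can be made concrete. The README does not say which baseline the 3.125% figure is measured against. 3.125% is exactly 1/32, which is the density at which 32-bit absolute indices cost as much as a one-bit-per-position bitmap. The sketch below assumes that comparison. It also assumes gaps are stored as unsigned LEB128 varints, and it leaves out zstd because zstd is not in the Python standard library. All names are hypothetical, and the change pattern is synthetic.

```python
from typing import List, Sequence

INDEX_BYTES = 4  # assumption: absolute positions stored as int32


def density(positions: Sequence[int], num_total: int) -> float:
    """Fraction of weight positions that changed between two syncs."""
    return len(positions) / num_total


def to_gaps(positions: Sequence[int]) -> List[int]:
    """Gap-encode sorted positions: the first position as-is, then successive differences."""
    gaps, prev = [], 0
    for p in positions:
        gaps.append(p - prev)
        prev = p
    return gaps


def from_gaps(gaps: Sequence[int]) -> List[int]:
    """Invert to_gaps with a running sum."""
    positions, acc = [], 0
    for g in gaps:
        acc += g
        positions.append(acc)
    return positions


def varint_size(n: int) -> int:
    """Bytes an unsigned LEB128 varint needs: 7 payload bits per byte."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def absolute_index_bytes(positions: Sequence[int]) -> int:
    return INDEX_BYTES * len(positions)


def gap_varint_bytes(positions: Sequence[int]) -> int:
    return sum(varint_size(g) for g in to_gaps(positions))


def bitmap_bytes(num_total: int) -> int:
    """One bit per weight position, rounded up to whole bytes."""
    return (num_total + 7) // 8


def bitmap_break_even_density(index_bytes: int = INDEX_BYTES) -> float:
    # k indices * (8 * index_bytes) bits == N bits  =>  k / N == 1 / (8 * index_bytes)
    return 1 / (8 * index_bytes)


if __name__ == "__main__":
    num_total = 1_000_000
    positions = list(range(0, num_total, 40))  # synthetic: every 40th position changed
    print(f"density            {density(positions, num_total):.1%}")  # 2.5%
    print(f"break-even         {bitmap_break_even_density():.3%}")    # 3.125%
    print(f"bitmap bytes       {bitmap_bytes(num_total)}")            # 125000
    print(f"int32 index bytes  {absolute_index_bytes(positions)}")    # 100000
    print(f"gap varint bytes   {gap_varint_bytes(positions)}")        # 25000
```

At this synthetic 2.5% density, int32 indices already undercut the bitmap, and every gap fits in a single varint byte. Real change patterns are far less regular, so measured payload sizes will differ.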
slime_plugins/rollout_buffer/README.md (2 changes: 1 addition & 1 deletion)
@@ -40,7 +40,7 @@ In addition, Rollout Buffer also provides some customizable functions to meet sp

 ### Example Script

-First, you need to follow [Example: Qwen3-4B Model](../../docs/en/models/qwen3-4B.md) to configure the environment, download data and convert model checkpoints. And then run the following scripts:
+First, you need to follow [Example: Qwen3-4B Model](../../docs/en/examples/qwen3-4B.md) to configure the environment, download data and convert model checkpoints. And then run the following scripts:
 ```bash
 cd slime_plugins/rollout_buffer
 bash rollout_buffer_example.sh
slime_plugins/rollout_buffer/README_zh.md (2 changes: 1 addition & 1 deletion)
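@@ -40,7 +40,7 @@ generator/

 ### Example Script

-Following the [Example: Qwen3-4B Model](../../docs/zh/models/qwen3-4B.md) documentation, set up the slime runtime environment, download the data, and convert the model ckpt. Then run each of the following:
+Following the [Example: Qwen3-4B Model](../../docs/zh/examples/qwen3-4B.md) documentation, set up the slime runtime environment, download the data, and convert the model ckpt. Then run each of the following:

 ```bash
 cd slime_plugins/rollout_buffer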