THUDM · CalvinXKY · Jun 27, 2026
diff --git a/examples/README.md b/examples/README.md
@@ -8,11 +8,11 @@ These examples provide concrete examples to leverage slime in your own RL workfl
 - **[fully_async](./fully_async)**: Demonstrates fully asynchronous rollout generation for higher efficiency.
 - **[geo3k_vlm](./geo3k_vlm)**: Training VLMs on a single-turn reasoning task using GRPO on the GEO3K dataset.
 - **[geo3k_vlm_multi_turn](./geo3k_vlm_multi_turn)**: VLM multi-turn training on Geo3k dataset.
-- **[low_precision](./low_precision)**: Examples of FP8 training and inference for improved throughput and stability.
+- **[low_precision](../docs/en/advanced/low-precision.md)**: Examples of FP8 training and inference for improved throughput and stability.
 - **[multi_agent](./multi_agent)**: Example of running multi-agent RL with `slime`.
 - **[on_policy_distillation](./on_policy_distillation)**: Example implementation for on-policy distillation, extending the reinforcement learning pipeline to support teacher–student distillation directly within on-policy training.
 - **[delta_weight_sync](./delta_weight_sync)**: Non-colocated weight sync that ships only changed positions + values over disk (training/inference disaggregation) or NCCL.
-- **[reproducibility](./reproducibility)**: Guides on achieving bitwise experiment reproduction using deterministic modes.
+- **[reproducibility](../docs/en/advanced/reproducibility.md)**: Guides on achieving bitwise experiment reproduction using deterministic modes.
 - **[retool](./retool)**: Demonstrates the retool functionality for tool-enabled language model generation.
 - **[search-r1](./search-r1)**: A minimal reproduction of Search-R1, featuring multi-turn conversation and tool-calling.
 - **[strands_sglang](./strands_sglang)**: Integration example with the Strands-Agents scaffolding framework.

diff --git a/examples/delta_weight_sync/README.md b/examples/delta_weight_sync/README.md
@@ -48,20 +48,15 @@ See [docs/en/advanced/delta-weight-sync.md](../../docs/en/advanced/delta-weight-
 
 ## Results
 
-W&B traces comparing delta sync against the full-sync baseline on GLM-4.7-355B-A32B / DAPO-Math-17k.
+The W&B traces comparing delta sync against the full-sync baseline on GLM-4.7-355B-A32B / DAPO-Math-17k track the following metrics:
 
-![Raw reward](./raw_reward.png)
-
-![Train/rollout logprob abs diff](./train_rollout_logprob_abs_diff.png)
-
-![Update weights time](./update_weights_time.png)
+- `raw_reward` — training reward curve vs full-sync baseline
+- `train/train_rollout_logprob_abs_diff` — token-level logprob mismatch between train and rollout
+- `perf/update_weights_time` — wall time per weight sync
+- `perf/update_weights_density` — fraction of weight positions that moved between consecutive syncs (sync 0 omitted: snapshot-seeding pass with density = 1.0)
 
 > **Note on the small curve-to-curve gap.** RL training is inherently non-deterministic (cuBLAS reductions, FlashAttention split-K, NCCL all-reduce ordering, dynamic-batch token assignment). Two identically-configured *full*-sync runs would diverge the same way. Delta sync's selective overwrite is bit-exact with full sync per step (no arithmetic, no drift); the trajectory matches, the bits don't.
 
-![Update weights density](./update_weights_density.png)
-
-*Per-sync change density (`perf/update_weights_density`) — fraction of weight positions that moved between consecutive syncs. Sync 0 is omitted: it's the snapshot-seeding pass with density = 1.0, which would compress the y-axis.*
-
 ## Why these encoding defaults
 
 Per-sync change density during RL fine-tuning at conservative LRs sits around **2-3%** ([arXiv:2602.03839](https://arxiv.org/pdf/2602.03839) reports ~1% on a related setup; we measured ~2-3% on this run). Below the 3.125% break-even point, gap-encoded positions are smaller than absolute indices — the disk default `deltas_zstd` adds zstd L1 on top to squeeze the gap byte stream further (~35-40%), which is the right tradeoff when shared-FS bandwidth is ≤ 300 MB/s. Intra-datacenter NCCL has no bandwidth pressure, so `indices` (lowest compute, biggest payload) is the cleaner default there.
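
For reference alongside this hunk, the change-density metric ("fraction of weight positions that moved between consecutive syncs") and the idea of gap-encoding positions can be sketched as below. This is a minimal illustration, not slime's implementation: the function names are hypothetical, weights are modeled as flat Python lists, and the byte layout and zstd step of `deltas_zstd` are not modeled.

```python
from itertools import accumulate


def changed_positions(old, new):
    """Return the sorted indices where two flattened weight snapshots differ."""
    if len(old) != len(new):
        raise ValueError("snapshots must have the same length")
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


def change_density(old, new):
    """Fraction of positions that changed between two consecutive syncs."""
    if not old:
        return 0.0
    return len(changed_positions(old, new)) / len(old)


def gap_encode(positions):
    """Turn sorted absolute indices into the first index plus successive gaps."""
    return [p - q for p, q in zip(positions, [0] + positions[:-1])]


def gap_decode(gaps):
    """Invert gap_encode with a running sum."""
    return list(accumulate(gaps))


# Hypothetical toy snapshot: 128 weights, 2 of which move between syncs.
old = [0.0] * 128
new = list(old)
new[5] = 0.25
new[40] = -1.5

positions = changed_positions(old, new)
gaps = gap_encode(positions)
density = change_density(old, new)
```

At low density the gaps are small integers, which is what makes a compact gap stream attractive compared with full-width absolute indices; the exact break-even depends on the integer widths used, which this sketch does not assume.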
diff --git a/slime_plugins/rollout_buffer/README.md b/slime_plugins/rollout_buffer/README.md
@@ -40,7 +40,7 @@ In addition, Rollout Buffer also provides some customizable functions to meet sp
 
 ### Example Script
 
-First, you need to follow [Example: Qwen3-4B Model](../../docs/en/models/qwen3-4B.md) to configure the environment, download data and convert model checkpoints. And then run the following scripts:
+First, you need to follow [Example: Qwen3-4B Model](../../docs/en/examples/qwen3-4B.md) to configure the environment, download data and convert model checkpoints. And then run the following scripts:
 ```bash
 cd slime_plugins/rollout_buffer
 bash rollout_buffer_example.sh

diff --git a/slime_plugins/rollout_buffer/README_zh.md b/slime_plugins/rollout_buffer/README_zh.md
@@ -40,7 +40,7 @@ generator/
 
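 ### Example Script
 
-Follow the [Example: Qwen3-4B Model](../../docs/zh/models/qwen3-4B.md) document to set up the slime runtime environment, download the data, and convert the model checkpoint. Then run each of the following:
+Follow the [Example: Qwen3-4B Model](../../docs/zh/examples/qwen3-4B.md) document to set up the slime runtime environment, download the data, and convert the model checkpoint. Then run each of the following: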
 
 ```bash
 cd slime_plugins/rollout_buffer