From a2e14447bf6b4a32cf93e8183be4c59430ef16b6 Mon Sep 17 00:00:00 2001
From: aoshen02
Date: Fri, 26 Jun 2026 07:25:03 +0000
Subject: [PATCH 1/2] docs(examples): list coding_agent_rl in examples/README

The examples/coding_agent_rl directory (end-to-end SWE coding-agent RL,
added in #2124/#2125) exists but is not listed in examples/README.md.
Add it to the Directory Structure list for discoverability.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 examples/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/examples/README.md b/examples/README.md
index 128b1562d4..337df8bc62 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -4,6 +4,7 @@ These examples provide concrete examples to leverage slime in your own RL workfl
 
 ## Directory Structure
 
+- **[coding_agent_rl](./coding_agent_rl)**: End-to-end SWE coding-agent RL — a real coding agent (claude-code / codex) edits code in a per-sample sandbox, and the resulting `git diff` is graded against the dataset's test harness.
 - **[eval_multi_task](./eval_multi_task)**: Example for supporting evaluation multiple tasks with different configs.
 - **[fully_async](./fully_async)**: Demonstrates fully asynchronous rollout generation for higher efficiency.
 - **[geo3k_vlm](./geo3k_vlm)**: Training VLMs on a single-turn reasoning task using GRPO on the GEO3K dataset.

From cb00cb30d088f0348e61f89ff9c85fdcd7cf85b5 Mon Sep 17 00:00:00 2001
From: aoshen02
Date: Fri, 26 Jun 2026 07:57:52 +0000
Subject: [PATCH 2/2] docs(examples): fix dangling examples/swe_codex link in fully_async README

fully_async/README.md points to examples/swe_codex/, which does not
exist (the coding-agent example directory is examples/coding_agent_rl/).
Fix the reference so the link resolves.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 examples/fully_async/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/fully_async/README.md b/examples/fully_async/README.md
index e65d6db335..e69d9a80aa 100644
--- a/examples/fully_async/README.md
+++ b/examples/fully_async/README.md
@@ -57,7 +57,7 @@ work unchanged under fully-async:
 --custom-rm-path your.module.reward # (args, sample | list[Sample]) -> float | list[float]
 ```
 
-See `examples/swe_codex/` for a non-trivial example that plugs in a
+See `examples/coding_agent_rl/` for a non-trivial example that plugs in a
 multi-turn agent (Claude Code in a Docker-Proxy sandbox) this way.
 
 ## Worker Internals (Very Short)