4 changes: 4 additions & 0 deletions README_zh.md
@@ -115,6 +115,10 @@ slime is developed as RL infrastructure, because "the script runs" is far

These projects are more than demos. They are independent systems that use slime as a reusable RL substrate, spanning production-grade post-training, agentic RL, domain RL, and rollout-system research.

### 🐎 Dressage: Scalable RL for Any Agent and Sandbox

[**Dressage**](https://github.com/Accio-Lab/Dressage) is an agentic RL training framework built on slime by [Alibaba Accio](https://www.accio.com/work?im_ref=1O8wgT3poxyZWCj31F1ZJ0fNUkuTK6x9ZTHw0Y0&sharedid=&im_pid=5619512&im_pname=AI%20INTRO%20COPORATE). It provides unified RL for blackbox agents (e.g. [OpenCode](https://github.com/anomalyco/opencode), [OpenClaw](https://github.com/openclaw/openclaw)) and arbitrary sandbox environments (e.g. [bwrap](https://github.com/containers/bubblewrap), [E2B](https://github.com/e2b-dev/e2b), Kubernetes). Its Paddock, Sandbox, and Proxy layers decouple interaction semantics, execution location, and token-level trajectory capture, so it adapts to agent workflows without rewriting the agent's internal loop. Dressage records token-wise logprobs, loss masks, weight versions, and MoE routing, and uses TITO and segment-aware training to convert long-horizon tool interactions into stable RL samples.

### ⛵ Miles: An Enterprise-Grade Reinforcement Learning Framework for Large-Scale Model Training

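[Miles](https://github.com/radixark/miles) is a large-model RL post-training framework built on slime by [RadixArk](https://github.com/radixark). It stays closely in sync with upstream slime development while adding a set of enterprise-oriented extensions: deeper [SGLang](https://github.com/sgl-project/sglang) integration, accompanying operations and deployment tooling and services, and optimizations for [new models](https://www.radixark.com/miles/docs/models) and [new hardware](https://www.radixark.com/miles/docs/platforms). Miles also continues to iterate and evolve around real production needs, for example by adding support for LoRA, TITO, and low-precision training.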