From f3e6b1317ac165102f9584224191d249f9d61c6e Mon Sep 17 00:00:00 2001
From: AnandK27
Date: Wed, 1 Apr 2026 00:25:41 -0700
Subject: [PATCH 1/6] Add prime-rl training cookbook for SimLab trajectories

Cookbook that bridges SimLab's task execution with Prime Intellect's
prime-rl for RL training of agent models. The full pipeline:

- Collect tool-use trajectories from SimLab environments
- Convert to SFT datasets (HuggingFace messages format)
- Build and push a verifiers environment to Prime Intellect hub
- Run hosted RL training via `prime rl run`
- Evaluate trained models back through SimLab

Includes example customer support tasks, quality+completeness rubrics,
and configs for both SFT warmup and RL training on Qwen3.5-9B.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 cookbook/README.md                            |   1 +
 cookbook/prime-rl-training/.gitignore         |   9 +
 cookbook/prime-rl-training/SKILL.md           | 161 +++++++++
 cookbook/prime-rl-training/configs/rl.toml    |  39 ++
 cookbook/prime-rl-training/configs/sft.toml   |  35 ++
 ...pancy_for_enterprise_renewal_35ba835d.json |  37 ++
 ..._rate_limiting_issue_and_pre_de0cff0d.json |  54 +++
 ...il_group_and_coordinate_acco_7a7bbde0.json |  54 +++
 .../prime-envs/simlab_tasks/pyproject.toml    |  21 ++
 .../prime-envs/simlab_tasks/simlab_tasks.py   | 176 +++++++++
 .../prime-rl-training/prime-rl-training.md    | 237 ++++++++++++
 cookbook/prime-rl-training/pyproject.toml     |  34 ++
 cookbook/prime-rl-training/run_pipeline.sh    | 125 +++++++
 .../src/prime_rl_training/__init__.py         |   1 +
 .../src/prime_rl_training/collect.py          | 152 ++++++++
 .../src/prime_rl_training/simlab_env.py       | 245 +++++++++++++
 .../prime_rl_training/trajectory_converter.py | 340 ++++++++++++++++++
 17 files changed, 1721 insertions(+)
 create mode 100644 cookbook/prime-rl-training/.gitignore
 create mode 100644 cookbook/prime-rl-training/SKILL.md
 create mode 100644 cookbook/prime-rl-training/configs/rl.toml
 create mode 100644 cookbook/prime-rl-training/configs/sft.toml
 create mode 100644 cookbook/prime-rl-training/examples/task-bundle/3_triage_and_escalate_critical_billing_discrepancy_for_enterprise_renewal_35ba835d.json
 create mode 100644 cookbook/prime-rl-training/examples/task-bundle/4_enterprise_client_escalation_resolve_david_parks_api_rate_limiting_issue_and_pre_de0cff0d.json
 create mode 100644 cookbook/prime-rl-training/examples/task-bundle/6_resolve_sla_critical_billing_dispute_for_wilson_retail_group_and_coordinate_acco_7a7bbde0.json
 create mode 100644 cookbook/prime-rl-training/prime-envs/simlab_tasks/pyproject.toml
 create mode 100644 cookbook/prime-rl-training/prime-envs/simlab_tasks/simlab_tasks.py
 create mode 100644 cookbook/prime-rl-training/prime-rl-training.md
 create mode 100644 cookbook/prime-rl-training/pyproject.toml
 create mode 100755 cookbook/prime-rl-training/run_pipeline.sh
 create mode 100644 cookbook/prime-rl-training/src/prime_rl_training/__init__.py
 create mode 100644 cookbook/prime-rl-training/src/prime_rl_training/collect.py
 create mode 100644 cookbook/prime-rl-training/src/prime_rl_training/simlab_env.py
 create mode 100644 cookbook/prime-rl-training/src/prime_rl_training/trajectory_converter.py

diff --git a/cookbook/README.md b/cookbook/README.md
index f96677b..4b4f80b 100644
--- a/cookbook/README.md
+++ b/cookbook/README.md
@@ -28,3 +28,4 @@ The agent will walk through each step, ask you for any required inputs (model, t
 | [openai-agents-sdk](openai-agents-sdk/) | Customer-style OpenAI Agents SDK cookbook showing how to keep an existing agent app and add a thin SimLab adapter. |
 | [secure-agent-eval](secure-agent-eval/) | Evaluate agent behavior through OneCLI's credential proxy — compare correctness, audit for credential leakage, and test rate limit resilience. |
 | [simlab-auto-research](simlab-auto-research/) | Autonomous system prompt optimization using the [auto-research](https://github.com/karpathy/autoresearch) pattern. An outer agent iterates on prompts, measured by SimLab task scores. |
+| [prime-rl-training](prime-rl-training/) | Collect SimLab trajectories and train agent models with Prime Intellect's prime-rl (SFT warmup + hosted RL). |
diff --git a/cookbook/prime-rl-training/.gitignore b/cookbook/prime-rl-training/.gitignore
new file mode 100644
index 0000000..6343811
--- /dev/null
+++ b/cookbook/prime-rl-training/.gitignore
@@ -0,0 +1,9 @@
+# Generated artifacts (re-create with run_pipeline.sh)
+output/
+dataset/
+dist/
+.prime/
+__pycache__/
+*.pyc
+taskgen/
+generated-tasks/
diff --git a/cookbook/prime-rl-training/SKILL.md b/cookbook/prime-rl-training/SKILL.md
new file mode 100644
index 0000000..06013a1
--- /dev/null
+++ b/cookbook/prime-rl-training/SKILL.md
@@ -0,0 +1,161 @@
+# Prime-RL Training with SimLab Trajectories
+
+Train agent models with Prime Intellect's prime-rl using SimLab-collected trajectories.
+
+## Prerequisites
+
+Before starting, confirm:
+
+1. SimLab is installed: `simlab --version`
+2. prime CLI is installed: `prime --version`
+3. `SIMLAB_COLLINEAR_API_KEY` is set
+4. `PRIME_API_KEY` is set
+5. `OPENAI_API_KEY` is set (for baseline agent)
+
+If any prerequisite is missing, tell the user what to set and **wait before proceeding**.
+
+## Workflow
+
+### 1. Install cookbook dependencies
+
+```bash
+cd cookbook/prime-rl-training
+uv sync
+```
+
+### 2. Create SimLab environment
+
+```bash
+simlab templates list
+```
+
+Ask the user which template to use (default: `customer_service`).
+
+```bash
+simlab env init prime-rl-env --template