Fork of allenai/open-instruct adapted for multi-node GRPO (Group Relative Policy Optimization) reinforcement learning on the Isambard GH200 cluster.
We run RL with Verifiable Rewards (RLVR) using GRPO to train language models on math and code tasks. The system uses Ray to orchestrate DeepSpeed learners and vLLM inference engines across multiple nodes, with Gloo-based weight synchronization.
What we changed from upstream: 6 commits, ~1058 insertions. All changes are infrastructure (configs, env var filtering, wandb defaults) — no core training logic was modified.
sbatch configs/isambard/grpo_rlzero.sbatch configs/isambard/grpo_debug_single_node.sh
tail -f /projects/a5k/public/logs_puria.a5k/open-instruct/<job_id>.out

Edit grpo_rlzero.sbatch:

#SBATCH --nodes=2

Each node contributes 1 learner + 3 vLLM engines = 4 GPUs.
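As a quick check of the per-node arithmetic, the sketch below computes cluster totals for a given node count. It assumes every node uses the 1 learner + 3 vLLM engine layout described here; the function name `gpu_layout` is hypothetical and not part of the repository.

```bash
# Hypothetical helper (not part of the repo): totals for N nodes,
# assuming each node runs 1 learner and 3 vLLM engines (4 GPUs).
gpu_layout() {
    nodes="$1"
    learners=$(( nodes * 1 ))   # 1 learner per node
    engines=$(( nodes * 3 ))    # 3 vLLM engines per node
    echo "nodes=${nodes} learners=${learners} vllm_engines=${engines} gpus=$(( learners + engines ))"
}

gpu_layout 2   # corresponds to #SBATCH --nodes=2
```

With two nodes this prints `nodes=2 learners=2 vllm_engines=6 gpus=8`.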
| Config | Purpose |
|---|---|
| configs/isambard/grpo_rlzero.sbatch | SLURM job script: Ray cluster, env setup, job chaining |
| configs/isambard/grpo_debug_single_node.sh | Debug run: Qwen2.5-0.5B, 100 episodes |
| configs/isambard/grpo_7b_rlzero_general.sh | Production run: 7B model, math dataset |
| configs/isambard/ray_node_setup_slurm.sh | Ray worker node setup (called by sbatch) |
| configs/isambard/setup_open_instruct_env.sh | One-time environment setup |
| configs/isambard/run_on_compute.sbatch | Interactive compute node access |
Runs are tracked in the geodesic/geodesic-grpo wandb project. To enable tracking, pass --with_tracking in the training config (the debug configs enable it by default).
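To see whether a given training config turns tracking on, one option is to search it for the flag. The sketch below is a plain text match, so a commented-out `--with_tracking` line would also count; the helper name `tracking_enabled` is hypothetical and not part of the repository.

```bash
# Hypothetical helper (not part of the repo): reports whether a training
# config script mentions --with_tracking anywhere in its text.
tracking_enabled() {
    # "--" stops grep from treating the pattern as one of its own options
    if grep -q -- '--with_tracking' "$1"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Example call (path taken from the config table):
#   tracking_enabled configs/isambard/grpo_debug_single_node.sh
```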
         Node 0                          Node 1
┌──────────────────────┐        ┌──────────────────────┐
│ GPU 0: Learner (DS)  │        │ GPU 0: Learner (DS)  │
│ GPU 1: vLLM Engine 0 │        │ GPU 1: vLLM Engine 0 │
│ GPU 2: vLLM Engine 1 │        │ GPU 2: vLLM Engine 1 │
│ GPU 3: vLLM Engine 2 │        │ GPU 3: vLLM Engine 2 │
└──────────────────────┘        └──────────────────────┘
           │                               │
           └───── Ray Cluster + Gloo ──────┘
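The per-node layout in the diagram can be written out as a small lookup. This is only an illustration of the picture, assuming 4 GPUs per node with the learner on local GPU 0; the real assignment is made by Ray at runtime, and the function name `gpu_role` is hypothetical.

```bash
# Hypothetical helper (not part of the repo): role of a local GPU index,
# assuming the 4-GPU-per-node layout shown in the diagram.
gpu_role() {
    case "$1" in
        0)     echo "learner" ;;                       # DeepSpeed learner
        1|2|3) echo "vllm_engine_$(( $1 - 1 ))" ;;     # engines 0..2
        *)     echo "unused" ;;                        # outside the diagram
    esac
}

for gpu in 0 1 2 3; do
    echo "GPU ${gpu}: $(gpu_role "$gpu")"
done
```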
See docs/architecture.md for a thorough educational guide covering the training loop, weight sync, placement groups, and GRPO loss.
# Install
uv sync
# Lint + format
make style && make quality
# Test
uv run pytest

This is a fork of allenai/open-instruct. To pull upstream changes:
git remote add upstream https://github.com/allenai/open-instruct.git
git fetch upstream
git merge upstream/main

Apache 2.0; see LICENSE.