Background
AReno has agentic examples for shopping and tic-tac-toe, but it does not yet show how to train a coding agent in a realistic multi-turn loop. Coding agents are a natural fit for agentic RL because the policy must inspect files, search code, edit patches, run tests, interpret failures, and iterate until the task is solved.
A first-class coding agentic example would give users a concrete pattern for building tool-using training tasks beyond toy domains. It should be designed carefully around the same workflow a practical coding agent uses, while staying small enough to run as an example.
Scope
Add an agentic coding example that trains a model through multi-turn software-engineering tasks.
The example should include:
- a small task dataset with repo-local coding tasks, expected outcomes, and test commands.
- a multi-turn agent loop that can call one tool per turn and append tool results back into the conversation.
- useful coding tools modeled after Codex-style workflows, such as:
- list files / inspect tree
- read file snippets
- search with ripgrep-style queries
- apply unified patches
- run a bounded shell command or test command
- report final answer / completion status
- a reward function that scores task success from test results and optionally patch quality signals.
- trajectory construction that returns explicit agentic samples without relying on proxy-side prompt matching.
- clear safeguards for command execution, timeouts, path allowlists, and output truncation.
- documentation showing how to run the example with areno train and how to interpret reward/log output.
Design requirements
The example should prefer realistic coding-agent mechanics over a scripted oracle:
- Tools should expose constrained capabilities, not direct access to ground-truth answers.
- The agent should be able to recover from failed tests by reading output, editing again, and rerunning.
- Tool outputs should be compact and deterministic enough for stable training.
- Dataset tasks should be small, CPU-friendly, and not require network access.
- The example should avoid destructive filesystem operations and should isolate each task workspace.
Acceptance criteria
- A new agentic coding example exists under examples/agentic/ or another documented examples location.
- The agent loop performs multiple model calls for one sample and records a combined trajectory.
- The tool set includes file inspection, search, patch application, and test execution or equivalent bounded commands.
- The reward function can identify success/failure without starting GPU/backend-heavy work.
- CPU tests cover tool behavior, trajectory construction, and at least one successful toy coding task.
- README/docs mention the example and show the minimal command to run it.
- Error messages for malformed task specs, unsafe paths, invalid patches, and failing commands are clear.
Activity
Background
AReno has agentic examples for shopping and tic-tac-toe, but it does not yet show how to train a coding agent in a realistic multi-turn loop. Coding agents are a natural fit for agentic RL because the policy must inspect files, search code, edit patches, run tests, interpret failures, and iterate until the task is solved.
A first-class coding agentic example would give users a concrete pattern for building tool-using training tasks beyond toy domains. It should be designed carefully around the same workflow a practical coding agent uses, while staying small enough to run as an example.
Scope
Add an agentic coding example that trains a model through multi-turn software-engineering tasks.
The example should include:
Design requirements
The example should prefer realistic coding-agent mechanics over a scripted oracle:
Acceptance criteria
Activity