Add a multi-turn coding agentic training example

## Background

AReno has agentic examples for shopping and tic-tac-toe, but it does not yet show how to train a coding agent in a realistic multi-turn loop. Coding agents are a natural fit for agentic RL because the policy must inspect files, search code, edit patches, run tests, interpret failures, and iterate until the task is solved.

A first-class coding agentic example would give users a concrete pattern for building tool-using training tasks beyond toy domains. It should be designed carefully around the same workflow a practical coding agent uses, while staying small enough to run as an example.

---

## Scope

Add an agentic coding example that trains a model through multi-turn software-engineering tasks.

The example should include:

- a small task dataset with repo-local coding tasks, expected outcomes, and test commands.
- a multi-turn agent loop that can call one tool per turn and append tool results back into the conversation.
- useful coding tools modeled after Codex-style workflows, such as:
  - list files / inspect tree
  - read file snippets
  - search with ripgrep-style queries
  - apply unified patches
  - run a bounded shell command or test command
  - report final answer / completion status
- a reward function that scores task success from test results and optionally patch quality signals.
- trajectory construction that returns explicit agentic samples without relying on proxy-side prompt matching.
- clear safeguards for command execution, timeouts, path allowlists, and output truncation.
- documentation showing how to run the example with areno train and how to interpret reward/log output.

---

## Design requirements

The example should prefer realistic coding-agent mechanics over a scripted oracle:

- Tools should expose constrained capabilities, not direct access to ground-truth answers.
- The agent should be able to recover from failed tests by reading output, editing again, and rerunning.
- Tool outputs should be compact and deterministic enough for stable training.
- Dataset tasks should be small, CPU-friendly, and not require network access.
- The example should avoid destructive filesystem operations and should isolate each task workspace.

---

## Acceptance criteria

- A new agentic coding example exists under examples/agentic/ or another documented examples location.
- The agent loop performs multiple model calls for one sample and records a combined trajectory.
- The tool set includes file inspection, search, patch application, and test execution or equivalent bounded commands.
- The reward function can identify success/failure without starting GPU/backend-heavy work.
- CPU tests cover tool behavior, trajectory construction, and at least one successful toy coding task.
- README/docs mention the example and show the minimal command to run it.
- Error messages for malformed task specs, unsafe paths, invalid patches, and failing commands are clear.

---

## Activity

- [ ] Design the task schema and workspace isolation model.
- [ ] Implement coding tools with strict path and timeout controls.
- [ ] Implement the multi-turn run_agent loop and trajectory return path.
- [ ] Add a small deterministic dataset of coding tasks.
- [ ] Add reward logic based on test success and final submission.
- [ ] Add CPU tests for tools, loop behavior, and reward outcomes.
- [ ] Document how to run the example and what success looks like.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add a multi-turn coding agentic training example #24

Background

Scope

Design requirements

Acceptance criteria

Activity

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Add a multi-turn coding agentic training example #24

Description

Background

Scope

Design requirements

Acceptance criteria

Activity

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions