Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 42 additions & 3 deletions .agents/skills/release-prep/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,44 @@ Use `grep -rn "LAST_VERSION"` to find any other version-pinned references.

---

## Step 4 — Verify subcommand tables are current
## Step 4 — Update ROADMAP.md

Open `ROADMAP.md` and apply these changes:

1. **Mark shipped items.** For each feature or item in the roadmap that was
delivered in the commits since LAST_TAG, strike through the bullet with
`~~text~~` and append `— shipped in NEXT_VERSION (issue #N if known)`.
Use the commit subjects and the CHANGELOG section you just wrote as the
source of truth.

2. **Strike the outbound HTTP tool resilience block** if it is present
verbatim and has now shipped (it shipped in v0.3.0 as issues #7–#9).
Replace the prose block with a single struck-through line:
`- ~~Outbound HTTP tool resilience (retry, DLQ, per-host rate limits)~~ — shipped in NEXT_VERSION (#7, #8, #9).`

3. **Add a new version section** for the _next_ planned minor if none
exists, or leave existing planned work in place if a forward-looking
section is already present.

4. Update the `_Last reviewed:` footer to TODAY.

---

## Step 5 — Update SECURITY.md

Open `SECURITY.md` and update the **Supported Versions** table:

1. Add a new row for `NEXT_VERSION_MINOR.x` (the X.Y portion of
NEXT_VERSION) with `:white_check_mark:`.
2. If LAST_VERSION was on a different minor line (e.g. LAST_TAG = v0.2.x
and NEXT_VERSION = v0.3.0), mark the old minor line as `:x:` (end of
support) — leather supports only the current minor line.
3. If NEXT_VERSION is a patch on the same minor (e.g. v0.2.1 on v0.2.x),
no row change is needed; the existing row already covers it.

---

## Step 6 — Verify subcommand tables are current

Confirm that every `Run*` function in `internal/cli/cli.go` has a corresponding
row in each of these tables:
Expand All @@ -80,14 +117,14 @@ If any row is missing, add it before committing.

---

## Step 5 — Commit and push
## Step 7 — Commit and push

Stay on the **current branch** — do not switch to or push directly to `main`.
Stage all changed files and create one commit:

```
CURRENT_BRANCH=$(git branch --show-current)
git add CHANGELOG.md README.md docs/ .subagents/
git add CHANGELOG.md README.md ROADMAP.md SECURITY.md docs/ .subagents/
git commit -m "chore(release): prepare NEXT_VERSION"
git push origin "$CURRENT_BRANCH"
```
Expand All @@ -108,6 +145,8 @@ Do not tag in this step. Tagging is the job of `leather-release-tag`.
- [ ] NEXT_VERSION is set and justified
- [ ] CHANGELOG has the new section with at least one bullet
- [ ] No stale version string remains in docs (grep clean)
- [ ] ROADMAP.md: shipped items struck through; `_Last reviewed:` updated
- [ ] SECURITY.md: Supported Versions table reflects NEXT_VERSION
- [ ] Subcommand tables are in sync
- [ ] Commit is pushed to current branch (not directly to main)
- [ ] PR is open targeting main (create one if it doesn't exist)
Expand Down
4 changes: 4 additions & 0 deletions .subagents/AGENTS-OBSERVABILITY.md
Original file line number Diff line number Diff line change
Expand Up @@ -176,6 +176,10 @@ metric requires a dashboard update note in the PR description.
| `leather_cache_hits_total` | counter | `kind` | Response-cache hits. |
| `leather_cache_misses_total` | counter | `kind` | Response-cache misses. |
| `leather_build_info` | gauge=1 | `version`, `commit` | Build metadata. |
| `leather_tool_retry_total` | counter | _(none)_ | Tool call attempts beyond the first; tells the operator how often transient failures occur across all tools. |
| `leather_tool_backoff_total` | counter | _(none)_ | Times a backoff sleep was applied (retry-after or exponential); indicates rate-limiting pressure from upstream services. |
| `leather_tool_rate_limit_wait_total` | counter | _(none)_ | Times a tool call waited for a per-host token-bucket token; nonzero means the configured rate limits are actively throttling traffic. |
| `leather_outbound_dlq_depth` | gauge | _(none)_ | Current item count in `outbound-dlq`; nonzero means tool failures need operator attention (`leather dlq inspect`). |

Rules:

Expand Down
15 changes: 15 additions & 0 deletions .subagents/AGENTS-QUALITY.md
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,21 @@ Before opening a PR:
- [ ] No real credentials or secrets in `testdata/`
- [ ] CI workflow action refs are SHA-pinned with version comments

### Outbound tool resilience (retry, DLQ, rate limits)

PRs touching `internal/tool`, `internal/config`, or `internal/cli/cmd_dlq.go`:

- [ ] `go test ./internal/tool/... ./internal/cli/...` passes
- [ ] `go test -race ./internal/tool/... ./internal/cli/...` passes
- [ ] New tools with `retry:` config have tests covering transient retry and permanent no-retry paths
- [ ] `TestExecute_DLQEnqueueOnExhaustion` and `TestExecute_DLQEnqueueOnPermanent` pass
- [ ] `TestHostLimiter_*` suite passes; no real network calls
- [ ] `TestRunDLQ*` suite passes with `t.TempDir()` state dirs
- [ ] `leather dlq inspect` output includes ID, tool name, agent, attempt, error
- [ ] `leather dlq requeue --state-dir ... <item-id>` (item-id **last**) moves item
- [ ] `tools.rate_limits` in config.yaml parses without error; bad spec is warn+disable, not panic
- [ ] `/metrics` response contains `leather_tool_retry_total`, `leather_tool_backoff_total`, `leather_tool_rate_limit_wait_total`, `leather_outbound_dlq_depth`

---

_Last reviewed: 2026-06-05_
58 changes: 57 additions & 1 deletion .subagents/AGENTS-RUNTIME.md
Original file line number Diff line number Diff line change
Expand Up @@ -116,13 +116,60 @@ func (r *Registry) GetTools(skillNames []string) []model.ToolDefinition
func (r *Registry) ResolveTools(skillNames, toolsetNames, toolNames []string) []model.ToolDefinition

// Executor dispatches HTTP- and MCP-backed tools.
type Executor struct { MCP *mcp.Registry }
// QueueMgr enables the outbound DLQ: permanent/exhausted failures are
// enqueued to "outbound-dlq" when this field is non-nil.
// Limiter enforces per-host token-bucket rate limits before each attempt.
type Executor struct {
MCP *mcp.Registry
QueueMgr *queue.Manager // nil = outbound DLQ disabled
AgentName string // injected by Runner; stored in DLQ items
Limiter *HostLimiter // nil = no rate limiting
}

// Execute runs a single tool call. Always returns a ToolResult; never panics.
// On failure, ToolResult.Error is set and Content may be empty.
// With def.Retry.MaxAttempts > 0, transient failures (5xx, 429, network)
// are retried with exponential backoff + jitter.
func (e *Executor) Execute(ctx context.Context, def model.ToolDefinition, args map[string]any) model.ToolResult

// MetricSnapshot returns a point-in-time snapshot of outbound tool counters.
// Counters are process-lifetime atomics; they reset only on restart.
func MetricSnapshot() (retryTotal, backoffTotal, rateLimitWaitTotal int64)
```

#### Per-tool retry policy

`ToolDefinition.Retry` (`model.ToolRetryConfig`) controls retry behaviour:

| Field | Default | Meaning |
|---|---|---|
| `max_attempts` | 1 (no retry) | Total attempts including the initial one. |
| `base_delay` | `1s` | Initial backoff; doubles each attempt. |
| `max_delay` | `30s` | Backoff ceiling before jitter. |
| `honor_retry_after` | `true` | Use `Retry-After` header value when present. |

A zero `ToolRetryConfig` (all fields at zero value) means **single attempt, no
retry** — preserving backward compatibility for tools that predate the policy.
Only tools with an explicit `retry:` block in their skill YAML get the retry loop.

`isTransient` classifies the failure to decide whether to retry:
- **Transient** → retry: 429, 500, 502, 503, 504; network/timeout errors;
403 with `X-RateLimit-Remaining: 0`
- **Permanent** → return immediately without retrying: all other 4xx

#### Outbound DLQ

When `Executor.QueueMgr != nil` and a tool call either:
- exhausts its retry budget on transient errors, or
- fails immediately with a permanent error,

a `model.QueueItem` is enqueued to the well-known `"outbound-dlq"` queue.
DLQ items carry `ToolName`, `ToolTarget`, and the last error in `Payload`.
DLQ enqueue is a fire-and-forget side-effect; failure to enqueue is logged
at `warn` and does not affect the returned `ToolResult`.

Items in `outbound-dlq` can be inspected and requeued via `leather dlq`.

#### Skill file format (`*.skill.yaml`)

```yaml
Expand All @@ -139,6 +186,11 @@ tools:
headers:
Authorization: "Bearer {{env:GITHUB_TOKEN}}"
Accept: application/vnd.github+json
retry:
max_attempts: 3
base_delay: 1s
max_delay: 30s
honor_retry_after: true
```

#### Toolset file format (`*.toolset.yaml`)
Expand All @@ -161,11 +213,15 @@ every toolset reference points at an already-known tool.
- `mcp` → `execMCP` through a started `mcp.Registry`

HTTP execution (`execHTTP`):
- Calls `e.Limiter.Wait(ctx, host)` before each attempt; blocks until the
per-host token bucket allows the request or ctx is cancelled.
- Expands URL template with tool call arguments (`{{.field}}`).
- Expands `{{env:VAR}}` in header values; **never logs auth header values**.
- Sends the request with the runner's context (inherits timeout).
- Response body is capped at 1 MB.
- Non-2xx responses populate `ToolResult.Error` with the status code message.
- Transient failures are retried up to `def.Retry.MaxAttempts` times with
exponential backoff; permanent failures return immediately.

MCP execution (`execMCP`):
- Looks up the named server in the running registry.
Expand Down
31 changes: 31 additions & 0 deletions .subagents/AGENTS-SERVE.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ Each subcommand has:
| `run` | `RunOnce` | Load and execute a single agent, then exit |
| `validate` | `RunValidate` | Parse and validate agent files; report errors |
| `test-agent` | `RunTestAgent` | Execute an agent with `MockLLM` and print the turn transcript |
| `dlq` | `RunDLQ` | Inspect and requeue outbound dead-letter queue items |
| `status` | `RunStatus` | Print scheduler state, job history, token usage |
| `ingest` | `RunIngest` | Store raw bytes as a hide and optionally enqueue for curing |
| `snapshot` | `RunSnapshot` → `RunSnapshotSave` / `RunSnapshotRestore` | Save or restore a `tar.gz` point-in-time archive of runtime state |
Expand Down Expand Up @@ -212,8 +213,19 @@ cache_dir: "" # empty = serve uses <state_dir>/cache
mcp_servers_file: "" # empty = run/serve/validate use ~/.leather/mcp-servers.yaml
loop: 1 # repeat leather run N times
tannery: "" # path to tannery.yaml; empty = tannery disabled

tools:
rate_limits: # per-host token-bucket rate limits for outbound tool calls
api.github.com: "60/m" # format: "N/s", "N/m", or "N/h"
api.example.com: "10/s"
```

`tools.rate_limits` is a nested map. Each key is a hostname (no port, no
scheme); the value is a rate spec in the form `N/<unit>` where unit is `s`
(seconds), `m` (minutes), or `h` (hours). The second call to the same host
within the interval blocks until the next token is available. Unknown hosts
pass through immediately with no limiting.

YAML keys are the snake_case equivalents of the flag names (strip `--`,
replace `-` with `_`).

Expand Down Expand Up @@ -255,6 +267,24 @@ All endpoints live in `internal/cli/api_tannery.go`.
| `/curings` | GET | `handleCurings` | List curing definitions with queue depth |
| `/intake` | POST | `handleIntake` | Direct-ingest endpoint; writes hide from request body |

### `leather dlq` subcommand (`internal/cli/cmd_dlq.go`)

`RunDLQ(args, stdout, stderr)` dispatches `inspect` and `requeue`:

```
leather dlq inspect [--queue outbound-dlq] [--state-dir ...]
leather dlq requeue [--queue outbound-dlq] [--work-queue <name>] [--state-dir ...] <item-id>
```

- **`inspect`** — lists all items in the DLQ; prints `ID | tool | agent | attempt | enqueued_at | error`.
- **`requeue`** — moves the named item from the DLQ to `--work-queue`, resetting
`AttemptCount` to 0 so it gets a fresh retry budget. Default `--work-queue` is
the DLQ name with the `-dlq` suffix stripped.

**Important**: `<item-id>` must come **after** all flags. Go's `flag.FlagSet`
stops parsing at the first non-flag token, so placing `<item-id>` before flags
silently ignores the remaining flags.

### `leather ingest` subcommand (`internal/cli/cmd_ingest.go`)

`RunIngest(args, stdin, stdout, stderr)` — reads body from `--file` or stdin,
Expand Down Expand Up @@ -378,6 +408,7 @@ internal/schema → internal/config
| Flag name doesn't match env var | `--flag-name` → `LEATHER_FLAG_NAME`; check both |
| Skipping graceful shutdown | Always call `scheduler.Drain` before returning from `RunServe` |
| Flags after positional arg in `leather run` | Go's `flag.FlagSet` stops at the first non-flag token. The agent file path must come **last**: `leather run --config=... --var k=v agent.md` — not `leather run agent.md --config ...` |
| `<item-id>` before flags in `leather dlq requeue` | Same issue: item-id must be **last** after all flags: `leather dlq requeue --state-dir ... <item-id>` |
| `leather init` overwriting without `--overwrite` | `RunInit` fails closed: any pre-existing file causes a non-zero exit and reports `--overwrite` hint. Never silently clobber. |
| Calling `RunValidate` from `RunInit` for post-write validation | `RunValidate` performs a full semantic check including model resolution (fails without `LEATHER_MODEL`). `RunInit` uses schema-only validation (`runInitValidate`) which is syntax-only and does not require a model to be set. |

Expand Down
23 changes: 23 additions & 0 deletions .subagents/AGENTS-TOOLS-SKILLS-TOOLSETS.md
Original file line number Diff line number Diff line change
Expand Up @@ -218,6 +218,29 @@ binary for shell commands; an HTTP MCP server for HTTP APIs).
- The capability is a coherent verb (e.g. `release-write`,
`inbox-triage`), not a grab bag.

#### Adding retry policy to a skill tool

Tools that call remote APIs should declare a `retry:` block to handle transient
failures (5xx, 429, network errors) without relying on the caller to retry:

```yaml
tools:
- name: github_list_issues
type: http
http:
url: "https://api.github.com/repos/{{.repo}}/issues"
retry:
max_attempts: 3 # initial attempt + 2 retries
base_delay: 1s # doubles each retry; capped at max_delay
max_delay: 30s
honor_retry_after: true # use Retry-After header when present
```

Omitting `retry:` (or setting `max_attempts: 1`) preserves the legacy
single-attempt behaviour — no retries, no backoff. Only transient errors
(5xx, 429, network timeouts) trigger retries; permanent 4xx errors return
immediately regardless of the retry config.

### When to write a per-turn declaration

- The turn does something dangerous and the agent's base scope is too
Expand Down
29 changes: 29 additions & 0 deletions .subagents/AGENTS-WORKER.md
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,35 @@ re-serialize the full slice. Files are mode 0600; directories mode 0700.
- `Payload` — `map[string]any`; template variables available in agent prompts
- `EnqueuedAt` — Unix timestamp
- `AttemptCount` — incremented by the runner on each dequeue-and-run attempt
- `ToolName` — non-empty for outbound DLQ items; the failed tool's name
- `ToolTarget` — non-empty for outbound DLQ items; the URL or `server/tool` string

#### Outbound DLQ (`outbound-dlq`)

Tool execution failures that are permanent or that exhaust their retry budget
are enqueued to the well-known `outbound-dlq` queue by `tool.Executor`. These
items are **not** processed by `internal/curing` workers (`CuringName == ""`).
They are surfaced for operator inspection and manual requeue via `leather dlq`.

Outbound DLQ item shape:

| Field | Meaning |
|---|---|
| `ID` | `odlq_<date>_<time>_<hex>` |
| `AgentName` | Agent that triggered the tool call |
| `ToolName` | Name of the failed tool |
| `ToolTarget` | URL (HTTP tools) or `server/tool` (MCP tools) |
| `AttemptCount` | Number of attempts made before giving up |
| `EnqueuedAt` | Unix timestamp |
| `Payload["tool"]` | Tool name (duplicate for backward compat) |
| `Payload["args"]` | Tool arguments at time of failure |
| `Payload["error"]` | Last error message |
| `Payload["attempt"]` | Attempt count at time of failure |

**DLQ enqueue path**: `tool.Executor.Execute` → final attempt fails or
`isTransient=false` → `QueueMgr.Enqueue("outbound-dlq", item)`. Enqueue
failure is logged at `warn` and does not affect the `ToolResult` returned to
the runner. DLQ is disabled when `Executor.QueueMgr` is nil.

---

Expand Down
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,10 @@
[![Go Reference](https://pkg.go.dev/badge/github.com/tgpski/leather.svg)](https://pkg.go.dev/github.com/tgpski/leather)
[![Go Version](https://img.shields.io/github/go-mod/go-version/TGPSKI/leather?v=2)](https://go.dev/)
[![License](https://img.shields.io/github/license/TGPSKI/leather?v=2)](LICENSE)
[![GitHub Release](https://img.shields.io/github/release/TGPSKI/leather.svg?style=flat)](https://github.com/TGPSKI/leather/releases)
[![CI](https://github.com/TGPSKI/leather/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/TGPSKI/leather/actions/workflows/ci.yml)



**Local agent infrastructure in one stdlib-only Go binary.**

Expand Down
2 changes: 2 additions & 0 deletions internal/cli/cli.go
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ func Run(args []string, stdout, stderr io.Writer, version, commit string) int {
return RunReplay(rest, stdout, stderr, version, commit)
case "snapshot":
return RunSnapshot(rest, stdout, stderr)
case "dlq":
return RunDLQ(rest, stdout, stderr)
case "attach":
return RunAttach(rest, stdout, stderr)
case "help", "--help", "-h":
Expand Down
Loading