diff --git a/.agents/skills/release-prep/SKILL.md b/.agents/skills/release-prep/SKILL.md index 945c002..2060f9a 100644 --- a/.agents/skills/release-prep/SKILL.md +++ b/.agents/skills/release-prep/SKILL.md @@ -66,7 +66,44 @@ Use `grep -rn "LAST_VERSION"` to find any other version-pinned references. --- -## Step 4 — Verify subcommand tables are current +## Step 4 — Update ROADMAP.md + +Open `ROADMAP.md` and apply these changes: + +1. **Mark shipped items.** For each feature or item in the roadmap that was + delivered in the commits since LAST_TAG, strike through the bullet with + `~~text~~` and append `— shipped in NEXT_VERSION (issue #N if known)`. + Use the commit subjects and the CHANGELOG section you just wrote as the + source of truth. + +2. **Strike the outbound HTTP tool resilience block** if it is present + verbatim and has now shipped (it shipped in v0.3.0 as issues #7–#9). + Replace the prose block with a single struck-through line: + `- ~~Outbound HTTP tool resilience (retry, DLQ, per-host rate limits)~~ — shipped in NEXT_VERSION (#7, #8, #9).` + +3. **Add a new version section** for the _next_ planned minor if none + exists, or leave existing planned work in place if a forward-looking + section is already present. + +4. Update the `_Last reviewed:` footer to TODAY. + +--- + +## Step 5 — Update SECURITY.md + +Open `SECURITY.md` and update the **Supported Versions** table: + +1. Add a new row for `NEXT_VERSION_MINOR.x` (the X.Y portion of + NEXT_VERSION) with `:white_check_mark:`. +2. If LAST_VERSION was on a different minor line (e.g. LAST_TAG = v0.2.x + and NEXT_VERSION = v0.3.0), mark the old minor line as `:x:` (end of + support) — leather supports only the current minor line. +3. If NEXT_VERSION is a patch on the same minor (e.g. v0.2.1 on v0.2.x), + no row change is needed; the existing row already covers it. + +--- + +## Step 6 — Verify subcommand tables are current Confirm that every `Run*` function in `internal/cli/cli.go` has a corresponding row in each of these tables: @@ -80,14 +117,14 @@ If any row is missing, add it before committing. --- -## Step 5 — Commit and push +## Step 7 — Commit and push Stay on the **current branch** — do not switch to or push directly to `main`. Stage all changed files and create one commit: ``` CURRENT_BRANCH=$(git branch --show-current) -git add CHANGELOG.md README.md docs/ .subagents/ +git add CHANGELOG.md README.md ROADMAP.md SECURITY.md docs/ .subagents/ git commit -m "chore(release): prepare NEXT_VERSION" git push origin "$CURRENT_BRANCH" ``` @@ -108,6 +145,8 @@ Do not tag in this step. Tagging is the job of `leather-release-tag`. - [ ] NEXT_VERSION is set and justified - [ ] CHANGELOG has the new section with at least one bullet - [ ] No stale version string remains in docs (grep clean) +- [ ] ROADMAP.md: shipped items struck through; `_Last reviewed:` updated +- [ ] SECURITY.md: Supported Versions table reflects NEXT_VERSION - [ ] Subcommand tables are in sync - [ ] Commit is pushed to current branch (not directly to main) - [ ] PR is open targeting main (create one if it doesn't exist) diff --git a/.subagents/AGENTS-OBSERVABILITY.md b/.subagents/AGENTS-OBSERVABILITY.md index 1d27bfc..45ee29c 100644 --- a/.subagents/AGENTS-OBSERVABILITY.md +++ b/.subagents/AGENTS-OBSERVABILITY.md @@ -176,6 +176,10 @@ metric requires a dashboard update note in the PR description. | `leather_cache_hits_total` | counter | `kind` | Response-cache hits. | | `leather_cache_misses_total` | counter | `kind` | Response-cache misses. | | `leather_build_info` | gauge=1 | `version`, `commit` | Build metadata. | +| `leather_tool_retry_total` | counter | _(none)_ | Tool call attempts beyond the first; tells the operator how often transient failures occur across all tools. | +| `leather_tool_backoff_total` | counter | _(none)_ | Times a backoff sleep was applied (retry-after or exponential); indicates rate-limiting pressure from upstream services. | +| `leather_tool_rate_limit_wait_total` | counter | _(none)_ | Times a tool call waited for a per-host token-bucket token; nonzero means the configured rate limits are actively throttling traffic. | +| `leather_outbound_dlq_depth` | gauge | _(none)_ | Current item count in `outbound-dlq`; nonzero means tool failures need operator attention (`leather dlq inspect`). | Rules: diff --git a/.subagents/AGENTS-QUALITY.md b/.subagents/AGENTS-QUALITY.md index 5a398bb..49e39cb 100644 --- a/.subagents/AGENTS-QUALITY.md +++ b/.subagents/AGENTS-QUALITY.md @@ -271,6 +271,21 @@ Before opening a PR: - [ ] No real credentials or secrets in `testdata/` - [ ] CI workflow action refs are SHA-pinned with version comments +### Outbound tool resilience (retry, DLQ, rate limits) + +PRs touching `internal/tool`, `internal/config`, or `internal/cli/cmd_dlq.go`: + +- [ ] `go test ./internal/tool/... ./internal/cli/...` passes +- [ ] `go test -race ./internal/tool/... ./internal/cli/...` passes +- [ ] New tools with `retry:` config have tests covering transient retry and permanent no-retry paths +- [ ] `TestExecute_DLQEnqueueOnExhaustion` and `TestExecute_DLQEnqueueOnPermanent` pass +- [ ] `TestHostLimiter_*` suite passes; no real network calls +- [ ] `TestRunDLQ*` suite passes with `t.TempDir()` state dirs +- [ ] `leather dlq inspect` output includes ID, tool name, agent, attempt, error +- [ ] `leather dlq requeue --state-dir ... ` (item-id **last**) moves item +- [ ] `tools.rate_limits` in config.yaml parses without error; bad spec is warn+disable, not panic +- [ ] `/metrics` response contains `leather_tool_retry_total`, `leather_tool_backoff_total`, `leather_tool_rate_limit_wait_total`, `leather_outbound_dlq_depth` + --- _Last reviewed: 2026-06-05_ diff --git a/.subagents/AGENTS-RUNTIME.md b/.subagents/AGENTS-RUNTIME.md index b5392f1..992dcc2 100644 --- a/.subagents/AGENTS-RUNTIME.md +++ b/.subagents/AGENTS-RUNTIME.md @@ -116,13 +116,60 @@ func (r *Registry) GetTools(skillNames []string) []model.ToolDefinition func (r *Registry) ResolveTools(skillNames, toolsetNames, toolNames []string) []model.ToolDefinition // Executor dispatches HTTP- and MCP-backed tools. -type Executor struct { MCP *mcp.Registry } +// QueueMgr enables the outbound DLQ: permanent/exhausted failures are +// enqueued to "outbound-dlq" when this field is non-nil. +// Limiter enforces per-host token-bucket rate limits before each attempt. +type Executor struct { + MCP *mcp.Registry + QueueMgr *queue.Manager // nil = outbound DLQ disabled + AgentName string // injected by Runner; stored in DLQ items + Limiter *HostLimiter // nil = no rate limiting +} // Execute runs a single tool call. Always returns a ToolResult; never panics. // On failure, ToolResult.Error is set and Content may be empty. +// With def.Retry.MaxAttempts > 0, transient failures (5xx, 429, network) +// are retried with exponential backoff + jitter. func (e *Executor) Execute(ctx context.Context, def model.ToolDefinition, args map[string]any) model.ToolResult + +// MetricSnapshot returns a point-in-time snapshot of outbound tool counters. +// Counters are process-lifetime atomics; they reset only on restart. +func MetricSnapshot() (retryTotal, backoffTotal, rateLimitWaitTotal int64) ``` +#### Per-tool retry policy + +`ToolDefinition.Retry` (`model.ToolRetryConfig`) controls retry behaviour: + +| Field | Default | Meaning | +|---|---|---| +| `max_attempts` | 1 (no retry) | Total attempts including the initial one. | +| `base_delay` | `1s` | Initial backoff; doubles each attempt. | +| `max_delay` | `30s` | Backoff ceiling before jitter. | +| `honor_retry_after` | `true` | Use `Retry-After` header value when present. | + +A zero `ToolRetryConfig` (all fields at zero value) means **single attempt, no +retry** — preserving backward compatibility for tools that predate the policy. +Only tools with an explicit `retry:` block in their skill YAML get the retry loop. + +`isTransient` classifies the failure to decide whether to retry: +- **Transient** → retry: 429, 500, 502, 503, 504; network/timeout errors; + 403 with `X-RateLimit-Remaining: 0` +- **Permanent** → return immediately without retrying: all other 4xx + +#### Outbound DLQ + +When `Executor.QueueMgr != nil` and a tool call either: +- exhausts its retry budget on transient errors, or +- fails immediately with a permanent error, + +a `model.QueueItem` is enqueued to the well-known `"outbound-dlq"` queue. +DLQ items carry `ToolName`, `ToolTarget`, and the last error in `Payload`. +DLQ enqueue is a fire-and-forget side-effect; failure to enqueue is logged +at `warn` and does not affect the returned `ToolResult`. + +Items in `outbound-dlq` can be inspected and requeued via `leather dlq`. + #### Skill file format (`*.skill.yaml`) ```yaml @@ -139,6 +186,11 @@ tools: headers: Authorization: "Bearer {{env:GITHUB_TOKEN}}" Accept: application/vnd.github+json + retry: + max_attempts: 3 + base_delay: 1s + max_delay: 30s + honor_retry_after: true ``` #### Toolset file format (`*.toolset.yaml`) @@ -161,11 +213,15 @@ every toolset reference points at an already-known tool. - `mcp` → `execMCP` through a started `mcp.Registry` HTTP execution (`execHTTP`): +- Calls `e.Limiter.Wait(ctx, host)` before each attempt; blocks until the + per-host token bucket allows the request or ctx is cancelled. - Expands URL template with tool call arguments (`{{.field}}`). - Expands `{{env:VAR}}` in header values; **never logs auth header values**. - Sends the request with the runner's context (inherits timeout). - Response body is capped at 1 MB. - Non-2xx responses populate `ToolResult.Error` with the status code message. +- Transient failures are retried up to `def.Retry.MaxAttempts` times with + exponential backoff; permanent failures return immediately. MCP execution (`execMCP`): - Looks up the named server in the running registry. diff --git a/.subagents/AGENTS-SERVE.md b/.subagents/AGENTS-SERVE.md index 15caa33..9cef85a 100644 --- a/.subagents/AGENTS-SERVE.md +++ b/.subagents/AGENTS-SERVE.md @@ -46,6 +46,7 @@ Each subcommand has: | `run` | `RunOnce` | Load and execute a single agent, then exit | | `validate` | `RunValidate` | Parse and validate agent files; report errors | | `test-agent` | `RunTestAgent` | Execute an agent with `MockLLM` and print the turn transcript | +| `dlq` | `RunDLQ` | Inspect and requeue outbound dead-letter queue items | | `status` | `RunStatus` | Print scheduler state, job history, token usage | | `ingest` | `RunIngest` | Store raw bytes as a hide and optionally enqueue for curing | | `snapshot` | `RunSnapshot` → `RunSnapshotSave` / `RunSnapshotRestore` | Save or restore a `tar.gz` point-in-time archive of runtime state | @@ -212,8 +213,19 @@ cache_dir: "" # empty = serve uses /cache mcp_servers_file: "" # empty = run/serve/validate use ~/.leather/mcp-servers.yaml loop: 1 # repeat leather run N times tannery: "" # path to tannery.yaml; empty = tannery disabled + +tools: + rate_limits: # per-host token-bucket rate limits for outbound tool calls + api.github.com: "60/m" # format: "N/s", "N/m", or "N/h" + api.example.com: "10/s" ``` +`tools.rate_limits` is a nested map. Each key is a hostname (no port, no +scheme); the value is a rate spec in the form `N/` where unit is `s` +(seconds), `m` (minutes), or `h` (hours). The second call to the same host +within the interval blocks until the next token is available. Unknown hosts +pass through immediately with no limiting. + YAML keys are the snake_case equivalents of the flag names (strip `--`, replace `-` with `_`). @@ -255,6 +267,24 @@ All endpoints live in `internal/cli/api_tannery.go`. | `/curings` | GET | `handleCurings` | List curing definitions with queue depth | | `/intake` | POST | `handleIntake` | Direct-ingest endpoint; writes hide from request body | +### `leather dlq` subcommand (`internal/cli/cmd_dlq.go`) + +`RunDLQ(args, stdout, stderr)` dispatches `inspect` and `requeue`: + +``` +leather dlq inspect [--queue outbound-dlq] [--state-dir ...] +leather dlq requeue [--queue outbound-dlq] [--work-queue ] [--state-dir ...] +``` + +- **`inspect`** — lists all items in the DLQ; prints `ID | tool | agent | attempt | enqueued_at | error`. +- **`requeue`** — moves the named item from the DLQ to `--work-queue`, resetting + `AttemptCount` to 0 so it gets a fresh retry budget. Default `--work-queue` is + the DLQ name with the `-dlq` suffix stripped. + +**Important**: `` must come **after** all flags. Go's `flag.FlagSet` +stops parsing at the first non-flag token, so placing `` before flags +silently ignores the remaining flags. + ### `leather ingest` subcommand (`internal/cli/cmd_ingest.go`) `RunIngest(args, stdin, stdout, stderr)` — reads body from `--file` or stdin, @@ -378,6 +408,7 @@ internal/schema → internal/config | Flag name doesn't match env var | `--flag-name` → `LEATHER_FLAG_NAME`; check both | | Skipping graceful shutdown | Always call `scheduler.Drain` before returning from `RunServe` | | Flags after positional arg in `leather run` | Go's `flag.FlagSet` stops at the first non-flag token. The agent file path must come **last**: `leather run --config=... --var k=v agent.md` — not `leather run agent.md --config ...` | +| `` before flags in `leather dlq requeue` | Same issue: item-id must be **last** after all flags: `leather dlq requeue --state-dir ... ` | | `leather init` overwriting without `--overwrite` | `RunInit` fails closed: any pre-existing file causes a non-zero exit and reports `--overwrite` hint. Never silently clobber. | | Calling `RunValidate` from `RunInit` for post-write validation | `RunValidate` performs a full semantic check including model resolution (fails without `LEATHER_MODEL`). `RunInit` uses schema-only validation (`runInitValidate`) which is syntax-only and does not require a model to be set. | diff --git a/.subagents/AGENTS-TOOLS-SKILLS-TOOLSETS.md b/.subagents/AGENTS-TOOLS-SKILLS-TOOLSETS.md index 6490f2e..7578434 100644 --- a/.subagents/AGENTS-TOOLS-SKILLS-TOOLSETS.md +++ b/.subagents/AGENTS-TOOLS-SKILLS-TOOLSETS.md @@ -218,6 +218,29 @@ binary for shell commands; an HTTP MCP server for HTTP APIs). - The capability is a coherent verb (e.g. `release-write`, `inbox-triage`), not a grab bag. +#### Adding retry policy to a skill tool + +Tools that call remote APIs should declare a `retry:` block to handle transient +failures (5xx, 429, network errors) without relying on the caller to retry: + +```yaml +tools: + - name: github_list_issues + type: http + http: + url: "https://api.github.com/repos/{{.repo}}/issues" + retry: + max_attempts: 3 # initial attempt + 2 retries + base_delay: 1s # doubles each retry; capped at max_delay + max_delay: 30s + honor_retry_after: true # use Retry-After header when present +``` + +Omitting `retry:` (or setting `max_attempts: 1`) preserves the legacy +single-attempt behaviour — no retries, no backoff. Only transient errors +(5xx, 429, network timeouts) trigger retries; permanent 4xx errors return +immediately regardless of the retry config. + ### When to write a per-turn declaration - The turn does something dangerous and the agent's base scope is too diff --git a/.subagents/AGENTS-WORKER.md b/.subagents/AGENTS-WORKER.md index 7922103..7c87e4a 100644 --- a/.subagents/AGENTS-WORKER.md +++ b/.subagents/AGENTS-WORKER.md @@ -132,6 +132,35 @@ re-serialize the full slice. Files are mode 0600; directories mode 0700. - `Payload` — `map[string]any`; template variables available in agent prompts - `EnqueuedAt` — Unix timestamp - `AttemptCount` — incremented by the runner on each dequeue-and-run attempt +- `ToolName` — non-empty for outbound DLQ items; the failed tool's name +- `ToolTarget` — non-empty for outbound DLQ items; the URL or `server/tool` string + +#### Outbound DLQ (`outbound-dlq`) + +Tool execution failures that are permanent or that exhaust their retry budget +are enqueued to the well-known `outbound-dlq` queue by `tool.Executor`. These +items are **not** processed by `internal/curing` workers (`CuringName == ""`). +They are surfaced for operator inspection and manual requeue via `leather dlq`. + +Outbound DLQ item shape: + +| Field | Meaning | +|---|---| +| `ID` | `odlq__