M18.1: anvil_ssh::retry module — RetryPolicy + classifier + run loop (FR-81, FR-82, FR-83)#28
Merged
…(FR-81, FR-82, FR-83 lib side)
Adds the library-side scaffold M18 needs to honour ConnectTimeout +
ConnectionAttempts from `~/.ssh/config` (FR-80, M18.2), classify
transient vs fatal errors (FR-82), drive an exponential-backoff
retry loop with jitter (FR-81), and capture per-attempt history for
FR-83's `gitway --test --json` envelope.
src/retry.rs (new, ~430 lines + ~190 lines of tests):
- pub struct RetryPolicy { attempts, base, factor, cap, max_window,
connect_timeout } with builder-style setters. Default values
per PRD: 3 attempts, 250 ms base, x2 factor, 8 s cap, 30 s max
window, no connect_timeout.
- pub enum Disposition { Retry, Fatal }
pub fn classify(err: &AnvilError) -> Disposition (FR-82):
* AuthenticationFailed / HostKeyMismatch / NoKeyFound /
KeyEncrypted -> Fatal
* Io kind in {ConnectionRefused, TimedOut, HostUnreachable,
NetworkUnreachable, NotFound (DNS NXDOMAIN), AddrNotAvailable}
-> Retry
* Everything else (russh protocol, signing, signature-invalid,
other Io kinds) -> Fatal
HTTP 429/503 detection from FR-82's defensive wording is out of
scope: Anvil speaks raw SSH; HTTP statuses only surface in
ProxyCommand subprocess output that Anvil doesn't parse.
- pub struct RetryAttempt { attempt: u32, reason: String,
elapsed: Duration } captures per-failure history for FR-83.
- pub async fn run<F, Fut, T>(policy, op) -> Result<(T, Vec<RetryAttempt>), AnvilError>
drives the loop with jittered exponential backoff:
delay_n = min(base * factor^(n-1), cap) + uniform_jitter([0, base/2])
Jitter sourced from rand_core::OsRng (already in deps from M19's
prepend_revoked).
Bails on max_window before starting another attempt.
Emits tracing::warn! at CAT_RETRY per failed attempt with
attempt / reason / elapsed_ms / disposition fields.
- run is timeout-agnostic: the per-attempt tokio::time::timeout
wrap lives at the call site (M18.2's session.rs::connect) so
the same loop driver can be reused for non-network operations.
- 15 unit tests covering: default-policy values, builder
chainability, classifier matrix (auth-fatal / host-key-fatal /
no-key-fatal / io-connection-refused-retry / io-timed-out-retry /
io-not-found-retry / io-permission-denied-fatal), run loop
(success-first-try / bail-on-fatal / retry-and-record-history /
exhaust-attempt-count), backoff curve (exponential growth, cap
enforcement, 1000-draw jitter window).
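The backoff formula above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the code in src/retry.rs: the field types (notably an integer `factor`), the setter name `with_attempts`, and the method names `backoff` and `delay` are assumptions. The real module draws jitter from rand_core::OsRng; here the caller passes a unit sample in [0, 1] instead, so the sketch stays deterministic.

```rust
use std::time::Duration;

/// Simplified stand-in for the PR's RetryPolicy; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct RetryPolicy {
    pub attempts: u32,
    pub base: Duration,
    pub factor: u32,
    pub cap: Duration,
    pub max_window: Duration,
    pub connect_timeout: Option<Duration>,
}

impl Default for RetryPolicy {
    /// Defaults as listed in the PR: 3 attempts, 250 ms base, x2 factor,
    /// 8 s cap, 30 s max window, no connect timeout.
    fn default() -> Self {
        Self {
            attempts: 3,
            base: Duration::from_millis(250),
            factor: 2,
            cap: Duration::from_secs(8),
            max_window: Duration::from_secs(30),
            connect_timeout: None,
        }
    }
}

impl RetryPolicy {
    /// Builder-style setter (hypothetical name).
    pub fn with_attempts(mut self, attempts: u32) -> Self {
        self.attempts = attempts;
        self
    }

    /// Deterministic part of the delay after failed attempt `n` (1-based):
    /// min(base * factor^(n-1), cap). Overflow saturates to the cap.
    pub fn backoff(&self, n: u32) -> Duration {
        let exponent = n.saturating_sub(1);
        let multiplier = self.factor.checked_pow(exponent).unwrap_or(u32::MAX);
        self.base
            .checked_mul(multiplier)
            .unwrap_or(self.cap)
            .min(self.cap)
    }

    /// Full delay: backoff(n) plus jitter in [0, base/2].
    /// `unit` is a caller-supplied sample; out-of-range values are clamped
    /// and non-finite values are treated as 0.
    pub fn delay(&self, n: u32, unit: f64) -> Duration {
        let unit = if unit.is_finite() { unit.clamp(0.0, 1.0) } else { 0.0 };
        self.backoff(n) + (self.base / 2).mul_f64(unit)
    }
}
```

With the default policy, `backoff` yields 250 ms, 500 ms, and 1 s for the first three failures and stays at the 8 s cap from the sixth failure on; the jitter term adds at most 125 ms.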
src/log.rs:
- New pub const CAT_RETRY = "anvil_ssh::retry"; appended to
CATEGORIES.
- categories_slice_matches_individual_constants test updated to
include the new constant.
src/error.rs:
- New pub fn AnvilError::io_kind(&self) -> Option<std::io::ErrorKind>
returns the underlying io::Error::kind() for the Io variant.
Used by retry::classify; also useful for downstream consumers
inspecting failure categories.
- New pub fn AnvilError::is_transient(&self) -> bool returning
matches!(retry::classify(self), Disposition::Retry). Surfaces
the classifier as a single-call predicate for log-aggregation
pipelines and CLI error paths.
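The classifier and the two AnvilError helpers fit together roughly as follows. This is a sketch under stated assumptions: the `AnvilError` enum here is a cut-down stand-in (the real type has more variants, and the `Protocol(String)` payload is invented for illustration), and only the function names and the Retry/Fatal matrix come from the description above. `io::ErrorKind::HostUnreachable` and `NetworkUnreachable` require Rust 1.83 or newer.

```rust
use std::io;

/// Cut-down stand-in for the crate's error type (variants assumed).
#[derive(Debug)]
pub enum AnvilError {
    AuthenticationFailed,
    HostKeyMismatch,
    NoKeyFound,
    KeyEncrypted,
    Protocol(String),
    Io(io::Error),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Disposition {
    Retry,
    Fatal,
}

impl AnvilError {
    /// Returns the underlying io::ErrorKind for the Io variant, else None.
    pub fn io_kind(&self) -> Option<io::ErrorKind> {
        match self {
            AnvilError::Io(err) => Some(err.kind()),
            _ => None,
        }
    }

    /// Single-call predicate wrapping `classify`.
    pub fn is_transient(&self) -> bool {
        matches!(classify(self), Disposition::Retry)
    }
}

/// Transient network I/O kinds are retried; everything else, including
/// auth, host-key, and protocol failures, is fatal.
pub fn classify(err: &AnvilError) -> Disposition {
    match err.io_kind() {
        Some(
            io::ErrorKind::ConnectionRefused
            | io::ErrorKind::TimedOut
            | io::ErrorKind::HostUnreachable
            | io::ErrorKind::NetworkUnreachable
            | io::ErrorKind::NotFound
            | io::ErrorKind::AddrNotAvailable,
        ) => Disposition::Retry,
        _ => Disposition::Fatal,
    }
}
```

Defaulting the catch-all arm to Fatal means a newly added error variant is never retried by accident; it has to be opted in explicitly.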
src/lib.rs:
- pub mod retry;
Public API: pure additive. Version bump to 0.9.0 lands in M18.3.
Plan: M18.1 of anvil-gitway-milestone-plan.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UnbreakableMJ added a commit that referenced this pull request on May 4, 2026
…t) (#30)
Final Anvil-side slice of M18. Bumps anvil-ssh from 0.8.0 to 0.9.0 to
publish the M18.1 + M18.2 work as a single crates.io release. The
Gitway-side CLI flags + retry_attempts JSON envelope (M18.4 + M18.5)
land against this 0.9.0; the M18.X PRD doc PR closes the milestone
with Gitway v1.0.0-rc.9.
Cargo.toml:
- version "0.8.0" -> "0.9.0"
Cargo.lock:
- regenerated locally; reflects the 0.9.0 version.
CHANGELOG.md:
- 0.9.0 entry covering the new anvil_ssh::retry module (RetryPolicy,
classify, run, RetryAttempt), the new CAT_RETRY tracing category,
the AnvilError::io_kind + is_transient predicates, the three new
AnvilConfig fields + builder setters, the apply_ssh_config
consumption of ConnectTimeout / ConnectionAttempts, the
AnvilSession::connect retry+timeout wrap, and the
AnvilSession::retry_history accessor. Documents the proxy/jump
scope-narrowing and the HTTP 429/503 out-of-scope decision.
Stacked after PRs #28 (M18.1, merged) and #29 (M18.2, merged).
Plan: M18.3 of anvil-gitway-milestone-plan.md.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
First slice of M18 (PRD §5.8.7: connection retry / backoff / timeouts). Adds the library-side scaffold for FR-81 / FR-82 / FR-83. M18.2 plumbs this into AnvilConfig + session.rs::connect; M18.4 wires the Gitway CLI flags + retry_attempts JSON envelope.
New anvil_ssh::retry module
- RetryPolicy { attempts, base, factor, cap, max_window, connect_timeout } with builder setters. Defaults per PRD: 3 attempts, 250 ms base, ×2 factor, 8 s cap, 30 s max_window, no connect_timeout.
- Disposition { Retry, Fatal } + classify(err) (FR-82): auth/host-key/no-key/key-encrypted → Fatal; transient I/O kinds (ConnectionRefused, TimedOut, HostUnreachable, NetworkUnreachable, NotFound (DNS), AddrNotAvailable) → Retry; everything else (russh protocol, other I/O kinds) → Fatal.
- RetryAttempt { attempt, reason, elapsed }: per-failure history for FR-83.
- async fn run(policy, op) -> Result<(T, Vec<RetryAttempt>), AnvilError>: drives the loop with jittered exponential backoff (min(base * factor^(n-1), cap) + uniform_jitter([0, base/2])), with jitter sourced from OsRng. Bails once max_window is exceeded, before starting another attempt. Emits tracing::warn! at the new CAT_RETRY category per failed attempt.
- run is timeout-agnostic: the per-attempt tokio::time::timeout wrap lives at the M18.2 call site, so the same loop is reusable for non-network operations.
Supporting additions
- log::CAT_RETRY = "anvil_ssh::retry" appended to CATEGORIES.
- AnvilError::io_kind(): returns Option<std::io::ErrorKind> for the Io variant.
- AnvilError::is_transient(): single-call predicate wrapping retry::classify.
Tests (15 unit tests)
Default policy shape, builder chainability, classifier matrix (7 cases), run loop (success-first / bail-on-fatal / retry-record-history / exhaust-count), backoff curve (exponential growth + cap enforcement + 1000-draw jitter window).
Public API: pure additive. Version bump to 0.9.0 lands in M18.3. HTTP 429/503 detection from FR-82 is documented as out of scope, since Anvil speaks raw SSH.
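The control flow of the run loop described above can be illustrated with a synchronous analogue. This is a hedged sketch, not the module's code: the real `run` is async and takes a RetryPolicy, whereas `run_blocking`, the `Limits` struct, and the injected `classify` / `delay_for` / `sleep` closures are hypothetical names chosen so the example needs no async runtime. The exact point at which the real loop checks max_window is also an assumption.

```rust
use std::fmt::Display;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Disposition {
    Retry,
    Fatal,
}

/// One entry per failed attempt, mirroring the PR's RetryAttempt fields.
#[derive(Debug, Clone, PartialEq)]
pub struct RetryAttempt {
    pub attempt: u32,
    pub reason: String,
    pub elapsed: Duration,
}

/// Hypothetical subset of RetryPolicy that the loop itself consults.
pub struct Limits {
    pub attempts: u32,
    pub max_window: Duration,
}

/// Synchronous analogue of the retry loop. Returns the value plus the
/// failure history on success, or the last error on fatal classification,
/// attempt exhaustion, or an exceeded time window.
pub fn run_blocking<T, E: Display>(
    limits: &Limits,
    classify: impl Fn(&E) -> Disposition,
    delay_for: impl Fn(u32) -> Duration,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<(T, Vec<RetryAttempt>), E> {
    let started = Instant::now();
    let mut history = Vec::new();
    let mut attempt = 1u32;
    loop {
        let err = match op() {
            Ok(value) => return Ok((value, history)),
            Err(err) => err,
        };
        // Record every failure, including the one that ends the loop.
        history.push(RetryAttempt {
            attempt,
            reason: err.to_string(),
            elapsed: started.elapsed(),
        });
        let fatal = classify(&err) == Disposition::Fatal;
        let out_of_attempts = attempt >= limits.attempts;
        let out_of_time = started.elapsed() >= limits.max_window;
        if fatal || out_of_attempts || out_of_time {
            return Err(err);
        }
        // Back off before the next attempt; the caller decides how to wait.
        sleep(delay_for(attempt));
        attempt += 1;
    }
}
```

Injecting `sleep` keeps the loop testable without real waiting: a test can pass a closure that records the requested delays, while production code would pass `std::thread::sleep`. Note that with this signature the history is dropped on the error path.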
Test plan
- cargo fmt --all -- --check
- cargo clippy --all-targets --all-features --locked -- -D warnings
- cargo test --lib --tests --locked: all green (15 new retry::tests::*, plus the existing M11–M17 set)
🤖 Generated with Claude Code