From 147336c83379c333dbc71629431ffa7cadbf9a3e Mon Sep 17 00:00:00 2001 From: Moises E Jaramillo Date: Wed, 17 Jun 2026 16:12:52 -0400 Subject: [PATCH 1/2] fix(security): close HTTP receive timing side-channel (#35) The #20 fix unified every HTTP receive failure to a bodyless 400, but response TIME still partitioned them: a held recipient kid runs full ECDH/AEAD (~360 us) while an unheld kid fast-fails before any crypto (~180 us), so one timed request enumerated which recipient keys the agent holds (the residual L-025 filed). Add a constant-time floor on the 400-rejection path: - New DidCommReceiveOptions.ReceiveRejectionFloor (default 5 ms; TimeSpan.Zero disables). Both MapDidCommEndpoint HTTP overloads pad the 400 with a non-cancelable, timer-backed Task.Delay up to the floor, so unheld-fast-fail and held-AEAD-fail pad to the same time. - 202 success and 415/413 are never padded (cost is paid only on rejected traffic). WebSocket is document-only (nothing written back). Adversarial pass (repo rule) found two issues: - Finding 1 (fixed): the floor clock was started at handler entry, before the <=1 MiB body read; a peer could pad the envelope to evict the floor. Moved the timestamp to after the (kid-independent) body read; added Large_body_rejection_is_still_padded_to_the_floor. - Finding 2 (documented residual): the held-only decrypt-then-network- DID-resolution path can exceed any fixed floor; unbounded network latency means a fixed floor cannot close it. Documented in the API docs + CHANGELOG; mitigation is auth / a rate-limiter. Root cause is upstream (JweParser.Parse/FindPresent fast-fails before ECDH) -> filed dataproofs-dotnet#12 + tracking #42. Tests: +6 (3 unit pad-helper, rejection-floored, large-body-floored, success-not-padded). Full suite green (656 tests). Builds clean under warnings-as-errors. Co-Authored-By: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 36 ++++++ .../DidCommEndpointRouteBuilderExtensions.cs | 71 +++++++++++- .../DidCommReceiveOptions.cs | 32 ++++++ tasks/lessons.md | 27 +++++ tasks/todo20260617T200730Z.md | 33 ++++++ .../AspNetCoreReceiveRoundTripTests.cs | 108 +++++++++++++++++- 6 files changed, 303 insertions(+), 4 deletions(-) create mode 100644 tasks/todo20260617T200730Z.md diff --git a/CHANGELOG.md b/CHANGELOG.md index ced0b7b..b4f0d61 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,42 @@ All notable changes to didcomm-dotnet are documented here. Format follows ## [Unreleased] +### Security — HTTP receive timing side-channel (`feat/security-receive-timing-floor`) + +Closes GitHub issue **#35** (High) — the timing residual the #20 red-team explicitly filed (see +`tasks/lessons.md` L-025). The #20 fix made every HTTP receive failure return a uniform bodyless +`400`, closing the response-content/status oracle, but response **time** still partitioned those 400s: +an envelope addressed to a recipient key the agent holds runs the full ECDH/AEAD path (~360 µs) while +one addressed to a key it does not hold fast-fails before any crypto (~180 µs), so a single timed +request enumerated which recipient keys the agent holds (FR-API-07). A PoC-backed adversarial pass +caught — and the fix folds in — two further residuals. + +- **Constant-time rejection floor.** New `DidCommReceiveOptions.ReceiveRejectionFloor` (default + **5 ms**, set `TimeSpan.Zero` to disable). The HTTP receive handler times the unpack window and, + before returning the uniform `400`, pads with a non-cancelable timer-backed delay up to the floor, so + the rejection time is independent of how far envelope processing got. The fast-fail "unheld kid" path + and the full-crypto "held kid + AEAD fail" path now both finish under the floor and pad to the same + duration. **Behavior change:** rejected requests now take ≥ the floor; legitimate received messages + answer `202` and are never padded (the hot path is untouched), and `415`/`413` pre-decrypt rejections + are unchanged. The cost is paid only on rejected — typically hostile or malformed — traffic. +- **Red-team Finding 1 (folded in): floor measured after the body read.** The pad clock starts *after* + `ReadCappedAsync`, not at handler entry — otherwise a peer could pad the envelope toward + `MaxReceiveBytes` so the (kid-independent) body read alone exhausts the floor, collapse the pad to + zero, and re-expose the gap. Covered by `Large_body_rejection_is_still_padded_to_the_floor`. +- **Red-team Finding 2 (documented residual, not closed by a fixed floor).** The held-only path that + decrypts and then does **network** DID resolution (authcrypt sender / FR-CONSIST-06 / `from_prior`) + can run longer than the floor — e.g. an attacker authcrypts to a guessed kid signed by an + attacker-controlled, deliberately-slow `did:webvh` sender, making a held response visibly exceed the + floor while an unheld one stays at it. That residual is network-bound and unbounded, so a fixed floor + cannot close it without an absurd value; the API docs direct operators to authentication / a + rate-limiter in front of the receive endpoint (the deployment tradeoff #35 calls out). +- **WebSocket receive is document-only.** The WS loops log-and-discard with nothing written back, so + there is no per-message response to time; the floor is intentionally not applied there (noted in code). +- **Root cause tracked upstream.** The local-crypto gap originates in `JweParser.Parse` / + `FindPresent` (DataProofsDotnet.Jose), which fast-fails before ECDH when no recipient kid is held. + The floor is a compensating edge control; the durable constant-time-recipient-selection fix is filed + upstream (`moisesja/dataproofs-dotnet#12`) and tracked here in #42. + ### Security — protocol & transport hardening (`feat/security-protocol-hardening`) Resolves four scattered Low/Info findings (GitHub issues #27, #30, #31, #34); #32 closed as diff --git a/src/DidComm.AspNetCore/DidCommEndpointRouteBuilderExtensions.cs b/src/DidComm.AspNetCore/DidCommEndpointRouteBuilderExtensions.cs index daf9fea..dd19c35 100644 --- a/src/DidComm.AspNetCore/DidCommEndpointRouteBuilderExtensions.cs +++ b/src/DidComm.AspNetCore/DidCommEndpointRouteBuilderExtensions.cs @@ -1,4 +1,5 @@ using System.Buffers; +using System.Diagnostics; using System.Net.Mime; using System.Net.WebSockets; using DidComm.Exceptions; @@ -83,6 +84,13 @@ public static IEndpointConventionBuilder MapDidCommEndpoint( var logger = httpContext.RequestServices.GetService()?.CreateLogger("DidComm.AspNetCore.Receive"); + // #35: start the constant-time clock HERE — after the body has been read — so the floor + // budgets only the unpack window where the recipient-kid-dependent timing lives. The body + // read is kid-independent (the peer chose its size), so charging it against the floor would + // let a peer pad the request out to MaxReceiveBytes, exhaust the floor before any crypto, + // and re-expose the held-vs-unheld gap (PR #43 red-team). + var startTimestamp = Stopwatch.GetTimestamp(); + UnpackResult unpacked; try { @@ -99,7 +107,7 @@ public static IEndpointConventionBuilder MapDidCommEndpoint( // downstream DID-resolution timeout (TaskCanceledException : OperationCanceledException // with the request token NOT cancelled, e.g. an attacker-controlled did:webvh host that // hangs). Only a real client abort is rethrown above, so there is no 400-vs-500 oracle. - return NormalizedReceiveRejection(logger, ex); + return await NormalizedReceiveRejectionAsync(logger, ex, startTimestamp, receiveOptions.ReceiveRejectionFloor).ConfigureAwait(false); } // The message was received and unpacked; HTTP receive is one-way / fire-and-forget @@ -170,6 +178,11 @@ public static IEndpointConventionBuilder MapDidCommEndpoint( return Results.StatusCode(StatusCodes.Status413PayloadTooLarge); } + // #35: start the constant-time clock after the (kid-independent) body read, covering only + // the unpack+dispatch window where the recipient-kid-dependent timing lives — see the + // matching note in the callback overload above. + var startTimestamp = Stopwatch.GetTimestamp(); + // Unpack AND dispatch share one opaque exit: a handler bug (FR-THR-04 InvalidOperationException), // an unpack rejection (crypto/malformed/consistency/resolution), and a downstream // DID-resolution timeout (TaskCanceledException with the request token NOT cancelled) all @@ -188,7 +201,7 @@ public static IEndpointConventionBuilder MapDidCommEndpoint( } catch (Exception ex) { - return NormalizedReceiveRejection(logger, ex); + return await NormalizedReceiveRejectionAsync(logger, ex, startTimestamp, receiveOptions.ReceiveRejectionFloor).ConfigureAwait(false); } }); } @@ -344,6 +357,11 @@ private static async Task ReceiveLoopWithDispatchAsync( var packed = System.Text.Encoding.UTF8.GetString(ms.ToArray()); + // #35: the recipient-kid timing side-channel that the HTTP receive path pads against + // (see DelayToRejectionFloorAsync) does NOT apply here — a failed unpack is logged and + // discarded with NOTHING written back on the socket, so there is no per-message response + // for a peer to time. We deliberately do not pad the discard: it would add latency to the + // whole multiplexed message stream on the connection for a far weaker, noisier signal. UnpackResult unpacked; try { @@ -512,8 +530,18 @@ private static void LogOutcome(ILogger? logger, DispatchOutcome outcome, bool sa /// status code or response text as a decryption / recipient-kid / DID-resolution oracle (issues /// #20/#28/#33). The detailed reason is logged server-side only. The WebSocket paths already take /// this posture (log-and-discard, echo nothing); this brings the HTTP paths to parity. + /// + /// #35: the uniform body/status alone is not enough — the response time still partitioned + /// the failures (a held recipient kid runs the full ECDH/AEAD path; an unheld one fast-fails before + /// any crypto), so a single timed request enumerated which keys the agent holds. Before returning + /// the 400 we therefore pad the response out to measured from + /// (captured after the body read, so the floor budgets only the + /// unpack window), making the rejection time independent of the crypto work done. See + /// . + /// /// - private static IResult NormalizedReceiveRejection(ILogger? logger, Exception ex) + private static async Task NormalizedReceiveRejectionAsync( + ILogger? logger, Exception ex, long startTimestamp, TimeSpan rejectionFloor) { if (ex is DidCommException) { @@ -527,9 +555,43 @@ private static IResult NormalizedReceiveRejection(ILogger? logger, Exception ex) logger?.LogError(ex, "DidComm receive: inbound processing failed ({Reason}).", ex.GetType().Name); } + await DelayToRejectionFloorAsync(startTimestamp, rejectionFloor).ConfigureAwait(false); return Results.StatusCode(StatusCodes.Status400BadRequest); } + /// + /// #35: wait until at least has elapsed since + /// (captured after the body read, so the floor budgets only the unpack window) before the caller + /// returns its rejection. This pins the cheap, universal enumeration probe — garbage ciphertext to a + /// guessed kid — to a constant time: the fast-fail "unheld kid" path and the full-crypto "held kid + + /// AEAD fail" path both finish well under the floor and so are padded to the same duration. + /// + /// The wait is deliberately not cancelable: a cancelable delay would let a peer abort the + /// connection early to skip the pad and re-expose the timing gap. + /// is timer-backed and holds no thread, so padding a soon-to-be-rejected request is cheap. + /// + /// + /// LIMITATION (documented, not fixed by a fixed floor): a fixed floor only masks failures that + /// finish under it. The held-only path that decrypts and then does network DID resolution + /// (authcrypt sender / FR-CONSIST-06 / from_prior) can run longer than the floor — and an + /// attacker who can drive that (e.g. authcrypt to a guessed kid signed by an attacker-controlled, + /// deliberately-slow did:webvh sender) sees "response ≫ floor" for a held kid versus "response ≈ + /// floor" for an unheld one. That residual is unbounded (network latency), so a fixed floor cannot + /// close it without an absurd value; mitigate it with auth / a rate-limiter in front of the receive + /// endpoint (the deployment tradeoff the #35 issue calls out). The durable fix for the local-crypto + /// gap is constant-time recipient-kid selection in the JWE parser, tracked upstream. + /// + /// + internal static async Task DelayToRejectionFloorAsync(long startTimestamp, TimeSpan floor) + { + if (floor <= TimeSpan.Zero) + return; + + var remaining = floor - Stopwatch.GetElapsedTime(startTimestamp); + if (remaining > TimeSpan.Zero) + await Task.Delay(remaining).ConfigureAwait(false); + } + private static async Task ReceiveLoopAsync( System.Net.WebSockets.WebSocket socket, DidCommClient client, @@ -570,6 +632,9 @@ private static async Task ReceiveLoopAsync( // FR-TRN-09: the reassembled bytes are exactly one packed envelope. var packed = System.Text.Encoding.UTF8.GetString(ms.ToArray()); + // #35: no constant-time pad here (unlike the HTTP 400 path) — a failed unpack is logged + // and discarded with nothing written back, so there is no per-message response a peer can + // time to enumerate held recipient kids. try { var unpacked = await client.UnpackAsync(packed, ct).ConfigureAwait(false); diff --git a/src/DidComm.AspNetCore/DidCommReceiveOptions.cs b/src/DidComm.AspNetCore/DidCommReceiveOptions.cs index a19a2e3..9589835 100644 --- a/src/DidComm.AspNetCore/DidCommReceiveOptions.cs +++ b/src/DidComm.AspNetCore/DidCommReceiveOptions.cs @@ -27,4 +27,36 @@ public sealed class DidCommReceiveOptions /// (e.g. chat samples) set this to true per endpoint. /// public bool AllowSameSocketReplies { get; set; } + + /// + /// Minimum wall-clock time the HTTP receive endpoint takes before returning the uniform + /// 400 rejection. The handler times itself from entry and, before answering 400, waits + /// out the remainder of this floor so the response time no longer reveals how far envelope + /// processing got. This closes the recipient-kid timing side-channel (#35): without it, an + /// envelope addressed to a recipient key the server holds runs the full ECDH/AEAD path (~360 µs) + /// while one addressed to a key it does NOT hold fast-fails (~180 µs) — a single timed request + /// then tells an attacker which recipient keys the agent holds, the same enumeration the uniform + /// 400 body (#20) set out to remove. Padding every rejection up to a fixed floor removes that + /// gap (FR-API-07). + /// + /// Applies to the 400 path only. A successfully received message answers 202 and is never + /// padded (it stays full-throughput); the 415 / 413 pre-decrypt rejections carry a different + /// status code and reveal nothing about which keys are held, so they are not padded either. The + /// cost is therefore paid only on rejected — typically hostile or malformed — traffic. The floor + /// is measured from just after the request body is read, so a peer cannot exhaust it by padding + /// the body out to the size limit. + /// + /// Keep this small (single-digit milliseconds): 5 ms already dominates the local-crypto spread + /// with margin, and the wait is backed by a timer (it holds no thread). Defaults to 5 ms; set + /// to disable the floor entirely. + /// + /// What this does and does not close. The floor closes the cheap, universal probe — + /// an unauthenticated peer sending garbage ciphertext to guessed recipient kids and timing the + /// 400 to learn which keys the agent holds. It does NOT, on its own, close the held-only path + /// where a decryptable envelope triggers network DID resolution that runs longer than the floor + /// (an attacker-controlled slow sender DID makes the held response visibly exceed the floor while + /// an unheld one stays at it). That residual is network-bound and unbounded; close it with + /// authentication / a rate-limiter in front of the endpoint, not by inflating this value. + /// + public TimeSpan ReceiveRejectionFloor { get; set; } = TimeSpan.FromMilliseconds(5); } diff --git a/tasks/lessons.md b/tasks/lessons.md index bb5e3e1..7dc28d7 100644 --- a/tasks/lessons.md +++ b/tasks/lessons.md @@ -548,3 +548,30 @@ Format per entry: validated URL you surface to a consumer, return the canonical `AbsoluteUri` and reject userinfo. When a comment claims a fix "mirrors" an existing guard, verify parity — the OOB redirect guard is the scheme half of TraceObserver's, NOT its allowlist; say so rather than overclaim. + +## L-030 — A constant-time floor must cover ONLY the secret-dependent window, and a fixed floor cannot mask an unbounded tail. + +- **Lesson:** Closing the #25/L-025 timing residual (#35) with a response-time floor took two red-team + iterations to get right. (a) **Floor the right window.** The first cut captured the start timestamp at + HTTP handler entry — *before* the up-to-`MaxReceiveBytes` (1 MiB) body read. Body-read time is + attacker-controlled and kid-independent, but it was charged against the floor, so a peer padding the + envelope toward the size cap made the read alone exhaust the floor (`remaining ≤ 0`), the pad collapsed + to zero, and the held-vs-unheld crypto gap was re-exposed. Fix: start the clock *after* the body read, + budgeting only the unpack window where the secret-dependent timing actually lives. (b) **A fixed floor + only masks paths that finish UNDER it.** The held-only decrypt-then-network-DID-resolution path + (authcrypt sender / FR-CONSIST-06 / `from_prior`) can run *longer* than any sane floor — an + attacker-controlled slow `did:webvh` sender turns the floor into a clean threshold: "response ≫ floor ⇒ + held". My first draft comment rationalized this away as "unreachable for an unheld kid, so masked" — + which is exactly backwards: unreachable-for-the-other-class IS the discriminator. Honest disposition: + document it as an unbounded residual and point operators at auth / a rate-limiter (the deployment + tradeoff the issue itself named), don't claim closure. +- **Why:** A timing defense is only as good as (1) the window it measures — anything constant-time-padded + must exclude attacker-controlled, secret-independent preamble, or the attacker inflates the preamble to + evict the pad; and (2) the assumption that the padded work finishes under the floor — a network/IO tail + on the secret-holding branch breaks that and a fixed floor can't fix unbounded latency. +- **How to apply:** When you add a constant-time floor: measure from immediately before the + secret-dependent work, not from request entry; add a regression test with a *large* input on the padded + path (not just a tiny one) to prove the preamble can't evict the floor; and when a branch has an + unbounded (network) tail reachable only when the secret is present, say plainly that the floor does not + close it and name the compensating control — never argue "the other class can't reach it" as if that + hid the signal. Always re-run the break-it adversarial pass on the *revised* fix, not just the first. diff --git a/tasks/todo20260617T200730Z.md b/tasks/todo20260617T200730Z.md new file mode 100644 index 0000000..e89f8c1 --- /dev/null +++ b/tasks/todo20260617T200730Z.md @@ -0,0 +1,33 @@ +# Issue #35 — HTTP receive timing side-channel (recipient-kid enumeration) + +Branch: `feat/security-receive-timing-floor`. Full plan: `~/.claude/plans/vivid-kindling-floyd.md`. + +## Problem +After #20 unified every HTTP receive failure to a bodyless `400`, response **time** still partitioned +the 400s — a held recipient kid runs full ECDH/AEAD (~360 µs) vs an unheld kid fast-failing before any +crypto (~180 µs) — so one timed request enumerated which recipient keys the agent holds. (L-025 filed it.) + +## Tasks +- [x] Add `DidCommReceiveOptions.ReceiveRejectionFloor` (default 5 ms; `TimeSpan.Zero` disables). +- [x] Pad the HTTP `400` rejection path to the floor via non-cancelable `DelayToRejectionFloorAsync`, + measured from a timestamp captured **after** the body read (both `MapDidCommEndpoint` overloads). +- [x] WebSocket: document-only (no per-message response to time; no pad). +- [x] Tests: 3 unit (pad helper), 3 integration (rejection floored, large-body still floored, success + not floored) + extended the existing no-oracle test's doc. +- [x] Build clean under warnings-as-errors; full suite green (656 tests). +- [x] Two adversarial passes (found+fixed Finding 1; documented Finding 2). +- [x] CHANGELOG + lessons (L-030). +- [ ] Tracking issue (this repo) + detailed DataProofs GitHub issue. + +## Review +- **Headline oracle closed.** The cheap, universal probe (garbage ciphertext to a guessed kid) now pads + to a constant floor: unheld fast-fail and held AEAD-fail both finish under 5 ms and are indistinguishable. +- **Finding 1 (HIGH, fixed before merge):** floor clock moved to *after* the body read so a peer can't + pad the envelope toward `MaxReceiveBytes` to evict the pad. Regression test added. +- **Finding 2 (HIGH, documented residual):** the held-only decrypt-then-**network-DID-resolution** path + can exceed any sane fixed floor → "response ≫ floor ⇒ held". Unbounded (network); a fixed floor can't + close it. Documented in the API XML docs + CHANGELOG; mitigation is auth / a rate-limiter in front of + the endpoint — the deployment tradeoff #35 itself names. Root cause (`JweParser.FindPresent` fast-fail) + is upstream in DataProofsDotnet.Jose → filed as a separate issue. +- **Cost:** paid only on rejected (hostile/malformed) traffic; `202` success is never padded; `415`/`413` + unchanged. `Task.Delay` is timer-backed (holds no thread). diff --git a/tests/DidComm.InteropTests/Transports/AspNetCoreReceiveRoundTripTests.cs b/tests/DidComm.InteropTests/Transports/AspNetCoreReceiveRoundTripTests.cs index 320c558..3f9ebee 100644 --- a/tests/DidComm.InteropTests/Transports/AspNetCoreReceiveRoundTripTests.cs +++ b/tests/DidComm.InteropTests/Transports/AspNetCoreReceiveRoundTripTests.cs @@ -1,3 +1,4 @@ +using System.Diagnostics; using System.Net; using System.Net.Http.Headers; using System.Text; @@ -125,6 +126,8 @@ public async Task Receive_rejections_are_indistinguishable_so_there_is_no_oracle // (ConsistencyException) MUST all yield the SAME response — same 400 status, same empty body. // Previously crypto failures leaked the failing recipient kid / layer, and a ConsistencyException // escaped to a 500 (+ dev stack trace), so a peer could distinguish the failure class. + // (#35 closes the residual TIMING channel these same classes left open — covered by the + // floor tests below; this test asserts the status/body parity only.) var (server, _) = await BuildServerAsync(); var malformed = "not even json"; @@ -145,6 +148,104 @@ await PostEnvelopeAsync(server, forged), }); } + // === #35: constant-time rejection floor ================================================= + // The pad helper is exercised directly (no HTTP, no flaky upper bounds) and through the live + // endpoint. Every assertion against a rejection is a LOWER bound on elapsed time: a slow CI can + // only make elapsed larger, never smaller, so these do not flake. The one upper-bound assertion + // (the success path) is given a multi-second floor it must NOT pay, so its margin is enormous. + + [Fact] + public async Task DelayToRejectionFloor_waits_out_the_remaining_floor() + { + var start = Stopwatch.GetTimestamp(); + var floor = TimeSpan.FromMilliseconds(200); + + await DidCommEndpointRouteBuilderExtensions.DelayToRejectionFloorAsync(start, floor); + + var elapsed = Stopwatch.GetElapsedTime(start); + // Lower bound only (180 ms < 200 ms floor, slack for timer granularity). Task.Delay never + // fires early, so a held-kid path that finished in microseconds is still lifted to the floor. + elapsed.Should().BeGreaterThanOrEqualTo(TimeSpan.FromMilliseconds(180)); + } + + [Fact] + public async Task DelayToRejectionFloor_is_a_noop_when_floor_is_zero() + { + var start = Stopwatch.GetTimestamp(); + + await DidCommEndpointRouteBuilderExtensions.DelayToRejectionFloorAsync(start, TimeSpan.Zero); + + // Disabled floor: returns essentially immediately (no wait). Generous ceiling — this only + // guards against an accidental unconditional delay, not a precise timing. + Stopwatch.GetElapsedTime(start).Should().BeLessThan(TimeSpan.FromMilliseconds(100)); + } + + [Fact] + public async Task DelayToRejectionFloor_is_a_noop_when_work_already_exceeded_the_floor() + { + var start = Stopwatch.GetTimestamp(); + + // A floor already elapsed by the time we call (1 tick = 100 ns) → no further delay. Models the + // held + DID-resolution path that legitimately ran longer than the floor. + await DidCommEndpointRouteBuilderExtensions.DelayToRejectionFloorAsync(start, TimeSpan.FromTicks(1)); + + Stopwatch.GetElapsedTime(start).Should().BeLessThan(TimeSpan.FromMilliseconds(100)); + } + + [Fact] + public async Task Rejection_response_is_padded_to_the_floor() + { + // With a 200 ms floor, the malformed-envelope rejection (which fast-fails in microseconds) + // must still take ≥ ~floor before answering 400 — the timing channel is closed end-to-end. + var (server, _) = await BuildServerAsync(rejectionFloor: TimeSpan.FromMilliseconds(200)); + + var sw = Stopwatch.StartNew(); + var result = await PostEnvelopeAsync(server, "not even json"); + sw.Stop(); + + result.Status.Should().Be(HttpStatusCode.BadRequest); + result.Body.Should().BeEmpty(); + sw.Elapsed.Should().BeGreaterThanOrEqualTo(TimeSpan.FromMilliseconds(180)); + } + + [Fact] + public async Task Large_body_rejection_is_still_padded_to_the_floor() + { + // PR #43 red-team (Finding 1): the floor must be measured AFTER the body read, not from + // handler entry. Otherwise a peer pads the envelope out toward MaxReceiveBytes so the read + // time alone exhausts the floor, the pad collapses to zero, and the held-vs-unheld crypto gap + // is re-exposed. A large malformed body must therefore STILL be padded to the floor. + var (server, _) = await BuildServerAsync( + configureOptions: o => o.MaxReceiveBytes = 1 * 1024 * 1024, + rejectionFloor: TimeSpan.FromMilliseconds(200)); + + var largeMalformed = new string('x', 256 * 1024); // 256 KiB, well under the 1 MiB cap; not valid JSON + var sw = Stopwatch.StartNew(); + var result = await PostEnvelopeAsync(server, largeMalformed); + sw.Stop(); + + result.Status.Should().Be(HttpStatusCode.BadRequest); + result.Body.Should().BeEmpty(); + sw.Elapsed.Should().BeGreaterThanOrEqualTo(TimeSpan.FromMilliseconds(180)); + } + + [Fact] + public async Task Successful_receive_is_not_padded_by_the_floor() + { + // Pin a deliberately huge floor (2 s) that ONLY the rejection path would pay. A successfully + // unpacked envelope answers 202 and must return far faster — proving the floor never touches + // the hot success path (which would otherwise be a throughput regression). + var (server, _) = await BuildServerAsync(rejectionFloor: TimeSpan.FromSeconds(2)); + + var valid = await PackAnoncryptForBobAsync(); + var sw = Stopwatch.StartNew(); + var result = await PostEnvelopeAsync(server, valid); + sw.Stop(); + + result.Status.Should().Be(HttpStatusCode.Accepted); // 202, decrypted for Bob + sw.Elapsed.Should().BeLessThan(TimeSpan.FromSeconds(1)); // nowhere near the 2 s rejection floor + } + private static DidCommClient NewPackerClient() { var actors = SpecActorRegistry.LoadDefault(); @@ -245,7 +346,8 @@ public Task ResolveAsync(string did, DidResolutionOptions? private static async Task<(TestServer Server, List Received)> BuildServerAsync( Action? configureOptions = null, Func? onReceive = null, - IDidResolver? resolverOverride = null) + IDidResolver? resolverOverride = null, + TimeSpan? rejectionFloor = null) { var received = new List(); var actors = SpecActorRegistry.LoadDefault(); @@ -270,6 +372,10 @@ public Task ResolveAsync(string did, DidResolutionOptions? o.MaxReceiveBytes = 64 * 1024; configureOptions?.Invoke(o); }); + // #35: let a test pin the constant-time rejection floor. When unset, the endpoint falls back + // to the DidCommReceiveOptions default (5 ms), exercising the production posture. + if (rejectionFloor is { } floor) + builder.Services.Configure(o => o.ReceiveRejectionFloor = floor); var app = builder.Build(); app.UseRouting(); From ada3af3969f33f3e2b3a3135a55a60453742cb2c Mon Sep 17 00:00:00 2001 From: Moises E Jaramillo Date: Wed, 17 Jun 2026 16:20:58 -0400 Subject: [PATCH 2/2] =?UTF-8?q?docs(security):=20address=20PR=20#43=20revi?= =?UTF-8?q?ew=20=E2=80=94=20Finding=202=20tracking=20issue=20+=20Windows?= =?UTF-8?q?=20timer=20note?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Finding 2 (held-path network-DID-resolution timing residual) now has a dedicated tracking issue (#44); reference it from the ReceiveRejectionFloor doc, the DelayToRejectionFloorAsync LIMITATION note, and the CHANGELOG so an operator searching issues finds it (it was previously only in inline docs). - Document that Task.Delay rounds up to the Windows system timer resolution (~15 ms), so a sub-15 ms floor costs ~15 ms there — harmless to the security property, relevant to rejected-traffic throughput estimates. Co-Authored-By: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 2 +- .../DidCommEndpointRouteBuilderExtensions.cs | 4 ++-- src/DidComm.AspNetCore/DidCommReceiveOptions.cs | 8 ++++++-- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b4f0d61..0bb57a6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -34,7 +34,7 @@ caught — and the fix folds in — two further residuals. attacker-controlled, deliberately-slow `did:webvh` sender, making a held response visibly exceed the floor while an unheld one stays at it. That residual is network-bound and unbounded, so a fixed floor cannot close it without an absurd value; the API docs direct operators to authentication / a - rate-limiter in front of the receive endpoint (the deployment tradeoff #35 calls out). + rate-limiter in front of the receive endpoint (the deployment tradeoff #35 calls out). Tracked as #44. - **WebSocket receive is document-only.** The WS loops log-and-discard with nothing written back, so there is no per-message response to time; the floor is intentionally not applied there (noted in code). - **Root cause tracked upstream.** The local-crypto gap originates in `JweParser.Parse` / diff --git a/src/DidComm.AspNetCore/DidCommEndpointRouteBuilderExtensions.cs b/src/DidComm.AspNetCore/DidCommEndpointRouteBuilderExtensions.cs index dd19c35..885d2a0 100644 --- a/src/DidComm.AspNetCore/DidCommEndpointRouteBuilderExtensions.cs +++ b/src/DidComm.AspNetCore/DidCommEndpointRouteBuilderExtensions.cs @@ -578,8 +578,8 @@ private static async Task NormalizedReceiveRejectionAsync( /// deliberately-slow did:webvh sender) sees "response ≫ floor" for a held kid versus "response ≈ /// floor" for an unheld one. That residual is unbounded (network latency), so a fixed floor cannot /// close it without an absurd value; mitigate it with auth / a rate-limiter in front of the receive - /// endpoint (the deployment tradeoff the #35 issue calls out). The durable fix for the local-crypto - /// gap is constant-time recipient-kid selection in the JWE parser, tracked upstream. + /// endpoint (the deployment tradeoff the #35 issue calls out, tracked in #44). The durable fix for + /// the local-crypto gap is constant-time recipient-kid selection in the JWE parser, tracked upstream. /// /// internal static async Task DelayToRejectionFloorAsync(long startTimestamp, TimeSpan floor) diff --git a/src/DidComm.AspNetCore/DidCommReceiveOptions.cs b/src/DidComm.AspNetCore/DidCommReceiveOptions.cs index 9589835..2edd987 100644 --- a/src/DidComm.AspNetCore/DidCommReceiveOptions.cs +++ b/src/DidComm.AspNetCore/DidCommReceiveOptions.cs @@ -48,7 +48,10 @@ public sealed class DidCommReceiveOptions /// /// Keep this small (single-digit milliseconds): 5 ms already dominates the local-crypto spread /// with margin, and the wait is backed by a timer (it holds no thread). Defaults to 5 ms; set - /// to disable the floor entirely. + /// to disable the floor entirely. On Windows, + /// rounds up to the system timer resolution (~15 ms by default), so values below that are silently + /// floored to it — this does not weaken the security property (15 ms still dominates the µs-scale + /// crypto gap) but means rejected requests cost ~15 ms there rather than the configured 5 ms. /// /// What this does and does not close. The floor closes the cheap, universal probe — /// an unauthenticated peer sending garbage ciphertext to guessed recipient kids and timing the @@ -56,7 +59,8 @@ public sealed class DidCommReceiveOptions /// where a decryptable envelope triggers network DID resolution that runs longer than the floor /// (an attacker-controlled slow sender DID makes the held response visibly exceed the floor while /// an unheld one stays at it). That residual is network-bound and unbounded; close it with - /// authentication / a rate-limiter in front of the endpoint, not by inflating this value. + /// authentication / a rate-limiter in front of the endpoint, not by inflating this value — see the + /// tracking issue #44. /// public TimeSpan ReceiveRejectionFloor { get; set; } = TimeSpan.FromMilliseconds(5); }