Skip to content

fix(ws): serialise session close against in-flight writes (spec-026 US3)#525

Merged
tcheeric merged 4 commits into
mainfrom
spec-026-us3-close-vs-write
May 26, 2026
Merged

fix(ws): serialise session close against in-flight writes (spec-026 US3)#525
tcheeric merged 4 commits into
mainfrom
spec-026-us3-close-vs-write

Conversation

@tcheeric
Copy link
Copy Markdown
Owner

Summary

Fixes the residual IllegalStateException: Concurrent write operations are not permitted observed during the spec-026 staging soak — a close-vs-write race inside nostr.client.springwebsocket.NostrRelayClient, below wallet-lib's adapter sendLock.

Root cause. NostrRelayClient wraps its session in Spring's ConcurrentWebSocketSessionDecorator but called the no-arg clientSession.close() (in close() and the send() timeout path). Spring 6.2.x's decorator overrides only close(CloseStatus) (guarded by closeLock); the no-arg close() falls through to WebSocketSessionDecorator.close()delegate.close() with no coordination, sending a CLOSE frame straight to the Tomcat delegate while a sendMessage flush is in flight. Tomcat permits a single write in flight, so the CLOSE frame (WsSession.sendCloseMessage) racing a data frame (WsRemoteEndpointImplBase.sendPartialString) throws.

Routing through close(CloseStatus) alone is insufficient: its closeLock is a separate lock from the flushLock that guards delegate.sendMessage, so it only narrows the window rather than closing it.

Fix. Introduce a ReentrantReadWriteLock session gate:

  • Every write (send / subscribe, via sendFrameGated) takes the read side — sends stay concurrent; the decorator still serialises the delegate writes among them.
  • Every close (closeGated / closeQuietly, always close(CloseStatus), never the no-arg close()) takes the write side, so it waits for all in-flight sends to drain before sending the CLOSE frame.
  • No lock nesting (release-before-acquire everywhere) → no added deadlock risk.

This preserves the existing decorator-concurrency design (and its tests) while fully closing the race.

Changes

  • NostrRelayClient: sessionGate RW-lock; sendFrameGated / closeGated / closeQuietly / clearPendingRequest helpers; send, subscribe, the send() timeout path, and close() rerouted through them.
  • CHANGELOG.md: [Unreleased] → Fixed entry.

Test plan

  • New NostrRelayClientCloseWriteRaceTest — (1) close() routes through close(CloseStatus) and never the no-arg close(); (2) the send() timeout path routes the same way; (3) latch-based serialisation proof: a close blocks on the gate while a send is in flight, so the delegate never sees a write and a close at once.
  • Updated NostrRelayClientTimeoutTest + SpringWebSocketClientTest (which asserted the old buggy no-arg close()) to assert close(any(CloseStatus)) + never().close().
  • mvn -pl nostr-java-client -am test → 24/24 pass. Concurrency + overflow tests preserved (overflow now ~13s because close correctly waits for the deliberately-stuck in-flight send).

🤖 Generated with Claude Code

NostrRelayClient called the no-arg clientSession.close() in close() and the
send() timeout path. Spring 6.2.x's ConcurrentWebSocketSessionDecorator
overrides only close(CloseStatus) (guarded by closeLock); the no-arg close()
falls through to delegate.close() with no coordination, racing an in-flight
sendMessage flush on the Tomcat delegate (single-in-flight-write →
IllegalStateException: Concurrent write operations are not permitted). This
was the residual concurrent-write leakage observed during the spec-026 soak,
a layer below wallet-lib's adapter sendLock.

Routing through close(CloseStatus) alone is insufficient: its closeLock is a
separate lock from the flushLock guarding delegate.sendMessage. Introduce a
ReentrantReadWriteLock session gate — every write (send/subscribe via
sendFrameGated) takes the read side (sends stay concurrent; the decorator
still serialises the delegate writes among them), every close (closeGated/
closeQuietly, always close(CloseStatus)) takes the write side and waits for
in-flight sends to drain before sending the CLOSE frame. No lock nesting, so
no added deadlock risk.

Tests: new NostrRelayClientCloseWriteRaceTest (routing through CloseStatus +
never no-arg close; timeout-path routing; latch-based serialisation proof).
Updated NostrRelayClientTimeoutTest and SpringWebSocketClientTest, which
asserted the old no-arg close(), to assert close(any(CloseStatus)) +
never().close(). 24/24 nostr-java-client tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a WebSocket close-vs-write race inside nostr.client.springwebsocket.NostrRelayClient by ensuring session closes are serialized against in-flight writes, avoiding Tomcat’s “Concurrent write operations are not permitted” IllegalStateException.

Changes:

  • Added a read/write “session gate” to block close(CloseStatus) until all in-flight sendMessage calls drain, and routed all closes through close(CloseStatus) (never the no-arg close()).
  • Refactored send() / subscribe() to use gated send + best-effort gated close helpers for timeout/overflow paths.
  • Updated and added tests to assert close routing and demonstrate non-overlap between send and close, plus a changelog entry.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
nostr-java-client/src/main/java/nostr/client/springwebsocket/NostrRelayClient.java Introduces RW-lock gating for close vs writes; reroutes send/subscribe/timeout/overflow closure through gated helpers.
nostr-java-client/src/test/java/nostr/client/springwebsocket/SpringWebSocketClientTest.java Updates timeout expectations to verify close(CloseStatus) is used and close() is never called.
nostr-java-client/src/test/java/nostr/client/springwebsocket/NostrRelayClientTimeoutTest.java Aligns timeout close assertions with the new close(CloseStatus) routing and ensures session is open for the close path.
nostr-java-client/src/test/java/nostr/client/springwebsocket/NostrRelayClientCloseWriteRaceTest.java Adds regression coverage for close routing and close-vs-write serialization (delegate never sees overlap).
CHANGELOG.md Adds an [Unreleased] “Fixed” entry describing the race and the locking-based resolution.
Comments suppressed due to low confidence (2)

nostr-java-client/src/test/java/nostr/client/springwebsocket/NostrRelayClientCloseWriteRaceTest.java:173

  • These comments attribute close() blocking to sendLock, but the blocking mechanism in the fix is the sessionGate write lock waiting for an in-flight sessionGate read lock held by sendFrameGated(). Updating the comments will keep the test explanation aligned with what it actually proves.
    // With the fix, close() blocks on sendLock (held by the parked subscribe), so
    // the delegate close has NOT run yet — exactly one op (the write) in flight.
    awaitThreadParked(closer, 2_000);
    assertEquals(1, delegateOps.get(),
        "close() must not reach the delegate while a send is in flight on it");

nostr-java-client/src/test/java/nostr/client/springwebsocket/NostrRelayClientCloseWriteRaceTest.java:208

  • The helper Javadoc mentions the thread is blocked acquiring sendLock, but in the current implementation close() waits on the sessionGate write lock (not sendLock). Update the wording so future maintainers don’t chase the wrong lock when debugging this test.
  /**
   * Poll until {@code thread} is parked (BLOCKED / WAITING / TIMED_WAITING) — i.e.
   * blocked acquiring sendLock — or the timeout elapses.
   */

tcheeric and others added 3 commits May 26, 2026 02:21
Cuts 2.0.4 with the spec-026 US3 close-vs-write fix. Moves [Unreleased]
changelog content into the 2.0.4 section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Copilot review on PR #525:
- send(): clear pendingRequest on ANY sendFrameGated failure (not only
  SessionLimitExceededException). A plain IOException previously left the
  client stuck "in flight", breaking the next send()/@NostrRetryable retry.
- closeQuietly(): log the swallowed throwable (+ relayUri) instead of just
  e.getMessage(), preserving the stack trace.
- tests: correct stale sendLock references to the sessionGate RW-lock; narrow
  the timeout assertThrows from Exception to RelayTimeoutException.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cuts 2.0.5 with the PR #525 review fixes: send() clears pendingRequest on
any send failure (not just overflow) so a failed write can't wedge the
in-flight slot, and closeQuietly() logs the throwable + relay URI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tcheeric tcheeric merged commit 974afdd into main May 26, 2026
6 checks passed
@tcheeric tcheeric deleted the spec-026-us3-close-vs-write branch May 26, 2026 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants