Fix stuck faststream-stomp consumers (ping=True but no progress)#196
Merged
Conversation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Catching BaseException would swallow SystemExit, KeyboardInterrupt, and GeneratorExit raised inside a handler, which should propagate. Handlers that legitimately fail raise Exception subclasses; only those need to be contained at the listener boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The note pointing to PR #117 discussion is still relevant after the logging change — the SystemExit branch remains hard to test directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d4ba5b2 to
0e36964
Compare
Collaborator
|
Perhaps, we should set a sensible default for max_concurrent_handlers? |
Bound in-flight handler tasks by default so a stuck handler can no longer accumulate unbounded create_task entries. 100 is high enough that fast-handler workloads never hit it; the cap matters only when handlers are slow, where it makes the stuck state detectable: the read loop pauses, last_read_time goes stale, and is_alive turns False. Set max_concurrent_handlers=None to restore the prior unbounded behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
In production, faststream-stomp consumers stop processing messages while
broker.ping()keeps returningTrue. Queue depth grows on the broker (ActiveMQ Artemis), the DLQ also grows, and only a pod restart recovers.This PR addresses every failure mode that can lead to that state:
TaskGroupand silently kill the listen task.is_alivegrace fix —ActiveConnectionStatenow tracksconnected_at; whenlast_read_time is None, aliveness is bounded by the sameserver_heartbeat × factorthreshold instead of returningTrueindefinitely. Closes the indefinite-True window after a reconnect where nothing reads.is_alive—Client.is_alive()returnsFalseonce_listen_task.done(), so dead listeners can no longer report healthy._handle_listen_task_donenow logs every non-cancelled, non-FailedAllConnectAttemptsErrorexception at ERROR with the traceback. Previously these were silently swallowed.max_concurrent_handlersknob — opt-in semaphore (defaultNone/unbounded) bounds in-flight handler tasks. When the semaphore is full the read loop pauses, which (combined with theis_alivefix above) converts an invisible stuck-state into a detectable one.maybe_write_frameare also logged with severity by frame type (NACK=ERROR, ACK=WARNING, other=INFO) so DLQ growth becomes diagnosable from logs alone.Test plan
just lintjust check-typesjust test-fast— 244 passed / 1 skippedjust test— full suite (requires Docker brokers); run before mergeNotes for reviewers
max_concurrent_handlersisNoneby default; existing call sites do not need changes.