Skip to content

Fix Basic.Return content frame leak#2

Open
erikshestopal wants to merge 2 commits into
Fuyukai:mainfrom
erikshestopal:fix/basic-return-content
Open

Fix Basic.Return content frame leak#2
erikshestopal wants to merge 2 commits into
Fuyukai:mainfrom
erikshestopal:fix/basic-return-content

Conversation

@erikshestopal
Copy link
Copy Markdown

@erikshestopal erikshestopal commented Jun 4, 2026

Summary

Fixes a connection-wide deadlock caused by leaked Basic.Return content frames.

When RabbitMQ returns an unroutable mandatory publish, it sends a content-bearing return:

  1. Basic.Return
  2. content header
  3. zero or more body frames
  4. publisher-confirm Basic.Ack / Basic.Nack when confirms are enabled

Serena handled the Basic.Return method frame, but routed the following header/body frames into the channel delivery buffer. A publish-only channel never drains that buffer. Repeated returned publishes can fill it, causing the single connection-wide reader to block while trying to enqueue another frame. Once that happens, every channel on the connection stops receiving frames.

How to reproduce the bug

Run RabbitMQ locally:

docker run --rm -p 5672:5672 rabbitmq:3

Then publish mandatory messages to a missing queue on one channel with a small channel buffer:

import anyio

from serena import open_connection
from serena.exc import MessageReturnedError


async def main():
    async with open_connection("127.0.0.1", port=5672, channel_buffer_size=8) as conn:
        async with conn.open_channel() as channel:
            for index in range(8):
                try:
                    await channel.basic_publish(
                        "",
                        routing_key=f"missing-{index}",
                        body=b"x",
                        mandatory=True,
                    )
                except MessageReturnedError:
                    pass

                print(index, channel.current_buffer_size)

            async with conn.open_channel() as other_channel:
                await other_channel.queue_declare(name="", exclusive=True)


anyio.run(main)

Before this fix, channel.current_buffer_size grows as returned content frames accumulate. With a small buffer, the connection reader eventually logs:

!!! CHANNEL n IS ABOUT TO BLOCK: This WILL cause a deadlock !!!

At that point, later operations on any channel can hang because frame processing is blocked at the connection reader.

This is reachable with Serena's default channel_buffer_size=48: a non-empty returned publish leaks at least two frames, so one publish-only channel can deadlock after roughly 24 returned publishes.

Root cause

The connection reader split one returned message across two internal queues:

  • Basic.Return went to the regular method stream.
  • The return's content header/body frames went to the delivery stream.

The delivery stream is bounded and is only drained by consume / basic_get paths. A publisher-only channel has no delivery consumer, so those frames remain buffered forever.

Fix

This PR adds a small shared content assembly state machine for content-bearing AMQP methods.

The same method/header/body shape is used by:

  • Basic.Deliver
  • Basic.GetOk
  • Basic.Return

The new assembler validates the header class id, handles empty bodies, and combines multi-frame bodies. Existing delivery/get code now uses that assembler instead of maintaining its own inline reconstruction loop.

For Basic.Return, the connection reader now tracks pending return content by channel id. It consumes the return header/body frames before they can enter the delivery buffer. Once the returned message is complete, the original Basic.Return is surfaced to basic_publish as before, but MessageReturnedError now also includes:

  • header
  • body

That preserves the existing error contract while exposing the returned message content and preventing the deadlock.

Tests

Added regression and characterization coverage for:

  • Basic.GetEmpty
  • Basic.GetOk with empty and multi-frame bodies
  • Basic.Deliver with empty and multi-frame bodies
  • returned content not entering the delivery buffer
  • repeated returns not blocking the connection reader
  • returned content on one channel not interfering with delivery on another channel
  • broker-backed returned publish preserving returned header/body on MessageReturnedError

Validated locally against RabbitMQ:

poetry run pytest -q --no-cov
poetry run pyright src/serena
poetry run ruff check src/serena tests/channel/test_publish.py

Results:

72 passed, 10 deselected
pyright: 0 errors
ruff: All checks passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant