
Connection timeout in custom test using UnboundBuffer with TCP_LAZY transport #444

@piotrchmiel


Description:

We've written a test to verify how Gloo's UnboundBuffer behaves when it is destroyed shortly after a send() operation. The test uses the TCP_LAZY transport and a context size of 2. Despite the simple setup, the test intermittently fails with a connection timeout (gloo::IoException). This suggests a possible timing or synchronization issue during peer connection setup in the TCP backend.

Test code:

```cpp
TEST_F(BaseTest, DestroyingBuffer) {
  const auto transport = TCP_LAZY;
  const auto contextSize = 2;

  spawn(transport, contextSize, [&](const std::shared_ptr<Context> &context) {
    const auto rank = static_cast<size_t>(context->rank);
    // Rank 0 returns immediately: it posts no receive and does not wait.
    if (rank == 0LU)
      return;

    using BufferPtr = std::unique_ptr<::gloo::transport::UnboundBuffer>;
    std::vector<int> storage = { context->rank };
    BufferPtr buffer =
        context->createUnboundBuffer(storage.data(), sizeof(int));

    ASSERT_NO_THROW({
      // Send to every other rank, using this rank's number as the slot.
      for (auto i = 0LU; i < contextSize; i++) {
        if (i == rank)
          continue;
        buffer->send(static_cast<int>(i), rank);
      }
      // Destroy the buffer right after send(), without waiting for completion.
      buffer.reset();
    });
  });
}
```

Behavior:
When the test passes, it completes in about 20 ms, with both ranks attempting to connect and send as expected.

When it fails, it hangs for about 150 s before throwing:

```
gloo::IoException: [/path/to/gloo/transport/tcp/pair.h:303] Connect timeout [none]
```

The outcome varies between consecutive runs of the same test, with no code changes in between.
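One ordering that would be consistent with both outcomes (a hypothesis, not a confirmed root cause) is that rank 0 has already returned from the test body by the time rank 1's first send() initiates the lazy connection, so nothing on rank 0's side completes the handshake and rank 1 waits out the full connect timeout instead of failing fast. The self-contained sketch below models only that ordering with the C++ standard library. `ToyRank`, `leave`, `onConnectRequest`, `waitForAck` and `runOnce` are hypothetical names, not Gloo's API or implementation, and the model reports a timeout as `false` where Gloo throws `gloo::IoException`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Toy model only (hypothetical, NOT Gloo code): a remote rank that answers a
// lazily initiated connection request only while it is still active.
class ToyRank {
 public:
  // The rank has finished its test body (like rank 0 returning early).
  void leave() {
    std::lock_guard<std::mutex> lock(mu_);
    active_ = false;
  }

  // A connection request arrives. It is acknowledged only if the rank is
  // still active; otherwise it is silently dropped and nobody answers.
  void onConnectRequest() {
    std::lock_guard<std::mutex> lock(mu_);
    if (active_) {
      acked_ = true;
      cv_.notify_all();
    }
  }

  // The connecting side blocks until the request is acknowledged or the
  // timeout expires. Returns false on timeout (Gloo would throw instead).
  bool waitForAck(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [this] { return acked_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool active_ = true;
  bool acked_ = false;
};

// One simulated run: rank 1's first send lazily connects to rank 0.
// `rank0LeavesFirst` fixes the interleaving that a real run leaves to the
// thread scheduler.
bool runOnce(bool rank0LeavesFirst, std::chrono::milliseconds connectTimeout) {
  ToyRank rank0;
  if (rank0LeavesFirst) rank0.leave();  // rank 0 already returned
  rank0.onConnectRequest();             // triggered by rank 1's first send()
  const bool connected = rank0.waitForAck(connectTimeout);
  if (!rank0LeavesFirst) rank0.leave();  // rank 0 returns afterwards
  return connected;
}
```

In a real run the interleaving is decided by thread scheduling, which could explain why consecutive runs of unchanged code differ; the sketch makes the order explicit through `rank0LeavesFirst` so each outcome can be reproduced on demand.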

Environment:

- Gloo commit: fe67c4b
- Transport: TCP_LAZY
- Context size: 2
- Platform: Ubuntu 24.04 (Linux)
