Problem
The mesh tombstone e2e (source/end2end-tests/mesh/, driven by .github/workflows/e2e.yml)
was a required PR gate but is heavy and flaky on shared GitHub runners. On PR #2 it
failed two different ways:
- A crates.io download dropped mid-docker build (curl … Broken pipe).
- The test timed out after 90s waiting for the tenants table to exist — i.e. the
tenants container hadn't finished migrations under a loaded runner
(tombstone_flow.rs await_table("tenants present")).
Neither is a code defect. The harness was only ever verified on podman locally;
GitHub Actions is its first CI home. It is currently de-gated (nightly schedule +
workflow_dispatch, not on the PR/merge path) — this issue tracks hardening it so it can
be re-promoted to a required check.
Work
- […] make e2e-all, run docker compose logs (all services) so a failure is debuggable.
  Right now the trap … down -v tears the stack down with no logs captured.
- […] (e.g. a /v1/health poll, or compose healthcheck + depends_on: service_healthy for
  tenants/ddns, not just the Postgres ones) instead of relying on the test's own 90s
  poll from a cold start.
- […] POLL_TIMEOUT for a loaded shared runner (the image build alone took ~13 min), or
  split build from run so the timeout only covers the actual flow.
- […] docker build is fast and resilient to transient registry drops.
- […] e2e-mesh leaf to pr.yml (+ the all-checks-passed aggregator needs) and ci.yml once
  it is reliably green.
Context
De-gated in commit 5f1cdf2 on PR #2. See .github/workflows/e2e.yml (the re-promotion
note), source/Makefile (e2e-* targets), and source/end2end-tests/mesh/README.md.
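
For the readiness and POLL_TIMEOUT items under Work, one possible shape is a small polling helper whose timeout can be raised on a loaded runner without editing the test. This is only a sketch under stated assumptions: poll_until, poll_timeout, and the E2E_POLL_TIMEOUT_SECS variable are hypothetical names, not taken from tombstone_flow.rs, and the closure in main is a stand-in for a real check such as a /v1/health request or the existing table lookup.

```rust
use std::env;
use std::thread;
use std::time::{Duration, Instant};

/// Default mirrors the 90s poll described in this issue.
const DEFAULT_POLL_TIMEOUT: Duration = Duration::from_secs(90);

/// Reads the timeout from a hypothetical env var (name is an assumption),
/// falling back to the default when it is unset or not a whole number.
fn poll_timeout() -> Duration {
    env::var("E2E_POLL_TIMEOUT_SECS")
        .ok()
        .and_then(|s| s.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_POLL_TIMEOUT)
}

/// Calls `ready` every `interval` until it returns true or `timeout` elapses.
/// Returns the elapsed time on success, or an error naming what was awaited.
fn poll_until<F>(
    what: &str,
    timeout: Duration,
    interval: Duration,
    mut ready: F,
) -> Result<Duration, String>
where
    F: FnMut() -> bool,
{
    let start = Instant::now();
    loop {
        if ready() {
            return Ok(start.elapsed());
        }
        if start.elapsed() >= timeout {
            return Err(format!("timed out after {:?} waiting for {}", timeout, what));
        }
        thread::sleep(interval);
    }
}

fn main() {
    // Stand-in readiness check: reports ready on the third attempt.
    let mut attempts = 0;
    let result = poll_until(
        "tenants ready",
        poll_timeout(),
        Duration::from_millis(10),
        || {
            attempts += 1;
            attempts >= 3
        },
    );
    println!("ready: {}, attempts: {}", result.is_ok(), attempts);
}
```

Keeping the check as a closure means the same helper could gate on service health first and then on the tenants table, so the flow's own timeout would no longer have to absorb a cold start.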