Harden the mesh e2e harness for CI, then re-promote it to a required check #4

Description

@pedromvgomes

Problem

The mesh tombstone e2e (source/end2end-tests/mesh/, driven by .github/workflows/e2e.yml)
was a required PR gate, but it is heavy and flaky on shared GitHub runners. On PR #2 it
failed in two different ways:

  1. A crates.io download dropped mid-docker build (curl … Broken pipe).
  2. The test timed out after 90s waiting for the tenants table to exist — i.e. the
    tenants container hadn't finished migrations under a loaded runner
    (tombstone_flow.rs await_table("tenants present")).

Neither is a code defect. The harness was only ever verified on podman locally;
GitHub Actions is its first CI home. It is currently de-gated (nightly schedule +
workflow_dispatch, not on the PR/merge path) — this issue tracks hardening it so it can
be re-promoted to a required check.

Work

  • Dump diagnostics on failure — on a failed make e2e-all, run
    docker compose logs (all services) so a failure is debuggable. Right now the
    trap … down -v tears the stack down with no logs captured.
  • Explicit readiness gating before the test — wait for each service to be healthy
    (e.g. a /v1/health poll, or compose healthcheck + depends_on: service_healthy
    for tenants/ddns, not just the Postgres ones) instead of relying on the test's
    own 90s poll from a cold start.
  • CI-appropriate timeouts — size POLL_TIMEOUT for a loaded shared runner (the
    image build alone took ~13 min), or split build from run so the timeout only covers
    the actual flow.
  • Optionally pre-build / cache the service images (cargo-chef layer caching) so the
    docker build is fast and resilient to transient registry drops.
  • Re-promote to a required check: re-add the e2e-mesh leaf to pr.yml (plus the
    all-checks-passed aggregator's needs list) and ci.yml once it is reliably green.
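
The first two items (log capture before teardown, readiness gating before the test) could be sketched as a small bash wrapper. This is a rough illustration under stated assumptions, not the harness's actual script: the function names (wait_until, teardown, run_e2e), the variables (COMPOSE, READY_TIMEOUT, READY_INTERVAL, LOG_FILE, TENANTS_URL, DDNS_URL) and the localhost ports are all hypothetical, and /v1/health is only the example endpoint named above.

```bash
#!/usr/bin/env bash
# Sketch only: every name, URL, and default below is an assumption, not repo code.
set -euo pipefail

COMPOSE="${COMPOSE:-docker compose}"        # override for podman or a dry run
READY_TIMEOUT="${READY_TIMEOUT:-300}"       # readiness budget, in seconds
READY_INTERVAL="${READY_INTERVAL:-2}"       # pause between readiness probes
LOG_FILE="${LOG_FILE:-e2e-compose.log}"     # where failure logs are written
TENANTS_URL="${TENANTS_URL:-http://localhost:8080}"  # hypothetical address
DDNS_URL="${DDNS_URL:-http://localhost:8081}"        # hypothetical address

# wait_until LABEL CMD...: rerun CMD until it succeeds or READY_TIMEOUT elapses.
wait_until() {
  local label="$1"; shift
  local deadline=$(( SECONDS + READY_TIMEOUT ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${READY_TIMEOUT}s waiting for ${label}" >&2
      return 1
    fi
    sleep "$READY_INTERVAL"
  done
  echo "ready: ${label}"
}

# teardown STATUS: on a non-zero status, save logs for all services, then
# remove the stack and its volumes. Never fails, so it is safe inside a trap.
teardown() {
  local status="$1"
  if (( status != 0 )); then
    $COMPOSE logs --no-color --timestamps > "$LOG_FILE" 2>&1 || true
  fi
  $COMPOSE down -v || true
}

# run_e2e TEST_CMD...: start the stack, gate on service health, run the test.
run_e2e() {
  trap 'teardown "$?"' EXIT
  $COMPOSE up -d
  wait_until "tenants" curl -fs --max-time 5 -o /dev/null "${TENANTS_URL}/v1/health"
  wait_until "ddns" curl -fs --max-time 5 -o /dev/null "${DDNS_URL}/v1/health"
  "$@"
}
```

A script would end with a call such as run_e2e followed by the test command. Because the readiness budget is separate from the test's own 90s poll, a slow migration on a loaded runner would surface as an explicit "timed out waiting for tenants" message with logs saved to LOG_FILE, rather than as a test timeout with nothing captured.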

Context

De-gated in commit 5f1cdf2 on PR #2. See .github/workflows/e2e.yml (the re-promotion
note), source/Makefile (e2e-* targets), and source/end2end-tests/mesh/README.md.
