
Long-term: fold boot-node into core for in-process mockability #37

Description

@chefsale

Problem

There is no in-process way to mock the boot-node from a test in calimero-network/core. As a result, code paths that depend on observing real relay-server behaviour — reservation renewal, expiry, denial, control-connection lifecycle, AutoNAT interactions — are only exercised by the live boot-node deployed in dev. Bugs in these paths surface as production incidents, not as failing CI.

A concrete recent example: a recovery path for lost relay reservations (calimero-network/core#2446) had to be diagnosed from logs attached to user reports of "I have to restart the app to find peers again", because no test in core could reproduce the relay-side conditions (control connection drop, expiry beyond max_circuit_duration, listener closure). Even after the fix, the unit tests cover only the state machine; the event-loop wiring can only be verified by running against a real boot-node.

Long-term proposal: move boot-node into core

Fold this repo into calimero-network/core as bin/boot-node (binary) + crates/boot-node (library), with the library exposing the relay/identify/rendezvous configuration as a callable function. Benefits:

  • The boot-node binary's dependencies and config are pinned by the same workspace Cargo.lock as core. This rules out the class of bugs caused by libp2p version mismatches between the two repos.
  • Core integration tests can spawn the real crates/boot-node library in-process with custom config (short max_circuit_duration, low max_reservations_per_peer, etc.), so tests run against the actual production code path rather than a mock.
  • The Packer + Atlantis pipeline that currently rebuilds this repo's AMI can be retargeted at the new path with no behavioural change.
  • Single SemVer for the network stack. Configuration changes (like the recent max_circuit_duration: 3600s bump) land atomically with the client-side changes that depend on them.
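
As a rough illustration of the "custom config" benefit, the sketch below shows how a test could start from production defaults and shrink only the limits it cares about. `BootNodeConfig` and `fast_expiry_config` are hypothetical names, not part of the current boot-node code; the field names are taken from the parameters mentioned in this issue, and every default except the 3600s `max_circuit_duration` is a placeholder.

```rust
use std::time::Duration;

/// Hypothetical config that a `crates/boot-node` library might expose.
/// Only the 3600s `max_circuit_duration` default comes from this issue;
/// the other defaults are placeholders, not the deployed values.
#[derive(Debug, Clone, PartialEq)]
pub struct BootNodeConfig {
    pub max_circuit_duration: Duration,
    pub max_reservations_per_peer: usize,
    pub reservation_duration: Duration,
}

impl Default for BootNodeConfig {
    fn default() -> Self {
        Self {
            max_circuit_duration: Duration::from_secs(3600), // value cited in this issue
            max_reservations_per_peer: 4,                    // placeholder
            reservation_duration: Duration::from_secs(3600), // placeholder
        }
    }
}

/// Test-side helper: production defaults with the limits shrunk so that
/// expiry and denial paths can trigger within the lifetime of a test.
pub fn fast_expiry_config() -> BootNodeConfig {
    BootNodeConfig {
        max_circuit_duration: Duration::from_secs(2),
        max_reservations_per_peer: 1,
        // Everything not overridden stays at the production default.
        ..BootNodeConfig::default()
    }
}

#[test]
fn overrides_only_touch_the_named_fields() {
    let cfg = fast_expiry_config();
    assert_eq!(cfg.max_circuit_duration, Duration::from_secs(2));
    assert_eq!(
        cfg.reservation_duration,
        BootNodeConfig::default().reservation_duration
    );
}
```

Because the overrides use struct update syntax, a new production field would automatically reach tests with its production default unless a test opts out.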

Interim plan (already started in core)

A MockRelay testkit will land in crates/network/tests/common/ in core. It spawns a libp2p server-side relay::Behaviour with knobs for the parameters we care about, plus shutdown/respawn for fault injection. This is enough to write integration tests for the recovery path that ships in calimero-network/core#2446 and for the address-book hygiene work that follows it.
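
The shape of such a handle could look roughly like the sketch below. It is not the testkit's actual API: `MockRelayKnobs`, `spawn`, `shutdown`, `respawn`, and `generation` are hypothetical names, and to stay dependency-free the background worker is a parked thread standing in for the event loop that would drive the relay::Behaviour swarm.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical knobs; names mirror parameters mentioned in this issue.
#[derive(Debug, Clone)]
pub struct MockRelayKnobs {
    pub max_circuit_duration: Duration,
    pub max_reservations_per_peer: usize,
}

/// Test handle that owns the background worker standing in for the relay.
pub struct MockRelay {
    knobs: MockRelayKnobs,
    stop: Option<Sender<()>>,
    worker: Option<JoinHandle<()>>,
    generation: u32,
}

impl MockRelay {
    pub fn spawn(knobs: MockRelayKnobs) -> Self {
        let mut relay = Self { knobs, stop: None, worker: None, generation: 0 };
        relay.start();
        relay
    }

    fn start(&mut self) {
        let (tx, rx) = mpsc::channel::<()>();
        // Stand-in for the swarm event loop: the worker simply blocks
        // until it receives a stop signal (or the sender is dropped).
        let worker = thread::spawn(move || {
            let _ = rx.recv();
        });
        self.stop = Some(tx);
        self.worker = Some(worker);
        self.generation += 1;
    }

    pub fn is_running(&self) -> bool {
        self.worker.is_some()
    }

    /// Fault injection: stop the relay, as if the boot-node went away.
    pub fn shutdown(&mut self) {
        if let Some(tx) = self.stop.take() {
            let _ = tx.send(());
        }
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }

    /// Bring the relay back with the same knobs.
    pub fn respawn(&mut self) {
        self.shutdown();
        self.start();
    }

    /// Number of times the worker has been started; useful in assertions.
    pub fn generation(&self) -> u32 {
        self.generation
    }

    pub fn knobs(&self) -> &MockRelayKnobs {
        &self.knobs
    }
}

impl Drop for MockRelay {
    fn drop(&mut self) {
        // Ensure the worker thread never outlives the test.
        self.shutdown();
    }
}
```

A recovery test would then call `shutdown()`, assert that the client notices the lost reservation, call `respawn()`, and assert that the client re-reserves without an app restart.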

The mock will diverge from the real boot-node over time (it won't grow Prometheus metrics, the bootstrap-config UX, etc.). When this repo is folded into core, the mock should be retired in favor of running the real boot-node crate with test config.

Migration sketch

  1. git subtree add (or equivalent) this repo into calimero-network/core at bin/boot-node/.
  2. Split bin/boot-node/src/main.rs into a lib.rs that exposes run(config) and a thin main.rs that only parses CLI args and calls run.
  3. Move the relay/identify/rendezvous config builder into crates/boot-node/ so it can be reused by tests with custom overrides.
  4. Update Packer (infrastructure/packer/aws/boot-node/) and Atlantis (infrastructure/terraform/.../boot_node.tf) to point at the new source path.
  5. Archive this repo with a pointer to the new location.
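
Steps 2 and 3 could be sketched as follows. This is a minimal illustration under stated assumptions, not the boot-node's current layout: the `Config` fields, the `--port` and `--max-circuit-duration-secs` flags, the default port, and the validation inside `run` are all hypothetical. The real `run` would build the swarm and block on its event loop.

```rust
use std::time::Duration;

// --- lib.rs (hypothetical) ---

#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub port: u16,
    pub max_circuit_duration: Duration,
}

impl Default for Config {
    fn default() -> Self {
        Self {
            port: 4001, // placeholder port
            max_circuit_duration: Duration::from_secs(3600), // value cited in this issue
        }
    }
}

/// Library entry point that tests can call with a custom `Config`.
/// Placeholder body: validates the config where the real function
/// would start the relay/identify/rendezvous behaviours.
pub fn run(config: Config) -> Result<(), String> {
    if config.max_circuit_duration.is_zero() {
        return Err("max_circuit_duration must be non-zero".to_string());
    }
    Ok(())
}

// --- main.rs (hypothetical) ---

fn next_value(args: &mut impl Iterator<Item = String>, flag: &str) -> Result<String, String> {
    args.next().ok_or_else(|| format!("missing value for {flag}"))
}

/// Parse CLI flags into a `Config`, starting from the defaults.
pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Config, String> {
    let mut config = Config::default();
    let mut args = args.into_iter();
    while let Some(flag) = args.next() {
        match flag.as_str() {
            "--port" => {
                let value = next_value(&mut args, &flag)?;
                config.port = value
                    .parse::<u16>()
                    .map_err(|e| format!("invalid --port: {e}"))?;
            }
            "--max-circuit-duration-secs" => {
                let value = next_value(&mut args, &flag)?;
                let secs = value
                    .parse::<u64>()
                    .map_err(|e| format!("invalid --max-circuit-duration-secs: {e}"))?;
                config.max_circuit_duration = Duration::from_secs(secs);
            }
            other => return Err(format!("unknown flag {other}")),
        }
    }
    Ok(config)
}

// The binary's `main` then reduces to:
//
// fn main() -> Result<(), String> {
//     let config = parse_args(std::env::args().skip(1))?;
//     run(config)
// }
```

Keeping argument parsing out of `run` is what lets a core integration test skip the CLI entirely and pass a `Config` with test-sized limits.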

Acceptance criteria

  • Decision made on whether to proceed with the move (or to stay separate and accept the mock divergence).
  • If proceeding: tracking issue in calimero-network/core for the actual move.
  • If staying separate: documented compatibility contract between this repo's relay::Config and core's MockRelay defaults, so tests don't drift from production.
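
If the repos stay separate, one possible form for the compatibility contract is a check that fails when the mock's defaults drift from the production values. The sketch below assumes a hypothetical `RelayLimits` struct holding the contract's fields and a hypothetical `drifted_fields` helper; in practice the production side would be read from this repo's relay::Config and the mock side from MockRelay's defaults.

```rust
use std::time::Duration;

/// Hypothetical subset of relay limits covered by the contract.
#[derive(Debug, Clone, PartialEq)]
pub struct RelayLimits {
    pub max_circuit_duration: Duration,
    pub max_reservations_per_peer: usize,
}

/// Names of the fields on which the mock's defaults differ from production.
pub fn drifted_fields(production: &RelayLimits, mock: &RelayLimits) -> Vec<&'static str> {
    let mut drift = Vec::new();
    if production.max_circuit_duration != mock.max_circuit_duration {
        drift.push("max_circuit_duration");
    }
    if production.max_reservations_per_peer != mock.max_reservations_per_peer {
        drift.push("max_reservations_per_peer");
    }
    drift
}

#[test]
fn mock_defaults_match_production() {
    // Placeholder values; a real test would load both sides from code.
    let production = RelayLimits {
        max_circuit_duration: Duration::from_secs(3600),
        max_reservations_per_peer: 4,
    };
    let mock = production.clone();
    assert!(drifted_fields(&production, &mock).is_empty());
}
```

Reporting the drifted field names, rather than a bare inequality, makes the CI failure point directly at the parameter that changed.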

Out of scope

  • The actual code move. This issue is about the decision, not the execution.
  • Operational concerns (release cadence, third-party deploys). None of these apply today; they are listed for completeness.
