Problem
There is no in-process way to mock the boot-node from a test in calimero-network/core. As a result, code paths that depend on observing real relay-server behaviour — reservation renewal, expiry, denial, control-connection lifecycle, AutoNAT interactions — are only exercised by the live boot-node deployed in dev. Bugs in these paths surface as production incidents, not as failing CI.
A concrete recent example: a recovery path for lost relay reservations (calimero-network/core#2446) had to be diagnosed from user-reported "I have to restart the app to find peers again" logs, because no test in core could reproduce the relay-side conditions (control connection drop, expiry beyond max_circuit_duration, listener closure). Even after the fix, the unit tests cover only the state machine — the event-loop wiring is verified by running against a real boot-node.
Long-term proposal: move boot-node into core
Fold this repo into calimero-network/core as bin/boot-node (binary) + crates/boot-node (library), with the library exposing the relay/identify/rendezvous configuration as a callable function. Benefits:
- The boot-node binary's deps and config travel with the same workspace Cargo.lock that core depends on. The libp2p version mismatch class of bug becomes impossible.
- Core integration tests can spawn the real boot-node library (crates/boot-node) in-process with custom config (short max_circuit_duration, low max_reservations_per_peer, etc.), so tests run against the actual production code path, not a mock.
- The Packer + Atlantis pipeline that currently rebuilds this repo's AMI can be retargeted at the new path with no behavioural change.
- Single SemVer for the network stack. Configuration changes (like the recent max_circuit_duration: 3600s bump) land atomically with the client-side changes that depend on them.
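To make "configuration as a callable function" concrete, here is a minimal, dependency-free sketch of what the library surface could look like. `RelayKnobs` and `relay_knobs` are hypothetical names, not existing boot-node code; the two fields mirror fields of libp2p's `relay::Config`, and the `max_reservations_per_peer` default is a placeholder rather than the production value.

```rust
use std::time::Duration;

/// Hypothetical subset of the relay settings the library would expose.
/// Field names mirror libp2p's `relay::Config`; the struct is illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct RelayKnobs {
    pub max_circuit_duration: Duration,
    pub max_reservations_per_peer: usize,
}

impl Default for RelayKnobs {
    fn default() -> Self {
        Self {
            // 3600s is the value mentioned in this issue.
            max_circuit_duration: Duration::from_secs(3600),
            // Placeholder, not taken from the real boot-node config.
            max_reservations_per_peer: 4,
        }
    }
}

/// Starts from the production defaults and lets the caller override them.
/// The binary would pass a no-op closure; tests would pass short timeouts.
pub fn relay_knobs(overrides: impl FnOnce(&mut RelayKnobs)) -> RelayKnobs {
    let mut knobs = RelayKnobs::default();
    overrides(&mut knobs);
    knobs
}
```

A core integration test could then call `relay_knobs(|k| k.max_circuit_duration = Duration::from_secs(2))` and hand the result to the same builder the binary uses, so the only difference between test and production is the override closure.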
Interim plan (already started in core)
A MockRelay testkit will land in crates/network/tests/common/ in core. It spawns a libp2p server-side relay::Behaviour with knobs for the parameters we care about, plus shutdown/respawn for fault injection. This is enough to write integration tests for the recovery path that ships in #2446 and for the address-book hygiene work that follows it.
The mock will diverge from the real boot-node over time (it won't grow Prometheus metrics, the bootstrap-config UX, etc.). When this repo is folded into core, the mock should be retired in favor of running the real boot-node crate with test config.
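The real MockRelay would drive libp2p's server-side relay::Behaviour; the sketch below does not. It is a dependency-free model of the surface such a testkit might offer (knobs, reservation accept/deny/expiry, shutdown/respawn), using a manually advanced clock so tests stay deterministic. All type and method names here are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical knobs; names follow libp2p's `relay::Config` fields.
#[derive(Debug, Clone)]
pub struct MockRelayKnobs {
    pub reservation_duration: Duration,
    pub max_reservations_per_peer: usize,
}

#[derive(Debug, PartialEq)]
pub enum ReservationOutcome {
    Accepted,
    Denied,
}

/// In-memory stand-in for a relay. Time only moves when `advance` is called.
pub struct MockRelayModel {
    knobs: MockRelayKnobs,
    now: Duration,
    reservations: HashMap<String, Vec<Duration>>, // peer -> expiry times
    running: bool,
}

impl MockRelayModel {
    pub fn spawn(knobs: MockRelayKnobs) -> Self {
        Self {
            knobs,
            now: Duration::from_secs(0),
            reservations: HashMap::new(),
            running: true,
        }
    }

    /// Accepts a reservation unless the relay is down or the peer is at its limit.
    pub fn reserve(&mut self, peer: &str) -> ReservationOutcome {
        if !self.running {
            return ReservationOutcome::Denied;
        }
        self.drop_expired();
        let limit = self.knobs.max_reservations_per_peer;
        let expiry = self.now + self.knobs.reservation_duration;
        let slots = self.reservations.entry(peer.to_string()).or_default();
        if slots.len() >= limit {
            return ReservationOutcome::Denied;
        }
        slots.push(expiry);
        ReservationOutcome::Accepted
    }

    /// Moves the virtual clock forward and expires stale reservations.
    pub fn advance(&mut self, by: Duration) {
        self.now += by;
        self.drop_expired();
    }

    /// Number of live reservations held by `peer`.
    pub fn active(&self, peer: &str) -> usize {
        self.reservations.get(peer).map_or(0, |slots| slots.len())
    }

    /// Fault injection: the relay goes away and forgets all reservations.
    pub fn shutdown(&mut self) {
        self.running = false;
        self.reservations.clear();
    }

    /// Brings the relay back with empty state.
    pub fn respawn(&mut self) {
        self.running = true;
    }

    fn drop_expired(&mut self) {
        let now = self.now;
        for slots in self.reservations.values_mut() {
            slots.retain(|expiry| *expiry > now);
        }
    }
}
```

A recovery test would reserve, call `shutdown()`, assert that the client notices the lost reservation, then `respawn()` and assert that it re-reserves. The libp2p-backed version would express the same steps through swarm events rather than direct method calls.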
Migration sketch
- git subtree add (or equivalent) this repo into calimero-network/core at bin/boot-node/.
- Split bin/boot-node/src/main.rs into a lib.rs that exposes run(config) and a main.rs that parses CLI args and calls it.
- Move the relay/identify/rendezvous config builder into crates/boot-node/ so it can be reused by tests with custom overrides.
- Update Packer (infrastructure/packer/aws/boot-node/) and Atlantis (infrastructure/terraform/.../boot_node.tf) to point at the new source path.
- Archive this repo with a pointer to the new location.
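The lib.rs / main.rs split in the second step could look roughly like the sketch below. It uses only the standard library; the `Config` fields, the flag names, and the default port are placeholders, and `run` is a stub where the real implementation would build the swarm and drive its event loop.

```rust
// lib.rs (hypothetical): everything a test needs, no process-level concerns.

#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub port: u16,
    pub max_circuit_duration_secs: u64,
}

impl Default for Config {
    fn default() -> Self {
        // Placeholder port; 3600s is the value mentioned in this issue.
        Self { port: 4001, max_circuit_duration_secs: 3600 }
    }
}

/// Parses `--port N` and `--max-circuit-duration-secs N` (illustrative flags).
pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Config, String> {
    let mut config = Config::default();
    let mut args = args.into_iter();
    while let Some(flag) = args.next() {
        let value = args
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        match flag.as_str() {
            "--port" => {
                config.port = value
                    .parse::<u16>()
                    .map_err(|e| format!("invalid --port: {}", e))?;
            }
            "--max-circuit-duration-secs" => {
                config.max_circuit_duration_secs = value
                    .parse::<u64>()
                    .map_err(|e| format!("invalid --max-circuit-duration-secs: {}", e))?;
            }
            other => return Err(format!("unknown flag {}", other)),
        }
    }
    Ok(config)
}

/// Library entry point. Stub: only validates the config in this sketch.
pub fn run(config: Config) -> Result<(), String> {
    if config.max_circuit_duration_secs == 0 {
        return Err("max_circuit_duration_secs must be non-zero".to_string());
    }
    Ok(())
}

// main.rs would then shrink to something like:
//
//     fn main() -> Result<(), String> {
//         run(parse_args(std::env::args().skip(1))?)
//     }
```

With this shape, a core integration test constructs `Config` directly and calls `run`, skipping argument parsing, while the deployed binary keeps its existing CLI.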
Acceptance criteria
- [...] calimero-network/core for the actual move.
- [...] relay::Config and core's MockRelay defaults, so tests don't drift from production.
Out of scope
- The actual code move. This issue is about the decision, not the execution.
- Operational concerns (release cadence, third-party deploys): none of these apply today; called out for completeness.