Skip to content

Redesign NUClearNet and rename to nuclearnet#190

Open
TrentHouliston wants to merge 18 commits into
mainfrom
houliston/nuclearnet-v2
Open

Redesign NUClearNet and rename to nuclearnet#190
TrentHouliston wants to merge 18 commits into
mainfrom
houliston/nuclearnet-v2

Conversation

@TrentHouliston
Copy link
Copy Markdown
Member

@TrentHouliston TrentHouliston commented May 26, 2026

Summary

Replaces the monolithic NUClearNetwork implementation in src/extension/network/ with a redesigned, modular src/nuclearnet/ library. The new implementation is built as a standalone library that can be used independently of the reactor framework.

Improvements over the old NUClearNetwork

  • Modular architecture — The old implementation was a single ~1100-line class handling discovery, fragmentation, reliability, and routing all in one place. The new version separates each concern into its own class with clear interfaces.
  • Standalone librarynuclearnet can now be built and linked independently of the NUClear reactor framework, enabling reuse in other projects.
  • Selective ACK/NACK — The old implementation used simple timeout-based retransmission. The new version uses bitset-based selective acknowledgement so only missing fragments are retransmitted, not entire packets.
  • Adaptive retransmission — Per-peer RTT estimation (Jacobson/Karels algorithm) replaces fixed timeouts, adapting to actual network conditions.
  • Packet deduplication — Sliding-window deduplication prevents processing the same packet twice, which the old version did not handle.
  • Scatter-gather IO — Fragment transmission uses vectored writes to avoid copying payload data into intermediate buffers.
  • Testability — Each component has dedicated unit tests. The old implementation had no unit-level test coverage of its networking internals.

Architecture

The new NUClearNet is decomposed into focused components:

  • Discovery — Multicast-based peer announce/leave with two-flag connection model
  • Fragmentation — Splits large payloads into MTU-sized fragments and reassembles them with timeout-based cleanup
  • Reliability — Selective ACK/NACK retransmission using bitset-based tracking
  • RTTEstimator — Per-peer round-trip time estimation for adaptive retransmission timeouts
  • Routing — Subscription-based message filtering per peer, with receiver-side filtering for multicast data
  • PacketDeduplicator — Sliding-window deduplication to discard already-seen packets
  • wire_protocol — Defines the on-wire packet format (Announce, Leave, Data, ACK, Connect)

NUClearNet ties these together to provide the public API (join, leave, send, process).

Connection establishment

Peers must satisfy two independent conditions before a connection is considered "up":

  1. Announce path confirmed (announce_heard) — received the peer's announce on the multicast/broadcast channel, proving their data port can reach our announce address.
  2. Data handshake confirmed (handshake == CONFIRMED) — a 3-way SYN/SYN+ACK/ACK handshake over the data ports proves bidirectional unicast connectivity.

The packet type encodes which path was used — ANNOUNCE packets always go to the multicast group, CONNECT packets always go to the peer's data port. This confirms all four communication paths without needing socket tracking.

When a new peer's announce is heard:

  • We immediately re-announce to the multicast group (so they hear us)
  • We send CONNECT(SYN) to their data port to begin the data handshake
  • Connection fires only when both flags are satisfied (either order)

Multicast broadcast delivery

Unreliable broadcast sends (empty target, non-reliable) are sent once to the multicast/broadcast group rather than unicasting to each peer individually. Receivers filter by local subscription before fragmentation reassembly.

Reliable sends and targeted sends remain unicast for per-peer ACK tracking.

What Changed

  • Removed: src/extension/network/NUClearNetwork.{cpp,hpp} and src/extension/network/wire_protocol.hpp (old monolithic implementation)
  • Added: src/nuclearnet/ — all new source and headers with its own CMakeLists.txt
  • Added: src/util/network/sock_t.hpp — cross-platform socket type alias
  • Modified: src/extension/NetworkController.{cpp,hpp} — updated to use the new nuclearnet library API
  • Added: tests/tests/nuclearnet/ — Catch2 BDD-style unit tests for each component (Discovery, Fragmentation, Integration, PacketDeduplicator, RTTEstimator, Reliability, Routing, wire_protocol)

Build

  • Targets C++14 (no C++17 features used)
  • nuclearnet library links against the platform socket library and can be consumed standalone via CMake
  • Full test suite passes locally (68/68) and across Docker CI matrix (GCC 7–13, Clang 18)

… and Fragmentation

Add an optional 'now' parameter (defaulting to steady_clock::now()) to all
time-dependent methods in Discovery, Reliability, and Fragmentation. This
allows tests to advance time deterministically without sleeping, making them
faster, more reliable, and immune to CI timing variability.

Tests converted from time-based (sleep_for) to event-based:
- Discovery: check_timeouts and touch_peer tests
- Reliability: all retransmission/timeout tests (7 sleeps removed)
- Fragmentation: cleanup_expired test

The only remaining sleep_for is in Integration.cpp's polling loop for
genuine async UDP networking, which is inherently time-dependent.
The has_ipv4_multicast() and has_ipv6_multicast() functions previously
only checked if network interfaces reported the IFF_MULTICAST flag.
On macOS GitHub Actions runners (virtualized ARM64 VMs), interfaces
report multicast capability but the hypervisor doesn't actually deliver
multicast packets.

Now performs an actual multicast send/receive round-trip test with a
200ms timeout. This correctly detects broken multicast environments
and causes those tests to be skipped rather than hanging.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the legacy monolithic NUClearNetwork implementation under src/extension/network/ with a new modular, standalone src/nuclearnet/ library, and updates the reactor-facing NetworkController to use the new API. It also adds a comprehensive Catch2 unit/integration test suite for the new components and improves multicast capability detection for test gating.

Changes:

  • Introduces the new src/nuclearnet/ standalone library (Discovery, Fragmentation, Reliability, RTT estimation, routing, deduplication, wire protocol, RAII FD wrapper).
  • Migrates src/extension/NetworkController from NUClearNetwork to the new NUClearNet API, including subscription propagation.
  • Adds/updates tests and test utilities (including multicast round-trip detection) to cover the new networking components.

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
src/nuclearnet/CMakeLists.txt Adds standalone CMake build for the new nuclearnet library.
src/nuclearnet/wire_protocol.hpp Defines new on-wire structs and header validation helper.
src/nuclearnet/RTTEstimator.hpp Declares RTT estimator API.
src/nuclearnet/RTTEstimator.cpp Implements Jacobson/Karels-style RTT estimation.
src/nuclearnet/Routing.hpp Declares peer/local subscription tracking and filtering API.
src/nuclearnet/Routing.cpp Implements subscription-based routing decisions.
src/nuclearnet/Reliability.hpp Declares ACK/NACK tracking and retransmission API.
src/nuclearnet/Reliability.cpp Implements retransmission tracking and ACK/NACK processing.
src/nuclearnet/PacketDeduplicator.hpp Declares sliding-window packet deduplication.
src/nuclearnet/PacketDeduplicator.cpp Implements wraparound-safe sliding-window deduplication.
src/nuclearnet/Fragmentation.hpp Declares fragmentation and reassembly API.
src/nuclearnet/Fragmentation.cpp Implements MTU fragmentation, reassembly, and expiry cleanup.
src/nuclearnet/FileDescriptor.hpp Adds RAII wrapper for sockets/file descriptors.
src/nuclearnet/Discovery.hpp Declares announce/leave processing and peer tracking API.
src/nuclearnet/Discovery.cpp Implements peer discovery, timeout handling, and callbacks.
src/nuclearnet/NUClearNet.hpp Declares the public standalone networking façade and callbacks.
src/nuclearnet/NUClearNet.cpp Implements socket setup, polling loop, packet IO, and module integration.
src/util/network/sock_t.hpp Extends sock_t with comparison and stream operators for use in maps/logging.
src/extension/NetworkController.hpp Switches controller to use network::NUClearNet.
src/extension/NetworkController.cpp Adapts controller wiring to the new API and subscriptions model.
src/CMakeLists.txt Uses CONFIGURE_DEPENDS for recursive globbing.
tests/test_util/has_multicast.cpp Adds multicast round-trip probing for more reliable test gating.
tests/tests/nuclearnet/Discovery.cpp Adds unit tests for discovery behaviors.
tests/tests/nuclearnet/Fragmentation.cpp Adds unit tests for fragmentation/reassembly behaviors.
tests/tests/nuclearnet/Integration.cpp Adds integration tests for two peers discovering/exchanging data.
tests/tests/nuclearnet/PacketDeduplicator.cpp Adds unit tests for deduplication window/wraparound.
tests/tests/nuclearnet/Reliability.cpp Adds unit tests for ACK/NACK and retransmission behavior.
tests/tests/nuclearnet/Routing.cpp Adds unit tests for routing/subscription filtering.
tests/tests/nuclearnet/RTTEstimator.cpp Adds unit tests for RTT estimator behavior.
tests/tests/nuclearnet/wire_protocol.cpp Adds tests for packed sizes/layout and header validation.
src/extension/network/NUClearNetwork.hpp Removes legacy network header.
src/extension/network/NUClearNetwork.cpp Removes legacy network implementation.
src/extension/network/wire_protocol.hpp Removes legacy wire protocol header.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/nuclearnet/NUClearNet.cpp Outdated
Comment thread src/nuclearnet/NUClearNet.cpp
Comment thread src/nuclearnet/NUClearNet.cpp
Comment thread src/nuclearnet/NUClearNet.cpp
Comment thread src/nuclearnet/Reliability.cpp
Comment thread src/nuclearnet/Discovery.cpp Outdated
Comment thread src/nuclearnet/wire_protocol.hpp Outdated
Comment thread src/util/network/sock_t.hpp
Comment thread tests/test_util/has_multicast.cpp Outdated
Comment thread src/nuclearnet/wire_protocol.hpp
- FileDescriptor.hpp: Add const to local variable (misc-const-correctness)
- NUClearNet.hpp: Value-initialize iovec, add NOLINT for necessary
  const_cast, remove redundant member initializer
- NetworkController.cpp: Add missing direct includes for string, set,
  Discovery.hpp, and NUClearNet.hpp (misc-include-cleaner)
- TestRunner.cmake: Add --skip-returncode 0 so Catch2 returns success
  when all tests are skipped (fixes macOS CI where multicast is unavailable)
v5 is deprecated and has a known security vulnerability. The scanner was
failing with HTTP 403 when querying JRE metadata, likely due to the old
action version being unsupported by SonarCloud's API.
Build fixes:
- Remove --skip-returncode (not supported in Catch2 v3.6.0)
- sock_t.hpp: Add #include <string> for std::to_string/stoi (Windows)
- Discovery.cpp: Fix include-cleaner errors and const-correctness

Review feedback fixes:
- NUClearNet.cpp: Fix unreachable new-peer branch by checking before
  process_announce adds the peer
- Reliability.cpp: Validate ACK packet_count matches tracked packet
- Discovery.cpp: Copy PeerInfo before invoking callbacks to avoid
  holding peers_mutex during user callbacks (deadlock risk)
- wire_protocol.hpp: Fix ACK comment (10+ bytes, not 11+) and remove
  incorrect 'network byte order' claim from announce doc
- has_multicast.cpp: Verify received payload matches sent message to
  avoid false positives from unrelated multicast traffic
Update all networking documentation to accurately describe the v2
implementation:

- Protocol version 0x03 (was incorrectly documented as 0x02)
- Modular architecture: Discovery, Fragmentation, Reliability, Routing,
  PacketDeduplicator, RTTEstimator
- Jacobson/Karels RTT estimation (RFC 6298), not Kalman filter
- Subscription-based routing (announce packets include type hashes)
- Sliding-window packet deduplication (256 IDs per peer)
- NAT-friendly port learning from UDP source address
- Assembly size limits to prevent memory bombs
- Configurable peer timeout and max retransmission attempts
Remove max_retransmits limit from Reliability module. Reliable messages
now retransmit indefinitely based on RTT-estimated timeouts until either:
- All fragments are ACKed (success)
- The peer is removed due to timeout or graceful leave (connection lost)

This provides true reliable delivery semantics — if the connection is
alive, delivery is guaranteed.

Also tie assembly timeout to peer_timeout (default 2s) instead of a
fixed 10 seconds. The assembly timeout is now the natural bound: if no
fragments arrive within the peer timeout period, the peer is either dead
or the sender has moved on. For reliable messages, retransmissions keep
the assembly alive as long as the sender is connected.
…e-back

Since protocol v0.03 is already a breaking change, simplify the wire
protocol by removing the DATA_RETRANSMISSION type. The receiver now
checks the packet deduplicator for ALL incoming DATA packets. If the
packet group was already fully processed, an ACK is sent and the
fragment is discarded. This provides the same behavior with one fewer
packet type.

Packet type numbering is now:
  ANNOUNCE=1, LEAVE=2, DATA=3, ACK=4, NACK=5

Also document the announce-back behavior: when a node hears an announce
from an unknown peer, it immediately sends its own announce via unicast
to that peer. This gives instant bidirectional connection without
waiting for the next announce cycle.
/// This node's name on the network
std::string name;
/// The multicast/broadcast/unicast address to announce on
std::string announce_address = "239.226.152.162";
NACK was never sent by any code path — build_nack_packet existed but
was never called in production. The bitset ACK already implicitly
communicates which fragments are missing (bit=0), making an explicit
NACK redundant.

Packet types are now: ANNOUNCE=1, LEAVE=2, DATA=3, ACK=4
Connection establishment now requires two independent conditions:
- announce_heard: peer's announce received on multicast channel
- handshake == CONFIRMED: 3-way handshake over data ports

This confirms all four communication paths (announce and data in both
directions) before declaring a peer connected.

Changes:
- wire_protocol.hpp: Add CONNECT packet type (type=5) with SYN/ACK flags
- Discovery.hpp/cpp: Replace single ConnectionState enum with two-flag
  model (announce_heard bool + HandshakeState enum)
- NUClearNet.cpp: Force re-announce on new peer (multicast, not unicast),
  send CONNECT(SYN) to data port, gate DATA/ACK on is_connected()
- Routing: Add is_locally_subscribed() for receiver-side filtering
- NUClearNet send(): Unreliable broadcast sends go to multicast group
  instead of unicasting to each peer individually
- Tests: Update Discovery tests for new model, add test for late announce
  scenario (data handshake completes before announce heard)
- Docs: Rewrite connection establishment docs with sequence diagrams
  showing two-flag model, late announce, and multicast broadcast delivery
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants