refactor(linux): netlink/nftables host networking and a unified capability-driven VM test suite #12

Merged
pilat merged 1 commit into main from refactor/linux-netlink-nftables on Jun 13, 2026

Conversation

@pilat pilat commented Jun 13, 2026

Owner

This branch pulls together two related pieces of the Linux story.

The first is a rewrite of how the Linux backend programs host networking. It used to build the shared bridge, the per-VM taps, and the egress masquerade by shelling out to ip and iptables — fragile on exactly the hosts fleetbox is meant to run on (a laptop, a CI runner, a random KVM box), where those binaries may be missing or sitting in a /sbin the user's PATH doesn't include. It now talks to the kernel directly over netlink and nftables from pure Go, so the Linux path depends on no host networking tools at all. While we were in there we also fixed the security posture: instead of flipping the global ip_forward switch (which quietly turns the whole machine into a router across every interface) it enables forwarding only on the bridge and the discovered uplink, and it owns a single nftables table per bridge that both masquerades egress and keeps unsolicited inbound out of the guest subnet. The reasoning is written up in ADR-0025, which amends ADR-0011 and ADR-0013.
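
To make the per-interface forwarding point concrete, here is a minimal sketch of toggling forwarding for a single interface through the kernel's `net/ipv4/conf/<iface>/forwarding` sysctl rather than the global `ip_forward` switch. The helper names (`forwardingPath`, `readIfaceForwarding`, `setIfaceForwarding`) and the `procRoot` variable are hypothetical and are not taken from the PR; the PR's own helpers may look different.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// procRoot is where the kernel exposes sysctls. It is a variable only so
// this sketch can be pointed at a scratch directory (an assumption of the
// example, not something the PR describes).
var procRoot = "/proc/sys"

// forwardingPath returns the per-interface IPv4 forwarding sysctl file.
func forwardingPath(iface string) string {
	return filepath.Join(procRoot, "net", "ipv4", "conf", iface, "forwarding")
}

// readIfaceForwarding reports whether forwarding is enabled on iface.
func readIfaceForwarding(iface string) (bool, error) {
	b, err := os.ReadFile(forwardingPath(iface))
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(string(b)) == "1", nil
}

// setIfaceForwarding enables or disables forwarding on iface only,
// leaving every other interface and the global switch untouched.
func setIfaceForwarding(iface string, on bool) error {
	v := "0\n"
	if on {
		v = "1\n"
	}
	return os.WriteFile(forwardingPath(iface), []byte(v), 0o644)
}

func main() {
	// Only prints the path; writing it requires root on a real host.
	fmt.Println(forwardingPath("br0"))
}
```

Applying `setIfaceForwarding` to just the bridge and the uplink is what keeps the rest of the host's interfaces from being routed between.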

The second piece is a cleanup of the test suite, and it's why the two changes ride together: the cleanest way to prove the new networking code actually works on a live kernel is to run the real tests against it. Until now the VM tests were reachable only through a tangle of make targets, -run filters, and build tags — a human had to pick which tests ran where, and the nested "fleetbox-tests-fleetbox" dogfood ran a bespoke copy of the networking checks rather than the real ones. This collapses all of that into one capability-driven surface: the same test code runs everywhere, and each test decides for itself whether to boot a VM based on the speed tier (-short) and a runtime probe of the host — is /dev/kvm there, does this Mac do nested virt, does the backend support clustering. No -run selectors, no build tags used to pick tests. There are now two obvious entry points, make test for the quick VM-free tier and make test-vm for the full run, instead of four confusing ones.
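
The gating described above can be sketched as a pure decision function fed by a runtime probe. Everything here is illustrative: `hostCaps`, `skipReason`, and `probeHost` are hypothetical names, and the macOS branch simply follows the description's "does this Mac do nested virt" wording. The real helper in `fleetboxtest` may be structured differently.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// hostCaps is a hypothetical snapshot of what a runtime probe found.
type hostCaps struct {
	GOOS       string
	GOARCH     string
	HasKVM     bool // /dev/kvm exists (Linux hosts)
	NestedVirt bool // nested virtualization is available (macOS hosts)
}

// skipReason returns "" when a VM test should run, or the reason to skip.
func skipReason(short bool, c hostCaps) string {
	if short {
		return "-short: VM tests are skipped in the quick tier"
	}
	switch {
	case c.GOOS == "linux" && (c.GOARCH == "amd64" || c.GOARCH == "arm64"):
		if !c.HasKVM {
			return "/dev/kvm is not available"
		}
	case c.GOOS == "darwin" && c.GOARCH == "arm64":
		if !c.NestedVirt {
			return "host does not support nested virtualization"
		}
	default:
		return "unsupported host: " + c.GOOS + "/" + c.GOARCH
	}
	return ""
}

// probeHost fills in what the standard library can check portably.
// NestedVirt stays false here; probing it on macOS is out of scope
// for this sketch.
func probeHost() hostCaps {
	_, err := os.Stat("/dev/kvm")
	return hostCaps{GOOS: runtime.GOOS, GOARCH: runtime.GOARCH, HasKVM: err == nil}
}

func main() {
	if r := skipReason(false, probeHost()); r != "" {
		fmt.Println("skip:", r)
		return
	}
	fmt.Println("boot VM")
}
```

Inside a test, the same decision would typically be wired up as `if r := skipReason(testing.Short(), probeHost()); r != "" { t.Skip(r) }`, so the choice of what runs lives in the test rather than in a `-run` filter or build tag.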

The payoff is concrete. The amd64 KVM CI job now boots a real two-node cluster, so it exercises VM↔VM connectivity over the new nftables/bridge plane for the first time. The nested dogfood cross-builds the real suite and runs it inside an outer guest on the cloud-hypervisor backend, so the netlink/nftables path is hit by the actual conformance tests (egress through the nft masquerade) and cluster tests (VM↔VM plus subnet isolation) instead of a hand-written stand-in. That run also exercises the arm64 direct-kernel path locally on the M4, a path that hosted CI can't reach.

Checklist

  • Changed the public API, package list, CLI surface, on-disk layout, or dependencies → ARCHITECTURE.md updated in this PR
  • Made a new, hard-to-reverse design decision → added an ADR under docs/adr/ (next sequential number)
  • Breaking change (! in the title) → the description spells out what callers must change

Summary by CodeRabbit

Release Notes

  • New Features

    • Extended VM test suite with full capability-driven testing across platforms; added environment-based boot timeout configuration.
    • Added egress connectivity verification in VM conformance tests.
  • Bug Fixes

    • Improved KVM nested virtualization detection to support broader architectures beyond x86.
    • Enhanced host capability checks for VM boot support across macOS and Linux platforms.
  • Documentation

    • Updated architecture and networking documentation reflecting direct kernel interactions for guest networking.
    • Clarified CI/testing behavior and dependencies.
  • Chores

    • Simplified build system by consolidating nested VM testing into unified test targets.
    • Added required networking dependencies.

coderabbitai Bot commented Jun 13, 2026

Warning

Review limit reached

@pilat, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 35 minutes and 21 seconds. Learn how PR review limits work.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63f891a8-f0b7-4d4f-80de-5e15a1c8c26a

📥 Commits

Reviewing files that changed from the base of the PR and between 8e3778a and 5b3ed89.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (25)
  • .github/workflows/vm-linux.yml
  • ARCHITECTURE.md
  • CLAUDE.md
  • Makefile
  • README.md
  • docs/adr/0011-linux-cloud-hypervisor-backend.md
  • docs/adr/0013-crash-safe-linux-network-lifecycle.md
  • docs/adr/0025-linux-netlink-nftables.md
  • fleetbox_linux.go
  • fleetboxtest/cluster_test.go
  • fleetboxtest/conformance_vm_test.go
  • fleetboxtest/fixtures_vm_test.go
  • fleetboxtest/fleetboxtest.go
  • fleetboxtest/fleetboxtest_test.go
  • fleetboxtest/nested_test.go
  • go.mod
  • internal/backend/cloudhypervisor/cloudhypervisor.go
  • internal/backend/cloudhypervisor/forwarding.go
  • internal/backend/cloudhypervisor/netstate.go
  • internal/backend/cloudhypervisor/network.go
  • internal/backend/cloudhypervisor/nftables.go
  • internal/backend/cloudhypervisor/purehelpers.go
  • internal/backend/cloudhypervisor/purehelpers_test.go
  • internal/holder/holder.go
  • internal/orchestrator/orchestrator.go
Walkthrough

This PR migrates Linux cloud-hypervisor networking from subprocess-based ip/iptables calls to pure-Go netlink and nftables kernel APIs, with per-interface forwarding instead of global toggling. Test infrastructure shifts from build-tag-based gating to capability-driven runtime skipping. A new ADR-0025 documents the design decision, consequences, and crash-safe write-ahead recovery.

Changes

Linux Networking Migration

| Layer | File(s) | Summary |
|---|---|---|
| Architecture & ADR documentation | ARCHITECTURE.md, docs/adr/0011-*, docs/adr/0013-*, docs/adr/0025-*, README.md, CLAUDE.md | Design and consequence documentation updated to describe the netlink/nftables approach, per-interface forwarding semantics, the crash-safe lifecycle, and the new capability-driven test gating. New ADR-0025 details the decision, alternatives (iptables-compat chains, global forwarding revert), and consequences (no host tool dependencies, improved security, specific error expectations, read-back verification requirements). |
| Dependencies and kernel parameter probing | go.mod, fleetbox_linux.go, internal/backend/cloudhypervisor/cloudhypervisor.go | Added direct dependencies on github.com/google/nftables and github.com/vishvananda/netlink; expanded KVM nested-virtualization probing to include the base kvm module parameter alongside the x86-specific modules. |
| Pure helper functions | internal/backend/cloudhypervisor/purehelpers.go, internal/backend/cloudhypervisor/purehelpers_test.go | Added portable helpers: nftTableName (bridge-to-nft-table mapping), classifyNFTErr (error normalization: EPERM vs. missing nf_tables support), uplinkName (default-route interface resolution); comprehensive unit tests validate naming, error semantics, and index selection. |
| Persisted network state | internal/backend/cloudhypervisor/netstate.go | Reshaped netRecord from global ip_forward/masquerade fields to a per-uplink model (Uplink, UplinkFwdOrig); replaced the single ipforward.orig marker with per-uplink markers using first-writer-wins semantics; generalized sysctl helpers to per-interface forwarding controls. |
| Forwarding lifecycle | internal/backend/cloudhypervisor/forwarding.go | Implements per-interface IPv4 forwarding: discovers the uplink via netlink route probing, records the original forwarding state (first-writer-wins via O_EXCL), enables per-interface forwarding only when global forwarding is not already on, and provides a restoration callback that idempotently clears markers when no fleetbox networks remain. |
| nftables firewall | internal/backend/cloudhypervisor/nftables.go | Per-bridge IPv4 nftables tables with NAT masquerade (subnet-scoped, excluding the bridge) and a self-protecting forward-drop filter (blocks new connections to guests from non-bridge interfaces); includes a probe for nf_tables kernel support, read-back verification to detect unsupported expressions, and idempotent teardown. |
| Network orchestration | internal/backend/cloudhypervisor/network.go | Refactored from subprocess ip/iptables to netlink/nftables: CreateNetwork probes CAP_NET_ADMIN, creates the bridge via netlink, probes nf_tables, enables forwarding, and installs the firewall; Close removes the firewall table, then the bridge, then restores forwarding; Reconcile idempotently deletes orphaned taps/bridges/firewall and restores forwarding; the tap lifecycle uses netlink Tuntap with a rollback closure on failure. |
| Integration documentation | internal/holder/holder.go, internal/orchestrator/orchestrator.go | Updated comments to describe the netlink/nftables cleanup scope and the syscall-related reasons CreateNetwork is slow. |
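
The "first-writer-wins via O_EXCL" behaviour mentioned for the forwarding markers can be sketched with the standard library alone. This is a minimal illustration under assumptions: `recordOrigOnce`, `demo`, and the `fwd-eth0.orig` file name are hypothetical, and the PR's actual store code is not shown on this page.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// recordOrigOnce writes value to path only if no marker exists yet.
// It returns true when this call created the marker, and false when an
// earlier writer had already recorded the original state.
func recordOrigOnce(path, value string) (bool, error) {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o600)
	if errors.Is(err, fs.ErrExist) {
		return false, nil // keep the first writer's value
	}
	if err != nil {
		return false, err
	}
	if _, err := f.WriteString(value); err != nil {
		f.Close()
		return false, err
	}
	return true, f.Close()
}

// demo records twice against the same marker in a scratch directory and
// returns who won plus the surviving content.
func demo() (first, second bool, content string, err error) {
	dir, err := os.MkdirTemp("", "fwd-marker")
	if err != nil {
		return false, false, "", err
	}
	defer os.RemoveAll(dir)

	marker := filepath.Join(dir, "fwd-eth0.orig")
	if first, err = recordOrigOnce(marker, "0"); err != nil {
		return
	}
	if second, err = recordOrigOnce(marker, "1"); err != nil {
		return
	}
	b, err := os.ReadFile(marker)
	return first, second, string(b), err
}

func main() {
	fmt.Println(demo())
}
```

The point of `O_EXCL` is that the second network to come up cannot overwrite the original value with the already-modified state, so restoring later puts the host back where it started.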

Capability-Driven Test Infrastructure

| Layer | File(s) | Summary |
|---|---|---|
| CI workflow and build targets | .github/workflows/vm-linux.yml, Makefile | Updated the GitHub workflow to run the full capability-driven test suite (removed the -test.run TestVMConformance filter) with a 30m timeout; the Makefile's test-vm target now runs the full suite with a 90m timeout; removed the separate test-nested target; documented the two-tier approach (VM-free vs. VM-capable). |
| Test fixture helper functions | fleetboxtest/fleetboxtest.go | New exported helpers: SkipIfCannotBootVM (platform-specific gating: macOS arm64 via nested-virt, Linux amd64/arm64 via /dev/kvm availability), BootTimeout (configurable via the FLEETBOX_IP_WAIT_TIMEOUT env variable, defaults to 5min per VM), SkipIfShort; Start/StartN use the boot timeout and convert ErrClustersUnsupported to a skip. |
| Test fixture helper tests | fleetboxtest/fleetboxtest_test.go | Unit test TestBootTimeout validates timeout derivation from FLEETBOX_IP_WAIT_TIMEOUT across unset, valid, unparseable, and non-positive scenarios. |
| Test suite updates | fleetboxtest/cluster_test.go, fleetboxtest/conformance_vm_test.go, fleetboxtest/fixtures_vm_test.go, fleetboxtest/nested_test.go | Removed platform build tags; tests now use runtime capability skipping. Cluster tests add explicit CPU/memory sizing; the conformance test adds egress verification (ping to 1.1.1.1); the nested test switched from build-tag to runtime gating (darwin/arm64 + nested-virt supported + FLEETBOX_HELPER set) and now cross-compiles the test binary for nested execution with adjusted timeouts. |
| Developer guide | CLAUDE.md | Updated to document the new two-tier testing (make test for VM-free/fast, make test-vm for the full suite with self-skipping), the capability-driven fixture helpers, and updated CI/local dev expectations for macOS/Linux. |
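
The BootTimeout behaviour summarized above (an environment override with a 5-minute default, tested for unset, valid, unparseable, and non-positive values) might look roughly like the following. This is a sketch under two assumptions that the page does not confirm: that the variable holds a Go duration string such as `90s`, and that bad values fall back to the default. `bootTimeoutFrom` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// defaultBootTimeout mirrors the 5-minute per-VM default from the summary.
const defaultBootTimeout = 5 * time.Minute

// bootTimeoutFrom derives the boot timeout from a raw environment value.
// Unset, unparseable, and non-positive values all fall back to the default
// (an assumption of this sketch).
func bootTimeoutFrom(raw string) time.Duration {
	if raw == "" {
		return defaultBootTimeout
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return defaultBootTimeout
	}
	return d
}

func main() {
	fmt.Println(bootTimeoutFrom(os.Getenv("FLEETBOX_IP_WAIT_TIMEOUT")))
}
```

Keeping the parsing in a function that takes the raw string, rather than reading the environment directly, is what makes the four scenarios easy to cover in a table-driven unit test.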

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • pilat/fleetbox#10: Updates fleetboxtest/nested_test.go to refactor the nested dogfood test from build-tag gating to runtime gating and changes the nested execution flow; directly related to this PR's nested test runtime-gating overhaul.

Poem

🐰 From shell scripts to kernel calls so pure,
Netlink and nftables now ensure,
Per-interface forwarding, no global flip,
Tests skip by capability, not by trip.
A crash-safe dance, write-ahead and restore—
Linux networking flows like never before! 🌊

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main changes: a refactor of Linux networking via netlink/nftables and unification of the VM test suite into a capability-driven approach. |
| Description check | ✅ Passed | The PR description comprehensively explains the motivation, design decisions, and rationale for both major changes. It includes references to ADRs and updated documentation, and the author completed all checklist items. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 82.50% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@pilat pilat self-assigned this Jun 13, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@fleetboxtest/fleetboxtest_test.go`:
- Line 42: Add a doc comment immediately above the TestBootTimeout function
declaration that begins with "TestBootTimeout" and briefly describes the test's
purpose and behavior (e.g., what it verifies and any important conditions or
expectations); ensure the comment follows Go doc comment style for exported
symbols and is placed directly above the TestBootTimeout func declaration so
linters and readers can pick it up.

In `@internal/backend/cloudhypervisor/forwarding.go`:
- Around line 116-119: The loop in maybeRestoreForwarding currently clears the
fwd-*.orig marker unconditionally; change it so writeForwarding(uplink, orig)
returns/propagates an error and only call store.clearForwardingOrig(uplink) when
writeForwarding succeeded (err == nil). Specifically, in maybeRestoreForwarding,
capture the error from writeForwarding(uplink, orig), log or return the error as
appropriate, and only call store.clearForwardingOrig(uplink) on success; leave
the marker in place on failure so retries can occur.
- Around line 26-31: discoverUplink currently swallows netlink.RouteGet errors
and treats any failure as “no uplink”; update the code that calls
netlink.RouteGet in discoverUplink to check and propagate/log the error instead
of ignoring it (i.e., inspect the error returned from netlink.RouteGet, return
or surface it to the caller so hard failures aren’t misclassified as offline,
and only proceed to build indices when routes is non-nil and err == nil). In
maybeRestoreForwarding, stop clearing the first-writer-wins marker
unconditionally: call writeForwarding and check its returned error, and only
call store.clearForwardingOrig(...) after writeForwarding succeeds; if
writeForwarding fails, preserve the marker and return the error (or log and
retry as appropriate). Reference functions: discoverUplink,
maybeRestoreForwarding, writeForwarding, and store.clearForwardingOrig; ensure
netlink.RouteGet error handling is explicit and restore-marker clearing is
conditional on successful restore.

In `@internal/backend/cloudhypervisor/network.go`:
- Around line 99-105: When nftProbe() fails after the network object is created,
ensure the firewall cleanup is attempted and the WAL network record is removed:
call removeFirewall() before n.Close(), and after attempting removeFirewall()
set the n.fwRemoved flag (or otherwise ensure Close() will remove the WAL
record) so that Close() does not leave a stale record even if removeFirewall()
returns an error; keep error logging from removeFirewall() but proceed to call
n.Close() to delete the WAL entry.
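
The forwarding.go finding above (clear a marker only after the restore write succeeds) can be illustrated with a small stand-alone loop. This is a sketch of the suggested pattern, not the PR's code: `restoreAll` and `demoRestore` are hypothetical, and the `write` and `clear` callbacks stand in for `writeForwarding` and `store.clearForwardingOrig`, whose real signatures are not shown on this page.

```go
package main

import (
	"errors"
	"fmt"
)

// restoreAll writes each interface's original forwarding value back and
// clears its marker only when the write succeeded. Failed interfaces keep
// their marker so a later reconcile can retry.
func restoreAll(
	origs map[string]string,
	write func(iface, orig string) error,
	clear func(iface string),
) error {
	var errs []error
	for iface, orig := range origs {
		if err := write(iface, orig); err != nil {
			errs = append(errs, fmt.Errorf("restore forwarding on %s: %w", iface, err))
			continue // marker stays in place for a retry
		}
		clear(iface)
	}
	return errors.Join(errs...) // nil when every restore succeeded
}

// demoRestore simulates one interface whose restore fails and returns the
// markers that were cleared.
func demoRestore() ([]string, error) {
	var cleared []string
	err := restoreAll(
		map[string]string{"eth0": "0", "wlan0": "0"},
		func(iface, orig string) error {
			if iface == "wlan0" {
				return errors.New("simulated write failure")
			}
			return nil
		},
		func(iface string) { cleared = append(cleared, iface) },
	)
	return cleared, err
}

func main() {
	cleared, err := demoRestore()
	fmt.Println("cleared:", cleared)
	fmt.Println("error:", err)
}
```

Collecting errors with `errors.Join` lets the loop keep restoring the remaining interfaces instead of stopping at the first failure.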

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e2eb660e-fbd1-4446-939e-9a9e881a78e6

📥 Commits

Reviewing files that changed from the base of the PR and between 9243508 and 8e3778a.


Comment thread fleetboxtest/fleetboxtest_test.go
Comment thread internal/backend/cloudhypervisor/forwarding.go
Comment thread internal/backend/cloudhypervisor/forwarding.go
Comment thread internal/backend/cloudhypervisor/network.go
@pilat pilat force-pushed the refactor/linux-netlink-nftables branch from 8e3778a to 5b3ed89 Compare June 13, 2026 11:08
@pilat pilat merged commit 6dfaa98 into main Jun 13, 2026
3 checks passed