Merged
15 changes: 15 additions & 0 deletions .github/workflows/vm-linux.yml
@@ -45,6 +45,21 @@ jobs:
           key: fleetbox-linux-${{ hashFiles('**/go.sum') }}
           restore-keys: fleetbox-linux-

+      # Hosted runners run Docker, which sets the iptables FORWARD policy to DROP when
+      # it enables IP forwarding. fleetbox VMs forward through their own non-docker
+      # bridge, so without this their egress is dropped — the same conflict libvirt KVM
+      # and LXD hit on Docker hosts. fleetbox does not override the host firewall, so
+      # allow the VM subnet range (192.168.0.0/16, where the backend hands out /24s)
+      # through DOCKER-USER, which Docker evaluates before its DROP. Both directions:
+      # guest egress (src) and the de-masqueraded return (dst). No-op without Docker.
+      # (Egress is asserted over TCP, not ICMP — the runner network drops outbound ICMP.)
+      - name: Allow VM egress past Docker's FORWARD DROP policy
+        run: |
+          if sudo iptables -L DOCKER-USER -n >/dev/null 2>&1; then
+            sudo iptables -I DOCKER-USER -s 192.168.0.0/16 -j ACCEPT
+            sudo iptables -I DOCKER-USER -d 192.168.0.0/16 -j ACCEPT
+          fi
+
       - name: Boot real VMs (full capability-driven suite)
         run: |
           go test -c -o /tmp/fleetboxtest ./fleetboxtest
16 changes: 14 additions & 2 deletions README.md
@@ -83,6 +83,7 @@ Prefer the CLI to the library? On macOS (Apple Silicon) it ships as a Homebrew c

 ```bash
 brew tap pilat/fleetbox
+brew trust --cask pilat/fleetbox/fleetbox
 brew install --cask fleetbox
 ```

@@ -305,6 +306,13 @@ way, the decision log lives in [docs/adr/](docs/adr/).
 brought back up needs its `/24` to still be free — on a contended host the auto-picked
 subnet can shift and the rebooted VM won't be reachable; bring clusters up fresh.
 arm64 Linux boot via rust-hypervisor-firmware is not yet validated on hardware.
+- **Docker on the Linux host blocks VM egress.** When Docker is running it sets the
+iptables `FORWARD` policy to `DROP`, which drops a fleetbox VM's traffic to the internet
+(VMs forward through their own bridge, not Docker's) — the same conflict libvirt and LXD
+hit on Docker hosts. VM↔VM and host↔VM still work; only internet egress is affected.
+fleetbox deliberately does not rewrite your host firewall, so allow its subnet range
+yourself: `sudo iptables -I DOCKER-USER -s 192.168.0.0/16 -j ACCEPT` (add the matching
+`-d 192.168.0.0/16` rule for the return path).
 - **v0 API.** Expect breaking changes until it stabilizes.

 ### CI
@@ -319,6 +327,9 @@ aren't re-downloaded every run):
       - run: |
           echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666"' | sudo tee /etc/udev/rules.d/99-kvm.rules
           sudo udevadm control --reload-rules && sudo udevadm trigger
+          # Hosted runners run Docker (FORWARD policy DROP); allow the VM subnet to forward.
+          sudo iptables -I DOCKER-USER -s 192.168.0.0/16 -j ACCEPT
+          sudo iptables -I DOCKER-USER -d 192.168.0.0/16 -j ACCEPT
       - uses: actions/cache@v4
         with:
           path: |
@@ -328,8 +339,9 @@ aren't re-downloaded every run):
 ```

 arm64 hosted Linux runners do **not** have KVM ("not supported for this sku"); use an
-x86-64 runner for VM-boot CI. This is the "develop on a Mac, test in cheap x86-64 hosted
-Linux CI" story.
+x86-64 runner for VM-boot CI. Hosted runners also drop outbound **ICMP**, so check a
+guest's internet egress over TCP (a connect to `1.1.1.1:443`), not `ping`. This is the
+"develop on a Mac, test in cheap x86-64 hosted Linux CI" story.

 ## Roadmap

12 changes: 8 additions & 4 deletions docs/adr/0025-linux-netlink-nftables.md
@@ -127,7 +127,8 @@ ceiling is a documented limitation, not a code path.
 fail loudly unless the table, both chains, the masquerade verdict, and the forward
 drop's match all survived.
 - **Accepted egress ceiling.** On a host where Docker or ufw has clamped FORWARD to DROP,
-the guests cannot reach the internet. Documented, not worked around.
+the guests cannot reach the internet. Documented, not worked around (the README shows the
+`DOCKER-USER` allow rules an operator can add if they want egress on such a host).
 - **Irreducible uplink-transit residual.** Keeping the uplink's forwarding flag on permits
 uplink-ingress transit to the host's other routed networks — the inherent cost of routed
 egress without a global clamp. Documented, not chased.
@@ -136,6 +137,9 @@ ceiling is a documented limitation, not a code path.
 (its `MASQUERADE` rule and global `ip_forward` flip are invisible to the new sweep).
 fleetbox is pre-release: delete `~/.fleetbox` by hand when upgrading across this change.
 - **Dogfood-proven, not just compiled.** For network code, compile and lint prove nothing.
-The VM-boot CI (`vm-linux.yml`) now asserts internet egress over SSH from the booted
-guest (`ping 1.1.1.1`), so a missing or silently-dropped masquerade rule fails CI rather
-than passing a green build.
+The VM-boot CI (`vm-linux.yml`) asserts real internet egress over SSH from the booted
+guest — a TCP connect to `1.1.1.1:443`, **not** `ping`, because GitHub-hosted runners drop
+outbound ICMP (the host itself cannot ping out). The runner is itself a Docker host
+(FORWARD policy DROP), so the workflow first opens the VM subnet through `DOCKER-USER` —
+the operator's job, per the egress ceiling above; fleetbox does not. A missing or
+silently-dropped masquerade rule then fails CI rather than passing a green build.
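The ADR's egress ceiling applies when Docker or ufw has set the `FORWARD` policy to `DROP`. An operator can confirm that condition before adding the `DOCKER-USER` allow rules: `iptables -S FORWARD` prints the built-in chain's default policy as a line of the form `-P FORWARD DROP`. Below is a minimal Go sketch that runs that command and parses the policy line. `forwardPolicy` is a hypothetical helper. The sketch assumes it runs as root and that the host's `iptables` CLI shows the policy Docker or ufw configured.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// forwardPolicy extracts the default policy of the built-in FORWARD chain from
// `iptables -S FORWARD` output, where it appears as "-P FORWARD <POLICY>".
func forwardPolicy(rules string) (policy string, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(rules))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) == 3 && f[0] == "-P" && f[1] == "FORWARD" {
			return f[2], true
		}
	}
	return "", false
}

func main() {
	// Reading the ruleset normally needs root; report and stop if it fails.
	out, err := exec.Command("iptables", "-S", "FORWARD").Output()
	if err != nil {
		fmt.Println("could not read the FORWARD chain:", err)
		return
	}
	switch p, ok := forwardPolicy(string(out)); {
	case !ok:
		fmt.Println("no FORWARD policy line found")
	case p == "DROP":
		fmt.Println("FORWARD policy is DROP: guest internet egress is clamped on this host")
	default:
		fmt.Println("FORWARD policy is", p)
	}
}
```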
20 changes: 10 additions & 10 deletions fleetboxtest/conformance_vm_test.go
@@ -55,16 +55,16 @@ func TestVMConformance(t *testing.T) {
 		t.Errorf("SSH output = %q, want it to contain conformance-ok", out)
 	}

-	// Egress: the guest must reach the public internet. On Linux this drives the nft
-	// masquerade + per-interface forwarding end to end (ADR-0025) — a missing or
-	// silently-dropped masq rule fails here, which a plain echo-over-SSH would never
-	// catch. Ping by IP so the check does not depend on guest DNS.
-	out, err = vm.SSH(ctx, "ping -c1 -W5 1.1.1.1")
-	if err != nil {
-		t.Fatalf("egress ping failed (guest cannot reach the internet): %v\n%s", err, out)
-	}
-	if !strings.Contains(out, "0% packet loss") {
-		t.Fatalf("egress ping got no reply (no internet egress):\n%s", out)
+	// Egress: the guest must reach the public internet. This drives the nft masquerade
+	// + per-interface forwarding end to end (ADR-0025) — a missing or silently-dropped
+	// masq rule fails here, which a plain echo-over-SSH would never catch. Use a TCP
+	// connect, NOT ICMP ping: some CI networks (notably GitHub-hosted runners) drop
+	// outbound ICMP while allowing TCP, so a ping would be a false negative even when
+	// egress works. 1.1.1.1:443 is a stable, DNS-independent target; bash's /dev/tcp
+	// needs no extra package in the stock guest.
+	out, err = vm.SSH(ctx, "timeout 8 bash -c 'exec 3<>/dev/tcp/1.1.1.1/443 && echo egress-ok'")
+	if err != nil || !strings.Contains(out, "egress-ok") {
+		t.Fatalf("egress failed (guest cannot open TCP to the internet): %v\n%s", err, out)
 	}

 	// Stop gracefully (disk preserved), then Destroy removes everything — the full