
docs: add known issue for management bridge MAC change after NetworkManager migration #1049

Open
tillo wants to merge 2 commits into harvester:main from tillo:known-issue-mgmt-br-mac-float


tillo commented May 22, 2026

Problem

After the wicked → NetworkManager migration (Harvester v1.6.x → v1.7.x), the management bridge mgmt-br is no longer protected against MAC address changes.

Under wicked, /etc/wicked/scripts/setup_bond.sh ran as a bond post-up hook and pinned mgmt-br's MAC address to the bond MAC. The script still ships in /oem/90_custom.yaml, but it is dead code now that NetworkManager is the active network stack: wicked is inactive, so the hook never runs.

mgmt-br then behaves as a plain Linux bridge: it adopts the lowest MAC address among its enslaved ports and recomputes it whenever a port is added or removed. Bridge-mode VM networks attach VM veth interfaces to mgmt-br as ports, so the bridge MAC can change whenever a VM starts, stops, or migrates.
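The selection rule can be illustrated with a toy Python model. This is a sketch, not kernel code: it assumes the rule exactly as described above (lowest port MAC wins, compared byte by byte) plus the kernel behaviour that a bridge whose address was set explicitly keeps it instead of recomputing. That second behaviour is what the old hook relied on. The MAC addresses are hypothetical.

```python
"""Toy model of Linux bridge MAC selection (illustrative, not kernel code)."""

from typing import Optional, Sequence


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes so comparison is numeric."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def bridge_mac(port_macs: Sequence[str], pinned: Optional[str] = None) -> Optional[str]:
    """Return the MAC the bridge ends up with in this model.

    - If an address was set explicitly (pinned), the bridge keeps it.
    - Otherwise it takes the numerically lowest MAC among its ports.
    - With no ports and no pin, the model returns None (not modelled further).
    """
    if pinned is not None:
        return pinned.lower()
    if not port_macs:
        return None
    return min(port_macs, key=mac_bytes).lower()


if __name__ == "__main__":
    bond = "3c:ec:ef:10:20:30"  # hypothetical bond MAC
    veth = "02:5e:aa:bb:cc:01"  # hypothetical VM veth MAC (locally administered)
    print(bridge_mac([bond]))                     # bond only: bond MAC
    print(bridge_mac([bond, veth]))               # VM started: veth MAC wins
    print(bridge_mac([bond, veth], pinned=bond))  # pinned: stays on bond MAC
```

The second call shows the float: a VM veth with a lower MAC than the bond takes over the bridge address as soon as it is enslaved.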

This breaks Layer 2 load balancers. MetalLB's L2 ARP responder caches the MAC address of its announce interface when the responder is created and keeps advertising that value after the bridge MAC changes. Once the veth that originally supplied the MAC is removed, that MAC exists on no interface and every affected LoadBalancer IP becomes unreachable. The result is an intermittent, hard-to-diagnose outage: the backing pods are healthy, the IPs are reported as assigned, and ARP still resolves, but to a MAC that black-holes traffic.
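One way to confirm this failure mode is to take the MAC a client has resolved for an affected LoadBalancer IP (for example from `ip neigh show <IP>` on the client) and check whether any interface on the announcing node still carries it. The Python sketch below does that by reading /sys/class/net/*/address. The script and its helper names are illustrative assumptions, not part of Harvester or MetalLB. Run it on the node MetalLB is announcing from, because a MAC missing from some other node proves nothing.

```python
"""Check whether a MAC seen in ARP replies still exists on this node (sketch)."""

import os
import sys
from typing import Dict, List

SYS_NET = "/sys/class/net"


def local_interface_macs(sys_net: str = SYS_NET) -> Dict[str, str]:
    """Map interface name -> MAC by reading <sys_net>/<iface>/address."""
    macs: Dict[str, str] = {}
    if not os.path.isdir(sys_net):
        return macs
    for iface in os.listdir(sys_net):
        try:
            with open(os.path.join(sys_net, iface, "address")) as f:
                macs[iface] = f.read().strip().lower()
        except OSError:
            continue  # interface vanished or has no address file
    return macs


def owners_of(mac: str, iface_macs: Dict[str, str]) -> List[str]:
    """Return the interfaces that currently carry `mac`; empty means stale."""
    wanted = mac.strip().lower()
    return sorted(name for name, addr in iface_macs.items() if addr == wanted)


def main(mac: str) -> int:
    owners = owners_of(mac, local_interface_macs())
    if owners:
        print(f"{mac} is present on: {', '.join(owners)}")
        return 0
    print(f"{mac} is on no local interface: the advertisement is stale")
    return 1


# Only act when invoked with exactly one argument: the MAC to look up.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```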

Change

Adds Known Issue #6 to the v1.6.x → v1.7.x upgrade page. It describes the failure mode and a remediation: a NetworkManager dispatcher script that re-pins mgmt-br to the bond MAC on every network event, persisted via an /oem config file so it survives reboots and upgrades. This restores under NetworkManager the behaviour that the old wicked setup_bond.sh hook provided.

Applied to docs/ and the version-v1.7 / version-v1.8 copies.
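The dispatcher-script remediation can be sketched as follows. This is not the script from the docs change. It is a Python illustration under stated assumptions: that a python3 interpreter is available on the host (a shell script with the same logic is the more conservative choice), and that the management bond is named mgmt-bo, which should be verified on the actual nodes. NetworkManager runs executables in /etc/NetworkManager/dispatcher.d/ with the interface name and the action as arguments.

```python
#!/usr/bin/env python3
"""Hypothetical NetworkManager dispatcher script: pin mgmt-br to the bond MAC."""

import subprocess
import sys
from typing import List, Optional

BRIDGE = "mgmt-br"  # management bridge named in the known issue
BOND = "mgmt-bo"    # ASSUMPTION: management bond name; verify on your nodes


def read_mac(iface: str) -> Optional[str]:
    """Read an interface MAC from sysfs; None if the interface does not exist."""
    try:
        with open(f"/sys/class/net/{iface}/address") as f:
            return f.read().strip().lower()
    except OSError:
        return None


def needs_repin(bridge_mac: Optional[str], bond_mac: Optional[str]) -> bool:
    """True only if both interfaces exist and the bridge MAC has drifted."""
    if bridge_mac is None or bond_mac is None:
        return False
    return bridge_mac.lower() != bond_mac.lower()


def repin_command(bridge: str, mac: str) -> List[str]:
    """iproute2 command that assigns the bridge MAC explicitly."""
    return ["ip", "link", "set", "dev", bridge, "address", mac]


def main(iface: str, action: str) -> int:
    # The arguments match the dispatcher calling convention but are not
    # filtered on: the check is cheap and idempotent, so it runs on every
    # event, as the remediation describes.
    bridge_mac, bond_mac = read_mac(BRIDGE), read_mac(BOND)
    if bond_mac is not None and needs_repin(bridge_mac, bond_mac):
        subprocess.run(repin_command(BRIDGE, bond_mac), check=True)
    return 0


# Only act when called the way NetworkManager calls dispatcher scripts.
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

If installed (for example under a hypothetical name such as /etc/NetworkManager/dispatcher.d/99-mgmt-br-mac), the file must be owned by root and executable; NetworkManager ignores dispatcher scripts that are writable by group or other. The /oem config file that persists it across reboots and upgrades is not reproduced here.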

Notes

  • Observed and remediation verified on Harvester v1.7.1.
  • This may also warrant a product-side fix: Harvester could ship a NetworkManager-era equivalent of setup_bond.sh so that no manual step is needed. Happy to file a separate harvester/harvester issue for that if maintainers prefer.

tillo and others added 2 commits May 22, 2026 11:04
…anager migration

The wicked-to-NetworkManager migration drops the bond post-up hook that
pinned the management bridge (mgmt-br) MAC address to the bond MAC. Without
it, mgmt-br behaves as an ordinary Linux bridge and floats its MAC to the
lowest-MAC enslaved port; bridge-mode VM veths can therefore change the
bridge MAC on VM lifecycle events. MetalLB's L2 ARP responder caches the
announce-interface MAC and then advertises a stale value, black-holing
LoadBalancer IPs.

Document the failure mode and a NetworkManager dispatcher-script remediation
persisted via an /oem config file, under Known Issues on the v1.6.x-to-v1.7.x
upgrade page (docs/ and the v1.7 and v1.8 versioned copies).

Signed-off-by: Martino Dell'Ambrogio <tillo@tillo.ch>
Part of the mdapi-wide leak-prevention sweep (2026-05-23):
- pre-commit hook (global) + CI `.pre` stage now run gitleaks on every push
- `.gitignore` baseline blocks .env/kubeconfig/SSH keys/PKCS12/.netrc

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
