feat: preserve MachineBootInterface information across machine/interface deletion #2448
chet wants to merge 1 commit into
ff2834d to c7c200c
@coderabbitai PTAL. Thank you!
🧩 Analysis chain

🏁 Script executed:
#!/bin/bash
# Fetch PR details and diff
gh pr view 2448 --repo NVIDIA/infra-controller --json title,body,files,additions,deletions,commits 2>/dev/null | jq '{title, additions, deletions, files: [.files[].path]}'

Length of output: 2703

🏁 Script executed:
#!/bin/bash
# Fetch the full diff
gh pr diff 2448 --repo NVIDIA/infra-controller 2>/dev/null | head -4000

Length of output: 105614
1. Stale
A machine's boot target is the (MAC, EthernetInterface.Id) pair on its primary interface row -- also known as a `MachineBootInterface` -- and the ID half can be impossible to re-derive from a MAC address later; after a DPU flips to NIC mode, the BMC can report the interface ID without its MAC. So anything that deletes the `machine_interfaces` row (e.g. `force-delete --delete-interfaces`) can leave the re-ingested machine with no boot target until it happens to PXE on a leftover UEFI boot entry (if one exists).

Soooo... this PR introduces a small `retained_boot_interfaces` table (keyed by MAC) to stash `MachineBootInterface` pairings in between ingestion(s). `machine_interface::delete` records every deleted pair there -- `force-delete`, `interface-delete`, any deletion path -- and the next `machine_interfaces` row for that MAC picks it back up at creation, however it's created: plain DHCP, a static `fixed_ip` preallocation, the proactive interface a DPU-mode ingest creates, or predicted-interface promotion (predicted rows gain a `boot_interface_id` column holding the live report's id -- refreshed every exploration -- for a host with unmanaged/no DPUs explored before its first DHCP). Recovery lives in `create_with_type`, the one place every new row passes through. And to be clear, this applies to ANY interface -- a DPU in DPU mode, a DPU in NIC mode, an integrated NIC, etc.

One more thing worth calling out (if it wasn't obvious): rows in `retained_boot_interfaces` are temporary. As soon as the `MachineBootInterface` is dropped into a `machine_interfaces` row, the record is removed (via `take_by_mac`). How long a record *stays* retained is configurable via a new top-level `retained_boot_interface_window` config; the default is forever -- if the machine eventually comes back, the pair will be waiting. The window is checked when the row is created (i.e. at DHCP time); retained IDs are deliberately never copied into predictions, so there's no copy sitting around that could dodge the window. Setting a window means a MAC reappearing on different hardware months later won't inherit an obsolete Redfish interface ID (e.g. the old `NIC.Slot.X` resource may not even exist there); a too-old record just gets swept on the spot, and the new row starts without a boot ID until exploration fills it in from a live report.

Tests added to check `force-delete` boot interface retention, predicted-interface hand-off at promotion (and record removal), a DHCP-derived row recovering a retained ID (and removing the record), a statically preallocated row doing the same, and a pending prediction picking up the ID once a later report resolves it. The new `retained_boot_interface_window` config is documented in `cfg/README.md`. Existing zero-DPU, backfill, and force-delete tests still pass.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
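The retain-on-delete, take-on-create round trip described in this commit message can be sketched with a small in-memory model. This is an illustration only, not the PR's code: a `HashMap` stands in for the `retained_boot_interfaces` table, and `RetainedBootInterfaces`, `retain`, and `create_row` are hypothetical names. Only `take_by_mac` and `MachineBootInterface` come from the text, and their real signatures and fields are assumed.

```rust
use std::collections::HashMap;

/// The (MAC, EthernetInterface.Id) pair described above.
/// Field names are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
struct MachineBootInterface {
    mac: String,
    interface_id: String,
}

/// Hypothetical in-memory stand-in for the `retained_boot_interfaces`
/// table, keyed by MAC.
#[derive(Default)]
struct RetainedBootInterfaces {
    by_mac: HashMap<String, String>,
}

impl RetainedBootInterfaces {
    /// Record a pair at the moment its interface row is deleted.
    fn retain(&mut self, pair: MachineBootInterface) {
        self.by_mac.insert(pair.mac, pair.interface_id);
    }

    /// Remove and return the retained pair for `mac`, if one exists.
    /// Removing on read is what makes the retained rows temporary.
    fn take_by_mac(&mut self, mac: &str) -> Option<MachineBootInterface> {
        self.by_mac.remove(mac).map(|interface_id| MachineBootInterface {
            mac: mac.to_string(),
            interface_id,
        })
    }
}

/// Hypothetical model of row creation: the new row for `mac` starts with
/// the retained boot interface ID if one was stashed, otherwise with none.
fn create_row(store: &mut RetainedBootInterfaces, mac: &str) -> Option<String> {
    store.take_by_mac(mac).map(|pair| pair.interface_id)
}
```

In this model, calling `retain` and then `create_row` for the same MAC yields the stashed ID once; a second `create_row` for that MAC finds nothing, because the record was consumed by the first.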
c7c200c to 83f9a79
@coderabbitai Thanks for the eyes. Updated -- (1) added
Implementation looks solid across all paths. Resolving comments now. 🐇 ✨

✅ Action performed

Comments resolved. Approval is disabled; enable
Description
A machine's boot target is the (MAC, EthernetInterface.Id) pair on its primary interface row -- also known as a `MachineBootInterface` -- and the ID half can be impossible to re-derive from a MAC address later; after a DPU flips to NIC mode, the BMC can report the interface ID without its MAC.

This means anything that deletes the `machine_interfaces` row (e.g. `force-delete --delete-interfaces`) can leave the re-ingested machine with no boot target until it happens to PXE on a leftover UEFI boot entry (if one exists).

Soooo... this PR introduces a new `retained_boot_interfaces` table as a place to stash `MachineBootInterface` pairings in between ingestion(s), adjusting things such that the `MachineBootInterface` doesn't get lost.

`machine_interface::delete` itself records each deleted `MachineBootInterface` into our small `retained_boot_interfaces` table (keyed by MAC), so every deletion path retains them (whether it's `force-delete`, `interface-delete`, etc).

And to connect even more, `predicted_machine_interfaces` gains a `boot_interface_id` column, so a host with unmanaged/no DPUs explored before its first DHCP also has a prepared `MachineBootInterface` which can be picked up the moment the `machine_interfaces` row exists -- again, whether created by predicted-interface promotion, or directly by DHCP.

To be clear, retention applies to ANY interface -- a DPU in DPU mode, a DPU in NIC mode, an integrated NIC, etc. Rows created any other way -- plain DHCP, or the proactive interface a DPU-mode ingest creates -- recover straight from `retained_boot_interfaces` at creation; predicted interfaces are just the way hosts with unmanaged/zero DPUs are managed.

One more thing worth calling out (if it wasn't obvious): rows in `retained_boot_interfaces` are temporary. As soon as the `MachineBootInterface` is dropped into a `machine_interfaces` row, the record is removed (via `take_by_mac`). How long a record stays retained is configurable via a new top-level `retained_boot_interface_window` config; the default is forever -- if the machine eventually comes back, the pair will be waiting. Setting a window means a MAC reappearing on different hardware months later won't inherit an obsolete Redfish interface ID (e.g. the old `NIC.Slot.X` resource may not even exist there); a too-old record just gets swept on the spot, and the new row starts without a boot id -- exactly as if retention never existed -- until exploration fills it in from a live report.

Tests added to check `force-delete` boot interface retention, that a predicted interface hands its ID to the real row at promotion (and is removed from the retained interfaces), and that a DHCP-derived row is populated with a retained id and removes the retained record. Existing zero-DPU, backfill, and force-delete tests still pass.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
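The retention window described above can be sketched as a pure function evaluated at row-creation time. This is a minimal sketch under stated assumptions, not the PR's implementation: `RetainedRecord`, its `retained_at` field, and `recover_within_window` are hypothetical names, and modelling `retained_boot_interface_window` as an `Option<Duration>` (with `None` meaning "forever") is an assumption about how the config might be represented.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical retained record: the interface ID plus the time it was
/// stashed. The real table's columns are not shown in the description.
struct RetainedRecord {
    interface_id: String,
    retained_at: SystemTime,
}

/// Decide, at row-creation time, whether a retained record is still usable.
/// The record is consumed either way: a fresh one hands over its ID, and a
/// too-old one is simply dropped so the new row starts without a boot ID.
/// `window == None` models the "forever" default.
fn recover_within_window(
    record: RetainedRecord,
    window: Option<Duration>,
    now: SystemTime,
) -> Option<String> {
    let fresh = match window {
        None => true,
        Some(limit) => match now.duration_since(record.retained_at) {
            Ok(age) => age <= limit,
            // `retained_at` lies in the future (clock skew). Treating that
            // as fresh is a choice made for this sketch only.
            Err(_) => true,
        },
    };
    if fresh {
        Some(record.interface_id)
    } else {
        None
    }
}
```

With a 30-day window, a record retained 10 days ago yields its ID, while one retained 90 days ago yields `None`; with no window configured, the record is returned regardless of age.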
Type of Change
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes