feat: Mirror Core expected inventory into Flow tables #2440
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-06-11 00:18:56 UTC | Commit: 89473f2 |
89473f2 to f6a5afb
Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Nullable + partial unique index so legacy ingestion rows coexist until the mirror adopts them.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Runs at the top of each inventory cycle. Adopts legacy rows by (manufacturer, serial), soft-deletes rows Core no longer reports, and resurrects soft-deleted rows when Core re-reports them.

Mirrors racks whose Core rack_id is unset (external_id stays NULL) and garbage-collects stale soft-deleted tombstones that occupy a name Core is reusing, so the INSERT/UPDATE doesn't collide on the full unique rack_name_idx.

RPC errors and empty Core responses skip the delete phase.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
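The rack phases described in this commit message can be sketched as a pure planning step. This is only an illustration under assumptions: the type and function names (coreRack, flowRack, rackPlan, planRackMirror) are hypothetical and not taken from the PR, and legacy-row adoption, racks with an unset rack_id, and the name-tombstone GC are left out.

```go
package main

// coreRack is a hypothetical view of one rack reported by Core.
type coreRack struct {
	ID   string // Core rack_id; empty means unset
	Name string
}

// flowRack is a hypothetical view of one row in Flow's rack table.
type flowRack struct {
	ExternalID string // mirrors Core rack_id; empty stands in for NULL
	Name       string
	Deleted    bool // soft-delete marker
}

// rackPlan lists the Core rack IDs affected by each phase.
type rackPlan struct {
	Insert     []string
	Resurrect  []string
	SoftDelete []string
}

// planRackMirror decides what one cycle would do. rpcFailed signals that
// the Core call errored; in that case, or when Core returned nothing,
// the delete phase is skipped so a transient outage cannot wipe the table.
func planRackMirror(core []coreRack, flow []flowRack, rpcFailed bool) rackPlan {
	var plan rackPlan

	existing := make(map[string]flowRack, len(flow))
	for _, f := range flow {
		if f.ExternalID != "" {
			existing[f.ExternalID] = f
		}
	}

	reported := make(map[string]bool, len(core))
	for _, c := range core {
		if c.ID == "" {
			continue // unset rack_id is out of scope for this sketch
		}
		reported[c.ID] = true
		row, ok := existing[c.ID]
		switch {
		case !ok:
			plan.Insert = append(plan.Insert, c.ID)
		case row.Deleted:
			// Reviving the old row keeps its UUID stable.
			plan.Resurrect = append(plan.Resurrect, c.ID)
		}
	}

	if rpcFailed || len(core) == 0 {
		return plan // skip the delete phase
	}
	for _, f := range flow {
		if f.ExternalID != "" && !f.Deleted && !reported[f.ExternalID] {
			plan.SoftDelete = append(plan.SoftDelete, f.ExternalID)
		}
	}
	return plan
}
```

Keeping the decision separate from the database writes makes the "empty response skips delete" rule easy to unit-test without a transaction.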
Match by (manufacturer, serial_number), the same unique key Flow's ingestion path already enforces, and reconcile the host BMC alongside. Non-host BMC rows (DPU, etc.) are left strictly alone: Core's ExpectedMachine doesn't describe them, so touching them from here would hard-delete the DPU BMC whenever the bun preload order put it at BMCs[0].

RackID is resolved per cycle from rack.external_id so a machine's RackID="a12" lands on the matching Flow rack UUID. Soft-delete + resurrect on transient absence, same shape as the rack mirror.

firmware_version and BMC credentials are not mirrored: the former is owned by the runtime sync's BMC inventory read, the latter live in Vault after site-explorer's password rotation.

Each insert / update (with field-by-field deltas) / delete is logged at INFO so the operator can audit what Core drove this cycle.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
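The per-cycle RackID resolution might look roughly like the following. It is a sketch, not the PR's code: rackRow, buildRackIndex and resolveRackUUID are hypothetical names, and returning nil for an unresolved ID is an assumption that matches the NULL rack_id behaviour described in a later commit of this PR.

```go
package main

// rackRow is a hypothetical projection of Flow's rack table.
type rackRow struct {
	UUID       string
	ExternalID string // Core rack_id; empty stands in for NULL
}

// buildRackIndex maps Core rack IDs to Flow rack UUIDs. It would be
// rebuilt at the start of every cycle so it reflects the rack mirror's
// latest writes.
func buildRackIndex(rows []rackRow) map[string]string {
	index := make(map[string]string, len(rows))
	for _, r := range rows {
		if r.ExternalID == "" {
			continue // rows without an external_id cannot be matched
		}
		index[r.ExternalID] = r.UUID
	}
	return index
}

// resolveRackUUID returns the Flow rack UUID for a Core RackID, or nil
// when the ID is empty or unknown. A nil result is meant to be stored
// as a NULL rack_id rather than treated as a reason to drop the component.
func resolveRackUUID(index map[string]string, coreRackID string) *string {
	if coreRackID == "" {
		return nil
	}
	uuid, ok := index[coreRackID]
	if !ok {
		return nil
	}
	return &uuid
}
```

With an index built from a row whose external_id is "a12", resolveRackUUID(index, "a12") yields that row's UUID, while an unknown ID yields nil.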
…ync_* files

Move the rack-specific code out of expected_mirror.go into expected_mirror_rack.go. Move each per-type actual sync (machine / switch / powershelf) and its tests out of inventory.go into actual_sync_*.go; cross-type helpers land in actual_sync.go. inventory.go shrinks to the runInventoryOne orchestrator.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
- Resolve-rack-id failure now mirrors the component with NULL rack_id instead of soft-deleting it. Core is still reporting the component; losing its UUID because the rack mirror dropped one cycle ahead is wrong.
- BMC insert paths first hard-delete any orphan owning the same MAC. The bmc table's PK is the MAC alone so a chassis-to-chassis BMC transplant without orphan cleanup would otherwise roll back the whole type's mirror tx.
- planBMCReconciliation now finds every type='Host' BMC and hard-deletes the extras. Core's authoritative view is exactly one host BMC per component; the mirror enforces it.
- parseLabelInt distinguishes "Core omitted the label" (0, ok=true) from "Core sent a non-integer" (0, ok=false). Malformed labels mark the field for preservation so UPDATE keeps Flow's existing value instead of clobbering with 0; INSERT still falls back to 0 with a warn since there's nothing to preserve.
- Duplicate (manufacturer, serial) specs from Core get a warn so the Cloud REST bug is visible; last write wins.
- Component and rack mirror reset success-side counters on tx rollback so the summary log reflects what actually landed.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
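The parseLabelInt contract from the list above could be written along these lines. The signature, the labels map, and the resolveForUpdate helper are assumptions made for illustration; only the (0, ok=true) versus (0, ok=false) distinction comes from the commit message. The sketch also assumes that "omitted" means the key is absent, so an empty string counts as malformed.

```go
package main

import "strconv"

// parseLabelInt (hypothetical signature) reads an integer label.
//   - key absent:        (0, true)  Core omitted the label, 0 is legitimate
//   - value not integer: (0, false) malformed, caller should preserve
//   - value integer:     (n, true)
func parseLabelInt(labels map[string]string, key string) (int, bool) {
	raw, present := labels[key]
	if !present {
		return 0, true
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, false
	}
	return n, true
}

// resolveForUpdate is a hypothetical helper showing the UPDATE rule:
// a malformed label keeps Flow's existing value instead of writing 0.
func resolveForUpdate(existing int, labels map[string]string, key string) int {
	if n, ok := parseLabelInt(labels, key); ok {
		return n
	}
	return existing
}
```

On the INSERT path there is no existing value to keep, so a caller would take the returned 0 and log a warning when ok is false.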
f6a5afb to f03a6e4
…TORY_SYNC_ENABLED

Default is off so deploying the binary doesn't auto-activate writes to rack / component from the new mirror path. Operators flip the env to a strconv.ParseBool-truthy value per cluster to opt in. The gate is read once at job construction; runInventoryOne skips syncExpectedFromCore entirely when off.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
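A default-off gate of this kind can be sketched as below. The function name expectedSyncEnabled and the injected getenv parameter are hypothetical; the variable name is the one given in the PR description, and strconv.ParseBool is the parser the commit message names. Treating an unset or unparsable value as "off" is an assumption consistent with the stated default.

```go
package main

import "strconv"

// Environment variable named in the PR description.
const expectedSyncEnv = "FLOW_EXPECTED_INVENTORY_SYNC_ENABLED"

// expectedSyncEnabled reports whether the mirror should run. getenv is
// injected (os.Getenv in production) so the gate can be tested without
// mutating the process environment. Unset or unparsable values are
// treated as off.
func expectedSyncEnabled(getenv func(string) string) bool {
	enabled, err := strconv.ParseBool(getenv(expectedSyncEnv))
	return err == nil && enabled
}
```

A job constructor would call expectedSyncEnabled(os.Getenv) once and store the result. Note that strconv.ParseBool accepts only 1, t, T, TRUE, true, True and their false counterparts, so a value such as "yes" leaves the mirror off.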
f03a6e4 to 9a79c7f
Description
- … rack, component and bmc tables on every inventory cycle, before drift detection runs. Gated by FLOW_EXPECTED_INVENTORY_SYNC_ENABLED, default off: deploying this PR is a no-op until the env is flipped per-cluster. The inventorysync package is split into expected_mirror_* + actual_sync_* files (separate refactor commit) so the new code is easy to navigate.
- … rack_id (new rack.external_id column + migration); component by (manufacturer, serial_number). Soft-deleted rows resurrect on Core re-report so UUIDs stay stable; rack-name tombstones are GC'd in-tx for index-collision safety.
- … type='Host' rows; DPU BMCs are left to the ingestion path. Extra host BMCs are hard-deleted to enforce Core's "exactly one host" view; MAC-PK conflicts evict the orphan in-tx so the cycle doesn't abort.

Type of Change
Breaking Changes
Testing