
feat: Mirror Core expected inventory into Flow tables #2440

Open
kunzhao-nv wants to merge 7 commits into NVIDIA:main from kunzhao-nv:feat/flow-expected-inventory-sync



@kunzhao-nv kunzhao-nv commented Jun 11, 2026

Contributor

Description

  • Mirrors Core's expected racks, machines, switches and power shelves into Flow's rack, component and bmc tables on every inventory cycle, before drift detection runs. Gated by FLOW_EXPECTED_INVENTORY_SYNC_ENABLED (default off): deploying this PR is a no-op until the env var is set per cluster. The inventorysync package is split into expected_mirror_* and actual_sync_* files (in a separate refactor commit) so the new code is easy to navigate.
  • Matching: racks by Core's rack_id (new rack.external_id column plus migration); components by (manufacturer, serial_number). Soft-deleted rows are resurrected when Core re-reports them, so UUIDs stay stable. Soft-deleted rack tombstones that hold a name Core is reusing are garbage-collected in the same transaction so they don't collide on the unique rack-name index.
  • BMC reconciliation touches only type='Host' rows; DPU BMCs are left to the ingestion path. Extra host BMCs are hard-deleted to enforce Core's "exactly one host BMC" view. Because the bmc table's primary key is the MAC alone, an orphan row that already owns an incoming MAC is hard-deleted in the same transaction, so a BMC moved between chassis doesn't roll back that type's mirror transaction.
  • Per-type independent and fault-tolerant: an RPC error or empty Core response skips that type's delete phase so transient blips can't wipe Flow. Every insert / update (with field-by-field deltas) / soft-delete / eviction is logged at INFO for operator audit.

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

@kunzhao-nv kunzhao-nv requested a review from a team as a code owner June 11, 2026 00:16

coderabbitai Bot commented Jun 11, 2026

Contributor

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e0b99482-5845-4dee-a942-5a21ec06dc78


@github-actions


🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉


🕐 Last updated: 2026-06-11 00:18:56 UTC | Commit: 89473f2

@kunzhao-nv kunzhao-nv force-pushed the feat/flow-expected-inventory-sync branch from 89473f2 to f6a5afb Compare June 11, 2026 00:19
Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Nullable + partial unique index so legacy ingestion rows coexist until
the mirror adopts them.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
@kunzhao-nv kunzhao-nv changed the title feat: Mirror Core expected inventory into the Flow tables feat: Mirror Core expected inventory into Flow tables Jun 11, 2026
@github-actions

github-actions Bot commented Jun 11, 2026


🔍 Container Scan Summary

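| Service | Total | Critical | High | Medium | Low | Other |
|---|---|---|---|---|---|---|
| nico-flow | 126 | 14 | 54 | 44 | 5 | 9 |
| nico-nsm | 133 | 11 | 45 | 66 | 11 | 0 |
| nico-psm | 128 | 14 | 56 | 44 | 5 | 9 |
| nico-rest-api | 192 | 17 | 88 | 70 | 8 | 9 |
| nico-rest-cert-manager | 105 | 6 | 51 | 35 | 4 | 9 |
| nico-rest-db | 126 | 14 | 54 | 44 | 5 | 9 |
| nico-rest-site-agent | 125 | 14 | 54 | 44 | 4 | 9 |
| nico-rest-site-manager | 112 | 7 | 52 | 40 | 4 | 9 |
| nico-rest-workflow | 128 | 14 | 56 | 44 | 5 | 9 |
| TOTAL | 1175 | 111 | 510 | 431 | 51 | 72 |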

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

Runs at the top of each inventory cycle. Adopts legacy rows by
(manufacturer, serial), soft-deletes ones Core no longer reports,
resurrects soft-deleted rows when Core re-reports them. Mirrors racks
whose Core rack_id is unset (external_id stays NULL) and GCs stale
soft-deleted tombstones occupying a name that Core is reusing so the
INSERT/UPDATE doesn't collide on the full unique rack_name_idx. RPC
errors and empty Core responses skip the delete phase.
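A hedged, in-memory sketch of the rack plan this commit message describes. Assumptions: `flowRack`, `coreRack`, `rackPlan` and `planRackMirror` are hypothetical; matching is by Core rack_id against `external_id` only; legacy rows with a NULL `external_id` are never soft-deleted here; adoption of legacy rows and Core racks without a rack_id are left out. The real mirror applies this against the database inside one transaction.

```go
package main

// flowRack is a hypothetical in-memory view of one Flow rack row.
type flowRack struct {
	ID         string  // stable Flow UUID
	Name       string  // covered by the unique rack-name index
	ExternalID *string // Core rack_id; nil for legacy ingestion rows
	Deleted    bool    // soft-deleted (a tombstone)
}

// coreRack is a hypothetical view of one expected rack from Core.
type coreRack struct {
	RackID string
	Name   string
}

// rackPlan lists what one cycle would write.
type rackPlan struct {
	Insert       []string // Core rack_ids with no Flow row yet
	Resurrect    []string // Flow UUIDs of tombstones Core re-reports
	SoftDelete   []string // Flow UUIDs Core no longer reports
	GCTombstones []string // Flow UUIDs of tombstones holding a reused name
}

// planRackMirror diffs Core's racks against Flow's. deleteAllowed is false
// when the Core call failed or came back empty.
func planRackMirror(core []coreRack, flow []flowRack, deleteAllowed bool) rackPlan {
	var p rackPlan
	byExt := make(map[string]flowRack)
	for _, r := range flow {
		if r.ExternalID != nil {
			byExt[*r.ExternalID] = r
		}
	}
	reported := make(map[string]bool)  // Core rack_ids seen this cycle
	usedNames := make(map[string]bool) // names Core uses this cycle
	for _, c := range core {
		reported[c.RackID] = true
		usedNames[c.Name] = true
		r, ok := byExt[c.RackID]
		switch {
		case !ok:
			p.Insert = append(p.Insert, c.RackID)
		case r.Deleted:
			// Reuse the old row so the rack keeps its UUID.
			p.Resurrect = append(p.Resurrect, r.ID)
		}
	}
	for _, r := range flow {
		tracked := r.ExternalID != nil && reported[*r.ExternalID]
		if r.Deleted {
			// A tombstone for a rack Core no longer reports, sitting on a
			// name Core now uses, would collide on the unique name index.
			if !tracked && usedNames[r.Name] {
				p.GCTombstones = append(p.GCTombstones, r.ID)
			}
			continue
		}
		if deleteAllowed && r.ExternalID != nil && !tracked {
			p.SoftDelete = append(p.SoftDelete, r.ID)
		}
	}
	return p
}
```

Soft-deleting instead of hard-deleting is what makes resurrection possible; only tombstones that would block a write are hard-removed.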

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Match by (manufacturer, serial_number) — same unique key Flow's
ingestion path already enforces — and reconcile the host BMC alongside.
Non-host BMC rows (DPU, etc.) are left strictly alone: Core's
ExpectedMachine doesn't describe them, so touching them from here would
hard-delete the DPU BMC whenever the bun preload order put it at
BMCs[0]. RackID is resolved per cycle from rack.external_id so a
machine's RackID="a12" lands on the matching Flow rack UUID.
Soft-delete + resurrect on transient absence, same shape as the rack
mirror. firmware_version and BMC credentials are not mirrored: the
former is owned by the runtime sync's BMC inventory read, the latter
live in Vault after site-explorer's password rotation. Each insert /
update (with field-by-field deltas) / delete is logged at INFO so the
operator can audit what Core drove this cycle.
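A sketch of the component match key and the per-cycle rack resolution, assuming Core specs arrive as a slice and the rack lookup is a map from `rack.external_id` to Flow rack UUID. All names here are hypothetical. It also folds in two later fixes from this PR: an unresolvable rack_id yields a NULL rack instead of a delete, and duplicate (manufacturer, serial) specs are logged with last write winning.

```go
package main

import "log"

// componentKey is the (manufacturer, serial_number) match key.
type componentKey struct {
	Manufacturer string
	Serial       string
}

// expectedComponent is a hypothetical, trimmed view of one Core spec.
type expectedComponent struct {
	Manufacturer string
	Serial       string
	RackID       string // Core rack_id, e.g. "a12"; may be empty
}

// resolvedComponent is what the mirror would write for one spec.
type resolvedComponent struct {
	Key      componentKey
	RackUUID *string // nil means rack_id is written as NULL
}

// resolveComponents indexes Core specs by key and maps each Core rack_id to
// a Flow rack UUID. An unresolvable rack_id still mirrors the component,
// with a NULL rack, so its UUID survives a cycle where the rack mirror lags.
func resolveComponents(specs []expectedComponent, rackByExternalID map[string]string) map[componentKey]resolvedComponent {
	out := make(map[componentKey]resolvedComponent, len(specs))
	for _, s := range specs {
		k := componentKey{Manufacturer: s.Manufacturer, Serial: s.Serial}
		if _, dup := out[k]; dup {
			log.Printf("duplicate expected spec %s/%s from Core; last write wins", k.Manufacturer, k.Serial)
		}
		rc := resolvedComponent{Key: k}
		if s.RackID != "" {
			if rackUUID, ok := rackByExternalID[s.RackID]; ok {
				rc.RackUUID = &rackUUID
			} else {
				log.Printf("rack_id %q has no Flow rack yet; mirroring %s/%s with NULL rack_id", s.RackID, k.Manufacturer, k.Serial)
			}
		}
		out[k] = rc
	}
	return out
}
```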

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
…ync_* files

Move the rack-specific code out of expected_mirror.go into
expected_mirror_rack.go. Move each per-type actual sync (machine / switch /
powershelf) and its tests out of inventory.go into actual_sync_*.go;
cross-type helpers land in actual_sync.go. inventory.go shrinks to the
runInventoryOne orchestrator.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
- Resolve-rack-id failure now mirrors the component with NULL rack_id
  instead of soft-deleting it. Core is still reporting the component;
  losing its UUID because the rack mirror dropped one cycle ahead is
  wrong.
- BMC insert paths first hard-delete any orphan owning the same MAC.
  The bmc table's PK is the MAC alone so a chassis-to-chassis BMC
  transplant without orphan cleanup would otherwise roll back the
  whole type's mirror tx.
- planBMCReconciliation now finds every type='Host' BMC and hard-deletes
  the extras. Core's authoritative view is exactly one host BMC per
  component; the mirror enforces it.
- parseLabelInt distinguishes "Core omitted the label" (0, ok=true)
  from "Core sent a non-integer" (0, ok=false). Malformed labels mark
  the field for preservation so UPDATE keeps Flow's existing value
  instead of clobbering with 0; INSERT still falls back to 0 with a
  warn since there's nothing to preserve.
- Duplicate (manufacturer, serial) specs from Core get a warn so the
  Cloud REST bug is visible; last write wins.
- Component and rack mirror reset success-side counters on tx
  rollback so the summary log reflects what actually landed.
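Two of the fixes above, sketched in Go under stated assumptions. `parseLabelInt` assumes labels arrive as a `map[string]string`; the PR gives the function's (0, ok) contract but not its signature. `planBMCReconciliation` assumes a host row is kept only when its MAC equals Core's expected host BMC MAC; otherwise Core's BMC is inserted and every existing host row is hard-deleted. Which row the PR actually keeps is not stated, so treat that rule as hypothetical. Non-host rows (DPU, etc.) are never returned for deletion.

```go
package main

import "strconv"

// parseLabelInt reads an integer label. A missing label is (0, true); a
// present but non-integer label is (0, false), so an UPDATE can keep Flow's
// existing value instead of writing 0.
func parseLabelInt(labels map[string]string, key string) (int, bool) {
	raw, present := labels[key]
	if !present {
		return 0, true // Core omitted the label
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, false // malformed: caller should preserve Flow's value
	}
	return n, true
}

// bmcRow is a hypothetical view of one bmc table row (MAC is the PK).
type bmcRow struct {
	MAC  string
	Type string // "Host", "DPU", ...
}

// bmcPlan is the hypothetical outcome for one component.
type bmcPlan struct {
	Keep       *bmcRow  // host BMC row to update in place, if any
	Insert     bool     // insert Core's host BMC as a new row
	HardDelete []string // MACs of host BMC rows to remove
}

// planBMCReconciliation enforces exactly one host BMC per component.
func planBMCReconciliation(rows []bmcRow, coreMAC string) bmcPlan {
	var hosts []bmcRow
	for _, r := range rows {
		if r.Type == "Host" { // DPU and other BMCs belong to ingestion
			hosts = append(hosts, r)
		}
	}
	keep := -1
	for i, h := range hosts {
		if h.MAC == coreMAC {
			keep = i
			break
		}
	}
	var p bmcPlan
	if keep >= 0 {
		p.Keep = &hosts[keep]
	} else {
		p.Insert = true
	}
	for i, h := range hosts {
		if i != keep {
			p.HardDelete = append(p.HardDelete, h.MAC)
		}
	}
	return p
}
```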

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
@kunzhao-nv kunzhao-nv force-pushed the feat/flow-expected-inventory-sync branch from f6a5afb to f03a6e4 Compare June 11, 2026 05:11
@kunzhao-nv kunzhao-nv requested a review from jw-nvidia June 11, 2026 05:13
…TORY_SYNC_ENABLED

Default is off so deploying the binary doesn't auto-activate writes to
rack / component from the new mirror path. Operators flip the env to a
strconv.ParseBool-truthy value per cluster to opt in. The gate is read
once at job construction; runInventoryOne skips syncExpectedFromCore
entirely when off.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
@kunzhao-nv kunzhao-nv force-pushed the feat/flow-expected-inventory-sync branch from f03a6e4 to 9a79c7f Compare June 11, 2026 05:23
@kunzhao-nv kunzhao-nv requested a review from chet June 11, 2026 05:53

Labels

None yet

Projects

None yet


1 participant