RFC: collapse NetBox-inspired kinds into a single property-graph model (typed nodes + edges with attributes) #43

@ecv

Summary

Collapse the inventory's NetBox-inspired kind hierarchy into a single property-graph model: one node resource carrying a type attribute, plus a separate edge resource for relationships. Both nodes and edges hold arbitrary key/value attributes. The current concrete kinds (Region, Site, Cluster, NetworkDevice, Rack, Provider, Circuit, Cable, Port, VirtualMachine, the compute Node, Fleet) become values of the node type attribute rather than distinct CRDs.

Motivation

The model to date follows NetBox (NetBox Labs). NetBox predates the cloud-native graph era and bakes a fixed containment hierarchy (Region → Site → Rack → Device → …) into its schema. That rigidity shows up here as:

  • A new asset class = a new CRD + the full 10-step checklist (types, codegen, controller, webhook, indexers, main + suite wiring, CRD/webhook kustomize, samples, IAM ProtectedResource + Roles — see CLAUDE.md "Adding a new kind"). High friction for what is fundamentally "another box that relates to other boxes".
  • Hard-coded relationship topology. Parent/child is expressed as typed refs (SiteRef, RegionRef, ClusterRef, AssetReference, PortDeviceReference) with CEL Enum-restricted kinds. Every new relationship shape needs schema + webhook + indexer changes. Link and Cable and Circuit are already three different bespoke edge-like kinds.
  • Awkward fits. Link/Cable/Circuit are edges modeled as nodes; Port is a sub-part of a device; Fleet/Cluster are groupings. A graph expresses all of these uniformly.

A property graph (typed nodes + typed edges, both with attribute bags) is the natural data model for an asset registry whose whole job is "record what exists and how it relates".

Proposed model

Two cluster-scoped kinds replace the ~13 current ones:

  1. Graph node — the vertex. Carries:
    • type (string, required) — the asset class. Allowed values are the current NetBox-inspired kinds: Region, Site, Cluster, NetworkDevice, Rack, Provider, Circuit, Cable, Port, VirtualMachine, Host (the compute node — see "Naming collision" below), Fleet, … and trivially extensible.
    • attributes — key/value pairs (the former per-kind spec fields: displayName, cpuCores, siteType, coordinates, etc.).
  2. Graph edge — the relationship. Carries:
    • type (string, required) — e.g. located-in, member-of, mounted-in, connects, realized-by, provided-by.
    • endpoints (two graph-node refs; or from/to).
    • attributes — key/value pairs (the former Link capacity/latency, Placement start-unit/face, etc.).
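To make the two kinds concrete, here is a minimal sketch in plain Go. It leaves out all Kubernetes machinery (TypeMeta, ObjectMeta, kubebuilder markers). The names `NodeSpec`, `EdgeSpec`, `From`, `To`, and the choice of string-valued attributes are assumptions for illustration only, not decided API.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeSpec is a hypothetical spec for the graph vertex kind.
type NodeSpec struct {
	Type       string            // asset class, e.g. "Site", "Rack", "Host"
	Attributes map[string]string // the former per-kind spec fields
}

// EdgeSpec is a hypothetical spec for the relationship kind.
type EdgeSpec struct {
	Type       string            // e.g. "located-in", "mounted-in"
	From       string            // name of the source node (assumed from/to form)
	To         string            // name of the target node
	Attributes map[string]string // e.g. startUnit, face, capacity
}

// Validate checks only what the proposal marks as required:
// a type and two endpoints. Per-type rules are out of scope here.
func (e EdgeSpec) Validate() error {
	if e.Type == "" {
		return errors.New("edge type is required")
	}
	if e.From == "" || e.To == "" {
		return errors.New("edge needs both endpoints")
	}
	return nil
}

func main() {
	rack := NodeSpec{Type: "Rack", Attributes: map[string]string{"displayName": "Rack A"}}
	device := NodeSpec{Type: "NetworkDevice", Attributes: map[string]string{"displayName": "Switch 01"}}
	mounted := EdgeSpec{
		Type: "mounted-in", From: "sw-01", To: "rack-a",
		Attributes: map[string]string{"startUnit": "10", "unitHeight": "1", "face": "front"},
	}
	if err := mounted.Validate(); err != nil {
		fmt.Println("invalid edge:", err)
		return
	}
	fmt.Printf("(%s)-[%s]->(%s)\n", device.Type, mounted.Type, rack.Type)
}
```

String-valued attributes keep the sketch small; a real CRD might prefer structured values, which is part of the validation question raised under the open questions.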

Existing typed refs become edges: Site.regionRef → (Site)-[located-in]->(Region); Host.assignment.clusterRef → (Host)-[member-of]->(Cluster); Placement → (Device)-[mounted-in {startUnit,unitHeight,face}]->(Rack); Link/Cable/Circuit → edges directly.
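The ref-to-edge mapping could be sketched as a pair of small conversion functions. The `legacySite` and `legacyPlacement` structs below are simplified stand-ins, not the real types in this repository, and their field names are assumptions based on the names used in this issue.

```go
package main

import (
	"fmt"
	"strconv"
)

// Edge is a hypothetical in-memory form of the proposed edge kind.
type Edge struct {
	Type, From, To string
	Attributes     map[string]string
}

// legacySite mirrors only the part of the current Site kind needed here.
type legacySite struct {
	Name      string
	RegionRef string // assumed to hold the referenced Region's name
}

// legacyPlacement is a simplified stand-in for the current Placement data.
type legacyPlacement struct {
	Device, Rack          string
	StartUnit, UnitHeight int
	Face                  string
}

// siteEdges turns Site.regionRef into a located-in edge, if the ref is set.
func siteEdges(s legacySite) []Edge {
	if s.RegionRef == "" {
		return nil
	}
	return []Edge{{Type: "located-in", From: s.Name, To: s.RegionRef}}
}

// placementEdge turns a Placement into a mounted-in edge whose
// attributes carry the former placement fields.
func placementEdge(p legacyPlacement) Edge {
	return Edge{
		Type: "mounted-in", From: p.Device, To: p.Rack,
		Attributes: map[string]string{
			"startUnit":  strconv.Itoa(p.StartUnit),
			"unitHeight": strconv.Itoa(p.UnitHeight),
			"face":       p.Face,
		},
	}
}

func main() {
	edges := siteEdges(legacySite{Name: "ams1", RegionRef: "eu-west"})
	edges = append(edges, placementEdge(legacyPlacement{
		Device: "sw-01", Rack: "rack-a", StartUnit: 10, UnitHeight: 1, Face: "front",
	}))
	for _, e := range edges {
		fmt.Printf("(%s)-[%s %v]->(%s)\n", e.From, e.Type, e.Attributes, e.To)
	}
}
```

A one-shot migration tool would presumably be a loop over functions of this shape, one per current kind.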

Naming collision — resolved

There is already a Node kind: the compute node (physical/virtual machine). The graph vertex and the compute node cannot both be called Node in the API. Resolution: the graph-vertex kind takes the name Node; the compute node becomes the type value Host. The old compute Node spec (hardware, addresses, cluster assignment, placement) carries forward as the attribute bag of a node with type: Host. All references to the compute node throughout this issue use Host accordingly.

Open questions / tradeoffs (the hard part)

Moving from many typed CRDs to two generic ones trades schema rigor for flexibility. The proposal must decide how to recover what we lose:

  • Per-type schema validation. Today the CRD OpenAPI schema + CEL enforce shape (required fields, enums, lat/long ranges, immutability, distinct link endpoints). With a generic attributes bag, that moves into an admission webhook driven by a per-type schema registry (or a NodeType/EdgeType CRD describing allowed/required attribute keys + types). What's the registry's source of truth?
  • Relationship integrity. Delete guards (Region/Site/Cluster reject DELETE while referenced) and create-time ref checks (NetworkDevice, Link) currently use per-kind field indexers. In a graph these become generic edge-existence queries — one indexer over edge endpoints instead of N per-kind indexers. Simpler, but the webhook logic must be type-aware.
  • IAM granularity. Milo IAM grants inventory.miloapis.com/<plural>.<verb> per kind (see #39, "Inventory unwritable: no IAM (ProtectedResource/Role/PolicyBinding) for inventory.miloapis.com"). Two kinds means two ProtectedResources, so IAM can no longer distinguish "edit Sites" from "edit Racks". Do we need attribute/type-scoped authorization? If so, how (label selectors, type in the resource name, OPA-style policy)?
  • Discoverability / UX. Print columns, kubectl get site, and the CRD API reference (crdoc; see #25, "Generate API reference from CRDs (crdoc) + fix scope") are per-kind today. A generic kubectl get on the single vertex kind loses the typed columns. Type-aware additionalPrinterColumns are not possible on one CRD, so mitigation may need a plugin (datumctl-inventory; see #42, "Release wiring for datumctl-inventory plugin (goreleaser)") or per-type CRD views.
  • Topology labels. topology.inventory.miloapis.com/{region,site,cluster,rack} propagation (common_types.go) is computed by walking typed refs. In a graph this becomes transitive-closure over located-in/member-of edges — needs a generic propagation reconciler.
  • Conditions / status. Ready + per-kind reasons stay, but reasons become generic (ReferentNotFound) or type-parameterized.
  • Migration. Existing CRs (samples, anything deployed on Milo) must convert. One-shot migration tool, or run both models during a deprecation window? CRDs live on Milo via a separate Flux Kustomization — sequencing matters.
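For the per-type schema validation question, one possible shape of a registry-driven check is sketched below. Everything here is an assumption: the `Registry`, `TypeSchema`, and `AttrRule` names, the three primitive kinds, and the choice to reject unknown attribute keys. It says nothing about where the registry's source of truth lives.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// AttrKind is the primitive type an attribute value must parse as.
type AttrKind int

const (
	String AttrKind = iota
	Int
	Float
)

// AttrRule describes one allowed attribute key (hypothetical).
type AttrRule struct {
	Kind     AttrKind
	Required bool
}

// TypeSchema maps attribute key to rule; Registry maps node type to schema.
type TypeSchema map[string]AttrRule
type Registry map[string]TypeSchema

// sortedKeys returns map keys in order so messages are deterministic.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// parses reports whether value is a valid literal of the given kind.
func parses(kind AttrKind, value string) bool {
	switch kind {
	case Int:
		_, err := strconv.Atoi(value)
		return err == nil
	case Float:
		_, err := strconv.ParseFloat(value, 64)
		return err == nil
	default:
		return true
	}
}

// Validate returns one message per problem; an empty result means valid.
func (r Registry) Validate(nodeType string, attrs map[string]string) []string {
	schema, ok := r[nodeType]
	if !ok {
		return []string{fmt.Sprintf("unknown type %q", nodeType)}
	}
	var problems []string
	for _, key := range sortedKeys(schema) {
		value, present := attrs[key]
		if !present {
			if schema[key].Required {
				problems = append(problems, fmt.Sprintf("missing required attribute %q", key))
			}
			continue
		}
		if !parses(schema[key].Kind, value) {
			problems = append(problems, fmt.Sprintf("attribute %q has the wrong type", key))
		}
	}
	for _, key := range sortedKeys(attrs) {
		if _, allowed := schema[key]; !allowed {
			problems = append(problems, fmt.Sprintf("attribute %q is not allowed on %s", key, nodeType))
		}
	}
	return problems
}

// exampleRegistry holds illustrative rules only.
func exampleRegistry() Registry {
	return Registry{
		"Host": {"displayName": {Kind: String, Required: true}, "cpuCores": {Kind: Int}},
		"Site": {"displayName": {Kind: String, Required: true}, "siteType": {Kind: String}},
	}
}

func main() {
	reg := exampleRegistry()
	fmt.Println(reg.Validate("Host", map[string]string{"displayName": "h1", "cpuCores": "64"}))
	fmt.Println(reg.Validate("Host", map[string]string{"cpuCores": "many"}))
}
```

An admission webhook could call something like `Validate` on create and update; whether the rules come from a compiled-in table or from NodeType/EdgeType objects is exactly the open question above.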
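The relationship-integrity and topology-label points both reduce to queries over one edge index. The sketch below uses an in-memory index rather than a controller-runtime field indexer, and the `EdgeIndex` type, the `containment` edge set, and the type-to-label mapping are all assumptions. It treats any incoming edge as a reference that blocks deletion, which a type-aware webhook would need to refine.

```go
package main

import "fmt"

// Edge is a hypothetical in-memory form of the proposed edge kind.
type Edge struct{ Type, From, To string }

// EdgeIndex indexes edges by both endpoints.
type EdgeIndex struct {
	out map[string][]Edge // keyed by From
	in  map[string][]Edge // keyed by To
}

func NewEdgeIndex(edges []Edge) *EdgeIndex {
	ix := &EdgeIndex{out: map[string][]Edge{}, in: map[string][]Edge{}}
	for _, e := range edges {
		ix.out[e.From] = append(ix.out[e.From], e)
		ix.in[e.To] = append(ix.in[e.To], e)
	}
	return ix
}

// DeleteBlockedBy returns the edges that still point at node.
// A delete guard would reject the DELETE while this is non-empty.
func (ix *EdgeIndex) DeleteBlockedBy(node string) []Edge {
	return ix.in[node]
}

const labelPrefix = "topology.inventory.miloapis.com/"

// Assumed mapping from node type to topology label suffix.
var labelled = map[string]string{
	"Region": "region", "Site": "site", "Cluster": "cluster", "Rack": "rack",
}

// Assumed set of edge types that count as containment.
var containment = map[string]bool{"located-in": true, "member-of": true}

// TopologyLabels walks containment edges outward from start (breadth
// first) and records the nearest ancestor of each labelled type.
func (ix *EdgeIndex) TopologyLabels(start string, typeOf map[string]string) map[string]string {
	labels := map[string]string{}
	visited := map[string]bool{start: true} // guards against cycles
	queue := []string{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, e := range ix.out[cur] {
			if !containment[e.Type] || visited[e.To] {
				continue
			}
			visited[e.To] = true
			if suffix, ok := labelled[typeOf[e.To]]; ok {
				if _, set := labels[labelPrefix+suffix]; !set {
					labels[labelPrefix+suffix] = e.To
				}
			}
			queue = append(queue, e.To)
		}
	}
	return labels
}

func main() {
	typeOf := map[string]string{"h1": "Host", "c1": "Cluster", "ams1": "Site", "eu": "Region"}
	ix := NewEdgeIndex([]Edge{
		{Type: "member-of", From: "h1", To: "c1"},
		{Type: "located-in", From: "c1", To: "ams1"},
		{Type: "located-in", From: "ams1", To: "eu"},
	})
	fmt.Println(ix.TopologyLabels("h1", typeOf))
	fmt.Println("eu referenced by:", ix.DeleteBlockedBy("eu"))
}
```

A propagation reconciler would recompute these labels whenever a containment edge changes, which is the generic replacement for walking typed refs.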

Scope of this issue

This is a design/RFC issue, not an implementation task. Deliverable: a written proposal (under docs/ or the enhancements repo, cf. #15) that:

  1. Confirms kind names (vertex = Node, compute → type: Host; pick the edge kind name, e.g. Edge/Relationship/Link).
  2. Defines the attribute-schema/validation mechanism.
  3. Maps every current kind + ref to its graph-node type / edge type.
  4. Specifies how delete-guards, IAM, topology labels, and print/UX are recovered.
  5. Lays out a migration path for already-deployed CRs.

Implementation, if accepted, follows as milestone(s).
