Implementation plan: Physical inventory for Milo (enhancements#713) #15

Description

@ecv

Implementation plan for datum-cloud/enhancements#713 — Physical inventory for Milo, built atop the existing inventory.miloapis.com service in this repo.

This is a sketch to anchor discussion and to spawn the per-milestone implementation issues hung off the enhancement. It is intentionally incremental: each milestone is independently shippable and keeps the prototype running.

Where we are today

The service already ships these cluster-scoped CRDs in inventory.miloapis.com/v1alpha1, each with a controller, a validating webhook, generated CRDs/RBAC under config/, and SetupIndexers wiring:

Kind          | Models                                            | Key refs
--------------|---------------------------------------------------|------------------------------
Region        | geographic grouping                               | -
Site          | facility / AZ / edge / virtual location           | regionRef
Cluster       | a Kubernetes cluster footprint                    | controlPlaneSiteRef
Node          | physical/virtual host, hardware, addresses        | siteRef, assignment→cluster
NetworkDevice | router/switch/firewall                            | clusterRef, siteRef
Link          | logical/physical/internet link between two assets | endpoints[2] (AssetReference)

Shared building blocks already exist and should be reused, not reinvented:

  • LocalObjectReference{Name} and AssetReference{Kind,Name} for references.
  • Coordinates, and the topology.inventory.miloapis.com/* label convention for query/rollup.
  • Per-kind validating webhooks enforcing immutability and cross-ref existence (e.g. NetworkDevice.siteRef must match its Cluster.siteRef).

Gap analysis vs. enhancement goals

Enhancement goal | Covered by | Gap to build
-----------------|------------|-------------
Provider relationships + contract metadata + service IDs | - | Provider
Physical hierarchy: facility→cage→row→rack→unit; device→slot | Site (facility) | Rack + device Placement
Device cabling: power/serial/network/patch-panel ports, near/far-end | Link (partial) | Port + Cable
Network circuits: cross-connects, provider circuits, link to logical service/port | - | Circuit + cross-resource ref
VM inventory: host assignment, allocation, provider/project | Node (partial) | VirtualMachine
Queryable + linkable to platform resources | labels, indexers | typed cross-group ObjectReference, printer columns
Audit history (who/what/when) | apiserver audit (implicit) | Activity capability integration

Proposed new kinds (all cluster-scoped, group inventory.miloapis.com/v1alpha1)

Provider

ProviderSpec:
  displayName        string (required)
  type               enum: Hosting|Colocation|Transit|InternetExchange|DarkFiber|Cloud
  contract           optional: { contractID, accountID, portalURL, notes }
  serviceIdentifiers []{ name, identifier }   # e.g. "ASN":"64512", "LOA-CFA":"..."

Referenced by Site (who runs the facility), Circuit, and VirtualMachine.
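
As a rough illustration, the ProviderSpec outline above could translate into Go types along the following lines. This is a sketch under assumptions, not the repo's actual code: the constant names, JSON tags, and the ValidateProviderSpec helper are hypothetical, and the real kind would carry kubebuilder markers and live in api/v1alpha1.

```go
package main

import (
	"errors"
	"fmt"
)

// ProviderType mirrors the enum listed in the ProviderSpec outline.
type ProviderType string

const (
	ProviderHosting          ProviderType = "Hosting"
	ProviderColocation       ProviderType = "Colocation"
	ProviderTransit          ProviderType = "Transit"
	ProviderInternetExchange ProviderType = "InternetExchange"
	ProviderDarkFiber        ProviderType = "DarkFiber"
	ProviderCloud            ProviderType = "Cloud"
)

// ProviderContract holds the optional contract metadata.
type ProviderContract struct {
	ContractID string `json:"contractID,omitempty"`
	AccountID  string `json:"accountID,omitempty"`
	PortalURL  string `json:"portalURL,omitempty"`
	Notes      string `json:"notes,omitempty"`
}

// ServiceIdentifier is one name/identifier pair, e.g. "ASN":"64512".
type ServiceIdentifier struct {
	Name       string `json:"name"`
	Identifier string `json:"identifier"`
}

// ProviderSpec is a hypothetical Go rendering of the outline.
type ProviderSpec struct {
	DisplayName        string              `json:"displayName"`
	Type               ProviderType        `json:"type"`
	Contract           *ProviderContract   `json:"contract,omitempty"`
	ServiceIdentifiers []ServiceIdentifier `json:"serviceIdentifiers,omitempty"`
}

// ValidateProviderSpec checks the required field and the enum, which is
// the kind of rule a validating webhook for this kind might enforce.
func ValidateProviderSpec(spec ProviderSpec) error {
	if spec.DisplayName == "" {
		return errors.New("displayName is required")
	}
	switch spec.Type {
	case ProviderHosting, ProviderColocation, ProviderTransit,
		ProviderInternetExchange, ProviderDarkFiber, ProviderCloud:
	default:
		return fmt.Errorf("unsupported provider type %q", spec.Type)
	}
	for i, id := range spec.ServiceIdentifiers {
		if id.Name == "" || id.Identifier == "" {
			return fmt.Errorf("serviceIdentifiers[%d]: name and identifier are required", i)
		}
	}
	return nil
}
```

In practice the enum check would more likely be expressed as a CRD schema constraint; the function form is used here only to keep the sketch self-contained.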

Rack

RackSpec:
  siteRef     LocalObjectReference (required, immutable)
  cage        string   # free-form; cage/row kept as attributes, not separate kinds
  row         string
  name        string
  heightU     int32    # rack units available
  powerFeeds  []{ name, phase, voltage, ampsRated }

Cage and row are modeled as fields rather than their own CRDs to avoid kind explosion; promote to kinds only if hierarchy needs independent lifecycle/RBAC.

Placement (embedded, not a kind)

Add an optional Placement to Node and NetworkDevice specs:

Placement:
  rackRef    LocalObjectReference
  startUnit  int32     # lowest occupied U
  unitHeight int32
  face       enum: Front|Rear

Webhook validates the U-range fits the rack and does not overlap another device in the same rack/face.
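
The two webhook checks described above (fits the rack, no overlap on the same rack/face) reduce to an interval test. The sketch below is one possible shape, assuming rack units are numbered from 1 and that the webhook has already listed the other placements; RackRef is simplified to a plain string, and ValidatePlacement is a hypothetical helper name.

```go
package main

import "fmt"

// Face is the side of the rack a device is mounted on.
type Face string

const (
	FaceFront Face = "Front"
	FaceRear  Face = "Rear"
)

// Placement follows the outline above; RackRef is simplified to the
// rack name instead of a LocalObjectReference.
type Placement struct {
	RackRef    string
	StartUnit  int32 // lowest occupied U (assumed 1-based)
	UnitHeight int32
	Face       Face
}

// topUnit returns the highest U the placement occupies.
func (p Placement) topUnit() int32 {
	return p.StartUnit + p.UnitHeight - 1
}

// ValidatePlacement checks that candidate fits within rackHeightU and
// does not overlap any existing placement on the same rack and face.
func ValidatePlacement(rackHeightU int32, candidate Placement, existing []Placement) error {
	if candidate.StartUnit < 1 || candidate.UnitHeight < 1 {
		return fmt.Errorf("startUnit and unitHeight must be >= 1")
	}
	if candidate.topUnit() > rackHeightU {
		return fmt.Errorf("U%d-U%d exceeds rack height %dU",
			candidate.StartUnit, candidate.topUnit(), rackHeightU)
	}
	for _, other := range existing {
		if other.RackRef != candidate.RackRef || other.Face != candidate.Face {
			continue
		}
		// Two closed intervals overlap when each starts at or below the other's top.
		if candidate.StartUnit <= other.topUnit() && other.StartUnit <= candidate.topUnit() {
			return fmt.Errorf("overlaps placement at U%d-U%d on %s face",
				other.StartUnit, other.topUnit(), other.Face)
		}
	}
	return nil
}
```

On update, the webhook would need to exclude the device's own previous placement from the existing list, otherwise a no-op edit would be rejected as overlapping itself.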

Port

PortSpec:
  deviceRef AssetReference (Node|NetworkDevice|Rack-PDU/patch-panel)
  type      enum: Power|Serial|Ethernet|Optical|PatchPanel
  name      string    # "eth0", "PSU1", "pp-A-24"
  speed     optional  # Quantity, for network ports

Ports are the near/far-end identifiers cabling connects.

Cable (physical layer; distinct from logical Link)

CableSpec:
  endpoints []PortReference  # exactly 2; near-end + far-end
  media     enum: Copper|Fiber-SMF|Fiber-MMF|Power|DAC
  lengthM   optional Quantity
  label     string

Keep Link for logical/capacity/latency relationships; Cable records the physical run between two Ports. A Link may reference the Cable(s) realizing it.
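
A minimal sketch of the "exactly 2 endpoints" rule for Cable follows. The PortReference shape (name only) and the ValidateCableSpec helper are assumptions, since the plan names PortReference without defining it; the requirement that the two ends differ is also an assumption, and lengthM is omitted to avoid pulling in a Quantity type.

```go
package main

import (
	"errors"
	"fmt"
)

// PortReference is a hypothetical name-only reference to a Port.
type PortReference struct {
	Name string `json:"name"`
}

// CableMedia mirrors the media enum in the CableSpec outline.
type CableMedia string

const (
	MediaCopper   CableMedia = "Copper"
	MediaFiberSMF CableMedia = "Fiber-SMF"
	MediaFiberMMF CableMedia = "Fiber-MMF"
	MediaPower    CableMedia = "Power"
	MediaDAC      CableMedia = "DAC"
)

var validMedia = map[CableMedia]bool{
	MediaCopper: true, MediaFiberSMF: true, MediaFiberMMF: true,
	MediaPower: true, MediaDAC: true,
}

// CableSpec is a trimmed rendering of the outline (lengthM omitted).
type CableSpec struct {
	Endpoints []PortReference `json:"endpoints"`
	Media     CableMedia      `json:"media"`
	Label     string          `json:"label,omitempty"`
}

// ValidateCableSpec enforces exactly two distinct, named endpoints and
// a known media value.
func ValidateCableSpec(spec CableSpec) error {
	if len(spec.Endpoints) != 2 {
		return fmt.Errorf("endpoints must contain exactly 2 ports, got %d", len(spec.Endpoints))
	}
	near, far := spec.Endpoints[0], spec.Endpoints[1]
	if near.Name == "" || far.Name == "" {
		return errors.New("endpoint port names must not be empty")
	}
	if near.Name == far.Name {
		// Assumption: a cable looping back to the same port is invalid.
		return errors.New("near-end and far-end must be different ports")
	}
	if !validMedia[spec.Media] {
		return fmt.Errorf("unsupported media %q", spec.Media)
	}
	return nil
}
```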

Circuit

CircuitSpec:
  providerRef   LocalObjectReference
  type          enum: CrossConnect|ProviderCircuit|Transit|Peering
  circuitID     string   # provider's circuit/LOA id
  bandwidthMbps optional int64
  aEnd          PortReference|SiteReference
  zEnd          PortReference|SiteReference
  serviceRef    optional ObjectReference  # cross-group, e.g. networking Galactic VPC uplink

Cross-connects are a Circuit type rather than a separate kind.
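
The aEnd/zEnd fields are written as PortReference|SiteReference, and Go has no native union type. One common way to model this, sketched here under assumptions, is a struct with two optional pointers where exactly one must be set. CircuitEndpoint and ValidateCircuitEndpoint are hypothetical names, and both references are simplified to a name-only shape.

```go
package main

import "errors"

// LocalObjectReference matches the existing Name-only reference shape.
type LocalObjectReference struct {
	Name string `json:"name"`
}

// CircuitEndpoint is a hypothetical union for aEnd/zEnd: exactly one of
// PortRef or SiteRef should be set.
type CircuitEndpoint struct {
	PortRef *LocalObjectReference `json:"portRef,omitempty"`
	SiteRef *LocalObjectReference `json:"siteRef,omitempty"`
}

// ValidateCircuitEndpoint enforces the exactly-one-of rule.
func ValidateCircuitEndpoint(e CircuitEndpoint) error {
	switch {
	case e.PortRef != nil && e.SiteRef != nil:
		return errors.New("set only one of portRef or siteRef")
	case e.PortRef == nil && e.SiteRef == nil:
		return errors.New("one of portRef or siteRef is required")
	case e.PortRef != nil && e.PortRef.Name == "":
		return errors.New("portRef.name must not be empty")
	case e.SiteRef != nil && e.SiteRef.Name == "":
		return errors.New("siteRef.name must not be empty")
	}
	return nil
}
```

A Site-level endpoint would suit a provider circuit whose far end is only known to the facility, while a Port-level endpoint suits a cross-connect landing on a specific patch-panel port.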

VirtualMachine

VirtualMachineSpec:
  hostRef     LocalObjectReference  # Node
  providerRef optional LocalObjectReference
  projectRef  optional ObjectReference  # resourcemanager.miloapis.com Project
  allocation  { vcpus, memoryBytes, disks[]{ name, sizeBytes, type } }

Models assignment + allocation only. Power state / health are explicit non-goals.

Cross-cutting: typed cross-group reference

Add an ObjectReference{APIGroup,Kind,Name,Namespace?} to common_types.go for linking inventory objects to other platform resources (the "provider circuit ↔ Galactic VPC uplink" goal). AssetReference stays for intra-inventory links.
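
For discussion, the proposed ObjectReference might look roughly like this. The JSON tags, the Validate method, and the String rendering are assumptions for illustration; only the four field names come from the plan.

```go
package main

import "errors"

// ObjectReference is a sketch of the proposed cross-group reference.
// Namespace is optional because the target may be cluster-scoped.
type ObjectReference struct {
	APIGroup  string `json:"apiGroup"`
	Kind      string `json:"kind"`
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

// Validate checks the minimum needed to resolve the reference.
func (r ObjectReference) Validate() error {
	if r.Kind == "" || r.Name == "" {
		return errors.New("kind and name are required")
	}
	return nil
}

// String renders group/Kind[/namespace]/name, e.g. for log lines.
func (r ObjectReference) String() string {
	if r.Namespace == "" {
		return r.APIGroup + "/" + r.Kind + "/" + r.Name
	}
	return r.APIGroup + "/" + r.Kind + "/" + r.Namespace + "/" + r.Name
}
```

A VirtualMachine.projectRef would then be an ObjectReference with APIGroup "resourcemanager.miloapis.com" and Kind "Project".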

Per-kind engineering checklist (the established pattern)

For each new kind, one PR delivers:

  1. api/v1alpha1/<kind>_types.go + make generate (deepcopy) + make manifests (CRD/RBAC/webhook).
  2. internal/controller/<kind>_controller.go — status conditions, referential-integrity reconcile, finalizers to block orphaning deletes (e.g. cannot delete a Rack with mounted devices).
  3. internal/webhook/v1alpha1/<kind>_webhook.go — enum/immutability/cross-ref validation; wire SetupXWebhookWithManager in cmd/inventory/main.go.
  4. SetupIndexers entries for the new reference fields.
  5. Printer columns + topology labels for query ergonomics.
  6. envtest coverage for webhook + controller.
  7. IAM: ProtectedResource / Role / PolicyBinding for the kind (per milo-iam).
  8. Deployment: new CRDs flow through the existing config/base + milo-integration component and the tiered milo / instance-of-milo / datum-deployment model — no new wiring beyond CRD registration.
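
Items 2 and 4 of the checklist interact: the indexer on a reference field is what lets the controller cheaply answer "is anything still pointing at this object?" before releasing a finalizer. The sketch below shows that logic in plain Go for the Rack example, without controller-runtime; MountedDevice, RackIndexKeys, and BlockingDevices are hypothetical names, and a real controller would query the indexed client cache instead of a slice.

```go
package main

import "sort"

// MountedDevice is a simplified stand-in for a Node or NetworkDevice
// with an optional Placement.
type MountedDevice struct {
	Kind     string // "Node" or "NetworkDevice"
	Name     string
	RackName string // empty when the device has no placement
}

// RackIndexKeys returns the index values for a device, in the shape an
// index extractor function typically has: zero or more string keys.
func RackIndexKeys(d MountedDevice) []string {
	if d.RackName == "" {
		return nil
	}
	return []string{d.RackName}
}

// BlockingDevices lists the devices still mounted in the given rack.
// A Rack controller would keep its finalizer in place while this list
// is non-empty, so deleting the Rack cannot orphan its devices.
func BlockingDevices(rack string, devices []MountedDevice) []string {
	var blocking []string
	for _, d := range devices {
		for _, key := range RackIndexKeys(d) {
			if key == rack {
				blocking = append(blocking, d.Kind+"/"+d.Name)
			}
		}
	}
	sort.Strings(blocking) // stable output for status messages
	return blocking
}
```

The returned names could be surfaced in a status condition message so that an operator sees why a Rack deletion is pending.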

Milestones (each becomes an issue hung off enhancements#713)

  • M0 — Prototype running (in progress, see Controller CrashLoops in staging #9): core kinds deployed on Datum Infra, repl loop established.
  • M1 — Providers & contracts: Provider; add providerRef to Site.
  • M2 — Physical hierarchy: Rack + Placement on Node/NetworkDevice; overlap-validation webhook.
  • M3 — Ports & cabling: Port, Cable; relate Link ↔ Cable.
  • M4 — Circuits & platform linkage: Circuit (incl. cross-connect) + cross-group ObjectReference; first concrete link to a networking resource.
  • M5 — VM inventory: VirtualMachine.
  • M6 — Audit & query: Activity capability integration for who/what/when; label-selector/field-selector ergonomics, datumctl + staff-portal read views.

Open questions

  1. Audit history: is apiserver audit logging + managedFields sufficient for v1, or do we want the Activity capability (ActivityPolicy + events) from the start? Affects M6 vs. earlier.
  2. Cage/row as fields vs. kinds: fields proposed for leanness — does any consumer need independent RBAC or lifecycle on a cage/row?
  3. Cable vs. Link: confirm we want a distinct physical Cable kind rather than overloading Link with media/near-far-end fields.
  4. Project association on VMs: reference resourcemanager.miloapis.com Project directly, or keep a free-form identifier like Cluster.provider?
  5. Reconciliation/discovery: explicit non-goal for v1 — confirm nothing here should auto-populate from live infra yet.

cc @scotwells (original prototype in #1)
