M6: Audit & query (Activity capability + query ergonomics) #34

Description

@ecv

Milestone M6 of the physical inventory implementation plan, successor to M5 #30, under datum-cloud/enhancements#713. Final milestone of the plan.

Goal

Make inventory auditable (who changed what, when — as human-readable timelines) and queryable (find assets by attribute and topology) for operators. This closes the last two enhancement goals and turns the registry from "data is in there" into "operators can actually answer questions and trace changes".

Addresses enhancement goals: "Inventory supports audit history — who changed what and when" and "All inventory objects are queryable via API …".

Decisions (settled up front)

  • Audit mechanism: the platform Activity capability (activity.miloapis.com/v1alpha1), not raw audit logs alone. We author ActivityPolicy resources and emit Kubernetes Events; the Activity system translates audit logs + events into timelines. Mirrors dns-operator and network-services-operator.
  • Scope of this issue: the milo-os/inventory repo only. Client work (datumctl, staff-portal) is specified here but tracked as separate downstream issues in those repos (see Downstream).
  • Audience: staff / operators. Inventory is infra-facing and cluster-scoped on the Milo control plane. Timelines and read views target operators (staff-portal + datumctl), not per-project consumer surfaces. Activities are platform/org-scoped, not project-scoped.
  • Query ergonomics: printer columns + topology labels + field indexers. Server-side custom field selectors and aggregate/rollup endpoints are explicitly deferred (out of scope for M6).

Scope

Audit — Activity capability integration

The Activity system is declarative and policy-driven: services do not emit activity records themselves. Instead they (1) author ActivityPolicy resources, written in CEL, that describe how their operations read in a timeline, and (2) emit events.k8s.io/v1 Events for async state transitions. The system matches audit logs + events against the policies and produces human-readable Activity records, queryable/streamable via the activity API.

  1. ActivityPolicy per inventory kind — one policy each for Region, Site, Cluster, Node, NetworkDevice, Link, Provider, Rack, Port, Cable, Circuit, VirtualMachine. Each defines:
    • audit rules for create / update / delete. Audit entries for these operations carry audit.responseObject, so summaries can reference spec fields and display names directly, e.g. "{{ actor }} placed Node {{ name }} in rack {{ spec.placement.rackRef.name }}".
    • event rules for async controller outcomes (Ready, reference-NotFound), keyed off Event annotations.
  2. Controller event emission — controllers currently only set Status.Conditions. Add best-effort events.k8s.io/v1 Event emission on state transitions (Ready true/false, …NotFound) via a typed client, with structured annotations (prefix inventory.miloapis.com/) carrying display values for the policy templates. Best-effort: failures logged and swallowed.
  3. Deployment — ship policies as a config/milo/activity/policies/ kustomize component (one policy file per kind + a kustomization.yaml), installed onto the Milo control plane alongside the existing CRD/webhook flow. Mirror the dns-operator / network-services-operator layout.
  4. Validation — exercise each policy with PolicyPreview (sample audit + event inputs) before merge.
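
The best-effort emission in step 2 could look roughly like the following. This is a minimal, stdlib-only sketch rather than the operator's actual code: `Event` is a simplified stand-in for an events.k8s.io/v1 Event, `Emitter` is a hypothetical seam over whatever typed client ends up creating Events, and the `ProviderNotFound` reason and `display-name` annotation key are illustrative assumptions. Only the `inventory.miloapis.com/` prefix comes from this issue.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// AnnotationPrefix is the annotation prefix named in this issue.
const AnnotationPrefix = "inventory.miloapis.com/"

// Event is a simplified stand-in for an events.k8s.io/v1 Event (assumption).
type Event struct {
	Regarding   string // object the event is about, e.g. "Circuit/edge-1"
	Reason      string // hypothetical reason name, e.g. "ProviderNotFound"
	Note        string
	Annotations map[string]string
}

// Emitter is a hypothetical seam over the typed client that creates Events.
type Emitter interface {
	Create(ev Event) error
}

// displayAnnotations prefixes each display value so that policy event rules
// can key off a single, predictable annotation namespace.
func displayAnnotations(display map[string]string) map[string]string {
	out := make(map[string]string, len(display))
	for k, v := range display {
		out[AnnotationPrefix+k] = v
	}
	return out
}

// emitBestEffort never returns an error: a failed emission is logged and
// swallowed so it cannot fail the reconcile. It reports whether the event
// was created, which is mainly useful in tests.
func emitBestEffort(e Emitter, ev Event) bool {
	if err := e.Create(ev); err != nil {
		log.Printf("event emission failed (ignored): %v", err)
		return false
	}
	return true
}

// recordingEmitter keeps events in memory; failingEmitter always errors.
type recordingEmitter struct{ events []Event }

func (r *recordingEmitter) Create(ev Event) error {
	r.events = append(r.events, ev)
	return nil
}

type failingEmitter struct{}

func (failingEmitter) Create(Event) error { return errors.New("apiserver unavailable") }

func main() {
	ev := Event{
		Regarding: "Circuit/edge-1",
		Reason:    "ProviderNotFound",
		Note:      "referenced Provider does not exist",
		Annotations: displayAnnotations(map[string]string{
			"display-name": "edge-1",
		}),
	}
	rec := &recordingEmitter{}
	fmt.Println(emitBestEffort(rec, ev), len(rec.events)) // true 1
	fmt.Println(emitBestEffort(failingEmitter{}, ev))     // false
}
```

Returning a bool instead of an error keeps call sites from accidentally propagating an emission failure into the reconcile result.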

Query — ergonomics for operators

Resource queries are largely kubectl-style and already on the API; M6 makes them ergonomic rather than adding a query service.

  • Printer columns — audit every kind for useful kubectl get / datumctl get columns (key refs, type, topology). Fill gaps left from M1–M5.
  • Label-selector ergonomics — confirm the topology.inventory.miloapis.com/* labels propagate consistently across all kinds so --selector topology...=… answers "everything in region/site/cluster/rack X".
  • Field indexers — confirm the indexers registered in SetupIndexers cover common "find by ref" lookups; add any missing.
  • Audit/timeline queries — validate, no new code: ActivityQuery / AuditLogQuery and datumctl activity query --filter "<CEL>" return sensible inventory timelines.
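
To make the "find by ref" lookup concrete, here is a toy in-memory model of what a field index provides. It is a sketch under assumptions, not the code in SetupIndexers: `Node`, `RefIndex` and `rackRefIndex` are hypothetical names, and the only detail taken from this issue is that a Node references its rack via spec.placement.rackRef.name. A real indexer would register a similar extractor function with the controller manager's cache.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a pared-down stand-in for the inventory Node kind.
type Node struct {
	Name    string
	RackRef string // models spec.placement.rackRef.name
}

// IndexFunc has the usual extractor shape: object in, index values out.
type IndexFunc func(n Node) []string

// RefIndex is a toy in-memory index from an extracted value to object names.
type RefIndex struct {
	extract IndexFunc
	byValue map[string][]string
}

// NewRefIndex builds an empty index around the given extractor.
func NewRefIndex(extract IndexFunc) *RefIndex {
	return &RefIndex{extract: extract, byValue: map[string][]string{}}
}

// Add records the node under every value its extractor returns.
func (ix *RefIndex) Add(n Node) {
	for _, v := range ix.extract(n) {
		ix.byValue[v] = append(ix.byValue[v], n.Name)
	}
}

// Lookup returns a sorted copy of the names indexed under value.
func (ix *RefIndex) Lookup(value string) []string {
	names := append([]string(nil), ix.byValue[value]...)
	sort.Strings(names)
	return names
}

// rackRefIndex indexes Nodes by rack; unplaced Nodes are not indexed.
func rackRefIndex(n Node) []string {
	if n.RackRef == "" {
		return nil
	}
	return []string{n.RackRef}
}

func main() {
	ix := NewRefIndex(rackRefIndex)
	nodes := []Node{
		{Name: "node-b", RackRef: "rack-1"},
		{Name: "node-a", RackRef: "rack-1"},
		{Name: "node-c", RackRef: "rack-2"},
		{Name: "node-d"}, // not yet placed in a rack
	}
	for _, n := range nodes {
		ix.Add(n)
	}
	fmt.Println(ix.Lookup("rack-1")) // [node-a node-b]
	fmt.Println(ix.Lookup("rack-9")) // []
}
```

An audit of indexer coverage then amounts to listing the ref fields operators look up by and checking that each has an extractor like `rackRefIndex`.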

Tasks

  • config/milo/activity/policies/: one ActivityPolicy per kind (12 kinds) + kustomization.yaml; component wired into the milo control-plane install
  • Controllers — emit events.k8s.io/v1 Events on Ready / …NotFound transitions, with inventory.miloapis.com/-prefixed display annotations; shared event helper in internal/controller/ (mirror conditions.go)
  • RBAC — controller create on events.k8s.io Events
  • PolicyPreview fixtures / test for each policy (audit create-update-delete + key event rules)
  • Printer-column pass across all kinds; fill gaps
  • Confirm topology.* label propagation is uniform across all kinds (incl. VirtualMachine from M5)
  • Audit SetupIndexers coverage for common ref lookups; add any missing indexes
  • Settle activity scoping for cluster-scoped infra objects (see Design details) — confirm staff/operator visibility model with the Activity service owners
  • Docs — short docs/ note on querying inventory + reading inventory activity (CEL filter examples), matching the dns-operator activity-integration doc style
  • envtest / unit tests: event emission on transitions; PolicyPreview round-trips for each policy
  • Deployment — policies component flows through config/base + milo-integration install model

Design details to settle in the PR

  • Activity scoping for cluster-scoped, infra-facing objects. The Activity system is multi-tenant (org / project / user); inventory kinds are cluster-scoped on the Milo control plane with no project owner. Settle how their activities surface to staff (platform/org-scoped visibility) and confirm with the Activity service owners — this is the one genuine unknown. VirtualMachine.projectRef is the only kind with a natural project association, and even then the audience here is operators, not the project's end-users.
  • Event vs. condition parity. Events are emitted in addition to conditions, not instead — conditions remain the machine-readable status; events feed human-readable timelines. Keep them in sync via one helper.
  • Policy template content: summaries must use human-friendly display names and avoid leaking internal topology, following the approach the dns-operator policies take; decide per-kind which spec/label fields are safe and useful to surface.
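
One way to read the "one helper" idea from the event vs. condition parity item is a single function that applies a condition and, only when the status or reason actually changes, also returns the event to emit. The sketch below assumes simplified `Condition` and `Event` types and a hypothetical `ProviderNotFound` reason; it is an illustration, not the helper that will live next to conditions.go.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a status condition (assumption).
type Condition struct {
	Type, Status, Reason, Message string // Status is "True" or "False"
}

// Event is the human-readable counterpart derived from a condition change.
type Event struct {
	Reason, Note string
}

// setCondition applies next to conds and returns an Event only when the
// status or reason changed, so conditions and events cannot drift apart
// and repeated reconciles do not spam the timeline.
func setCondition(conds []Condition, next Condition) ([]Condition, *Event) {
	for i, c := range conds {
		if c.Type != next.Type {
			continue
		}
		if c.Status == next.Status && c.Reason == next.Reason {
			conds[i].Message = next.Message // refresh message, no transition
			return conds, nil
		}
		conds[i] = next
		return conds, &Event{Reason: next.Reason, Note: next.Message}
	}
	// First time this condition type is set: always a transition.
	return append(conds, next), &Event{Reason: next.Reason, Note: next.Message}
}

func main() {
	var conds []Condition
	var ev *Event

	// A Circuit references a Provider that does not exist yet.
	missing := Condition{Type: "Ready", Status: "False", Reason: "ProviderNotFound", Message: "provider not found"}
	conds, ev = setCondition(conds, missing)
	fmt.Println(ev != nil) // true: first transition emits an event

	// Reconciling again with the same outcome emits nothing.
	conds, ev = setCondition(conds, missing)
	fmt.Println(ev != nil) // false

	// The Provider is created and the reference resolves.
	ready := Condition{Type: "Ready", Status: "True", Reason: "Ready", Message: "all references resolved"}
	conds, ev = setCondition(conds, ready)
	fmt.Println(ev != nil, len(conds), conds[0].Status) // true 1 True
}
```

The example follows the create-then-resolve lifecycle described in the exit criteria: two events for three reconciles, with the condition list always reflecting the latest state.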

Exit criteria

  • An ActivityPolicy exists for every inventory kind; PolicyPreview produces sensible summaries for create/update/delete and for Ready / …NotFound events.
  • Controllers emit Events on state transitions; a create-then-resolve lifecycle (e.g. create a Circuit referencing a missing Provider, then create the Provider) produces a readable timeline via datumctl activity query.
  • Every kind shows useful columns under kubectl get / datumctl get, and --selector topology...=… returns the expected assets.
  • A short operator-facing doc shows how to query inventory and read its activity.
  • Downstream client issues filed and linked (below).

Downstream (separate issues, not this milestone's code)

  • datumctl (datum-cloud/datumctl) — confirm inventory kinds are discoverable via api-resources and usable through get/describe; activity query works against inventory; optional convenience read views. File + link.
  • staff-portal (datum-cloud/staff-portal) — operator read views for inventory kinds + an inventory activity timeline, consuming the existing activity.miloapis.com SDK. File + link.

Depends on

  • M5 (#30, "VM inventory (VirtualMachine)") merged: the policy/column/label pass covers all kinds, including VirtualMachine.
  • Availability of the Activity service (activity.miloapis.com) on the target Milo control plane.

Follow-on

None — M6 is the final milestone of #15. Remaining work is the downstream client issues above and any enhancement non-goals (discovery/reconciliation, health, DCIM depth) deferred to future enhancements.

cc @scotwells
