StateSight is a GitOps forensic platform for Kubernetes.
Its purpose is to compare desired state from Git with live cluster state, explain drift, group it into incidents, and recommend actions (ignore, monitor, investigate, reconcile).
StateSight is not a deployment controller like Argo CD or Flux.
- Go API service with versioned routes, request IDs, structured JSON responses, health/readiness, and basic metrics.
- Go worker service that consumes Redis queue jobs and writes deterministic analysis outputs to Postgres.
- React + TypeScript + Vite + Tailwind web app with routed pages and API-backed data loading.
- PostgreSQL migrations for core domain entities, suppression audit records, scoped ignore rules, provisioned OIDC identities, and workspace-qualified relationships.
- Seed workflow with realistic sample data.
- Docker Compose local stack for Postgres, Redis, API, worker, and web.
- Makefile commands for setup, migrate, seed, run, format, test, and docs checks.
- Semantic diffing currently covers resource presence, replica counts, first-container images, named pod-template container presence, named container environment entries and resource requests/limits, annotations, metadata labels, and Service selectors; it is not a complete Kubernetes diff engine.
- Live-state collection uses kubectl for a limited resource set rather than a Kubernetes client integration.
- Evidence records provide Git/live-state provenance and exact Kubernetes managedFields ownership where available; they do not yet correlate audit logs or prove which actor caused drift.
- GitHub webhook endpoint is baseline-only (not full GitHub App install/auth flow).
- Git desired-state ingestion reads plain YAML/JSON manifests; Helm, Kustomize, Argo CD, and Flux integrations are not implemented.
- The web client does not yet implement an interactive OIDC sign-in flow; authenticated deployments currently expose the verified API boundary for an integrated client or gateway.
- No auto-remediation.
Local demo mode defaults to AUTH_REQUIRED=false. When AUTH_REQUIRED=true, the API requires a verified OIDC JWT bearer token for operator API endpoints. The GitHub webhook keeps its independent HMAC signature validation path.
Required API configuration:
- OIDC_ISSUER_URL: OIDC discovery issuer URL.
- OIDC_AUDIENCE: expected token audience for the StateSight API.
- OIDC_ALLOW_INSECURE_ISSUER: defaults to false; enable only for a local plain-HTTP identity provider.
At startup the API discovers the provider and rejects an auth-enabled configuration that cannot be initialized. For each protected request it validates the bearer token signature through the discovered JWKS, issuer, audience, and token lifetime. Production issuer and JWKS URLs must use HTTPS.
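The per-request checks described above can be sketched roughly as follows. This is a minimal illustration, not StateSight's implementation: the `Claims` type and `validateClaims` function are hypothetical names, the claims are assumed to be already decoded, and signature verification against the discovered JWKS is assumed to have happened before this step.

```go
package main

import (
	"errors"
	"fmt"
)

// Claims is a hypothetical, already-decoded subset of JWT claims.
// Times are Unix seconds, matching the exp and nbf claim encoding.
type Claims struct {
	Issuer    string
	Audience  []string
	ExpiresAt int64
	NotBefore int64 // zero means the claim is absent
}

// validateClaims checks issuer, audience, and token lifetime.
// Assumption: the token signature was already verified via the JWKS.
func validateClaims(c Claims, issuer, audience string, now int64) error {
	if c.Issuer != issuer {
		return errors.New("issuer mismatch")
	}
	audienceOK := false
	for _, a := range c.Audience {
		if a == audience {
			audienceOK = true
			break
		}
	}
	if !audienceOK {
		return errors.New("audience mismatch")
	}
	if c.NotBefore != 0 && now < c.NotBefore {
		return errors.New("token not yet valid")
	}
	if c.ExpiresAt == 0 || now >= c.ExpiresAt {
		return errors.New("token expired or missing exp")
	}
	return nil
}

func main() {
	c := Claims{
		Issuer:    "https://identity.example.com",
		Audience:  []string{"statesight-api"}, // placeholder audience value
		ExpiresAt: 2000,
	}
	fmt.Println(validateClaims(c, "https://identity.example.com", "statesight-api", 1000))
	fmt.Println(validateClaims(c, "https://identity.example.com", "statesight-api", 3000))
}
```

In this sketch a token with no exp claim is rejected, a deliberately strict choice for illustration.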
Verified iss and sub claims map to a local user through user_identities; unmapped identities are denied. Roles continue to come from workspace_memberships (viewer, editor, admin). X-User-ID and X-User-Email are not authentication inputs and are not sent by the web client.
GET /api/v1/overview and GET /api/v1/applications additionally require X-Workspace-ID to choose the workspace being viewed. That header selects scope only: the authenticated local user must hold a membership for the selected workspace. Resource-addressed endpoints derive the workspace from the stored resource before enforcing membership.
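The scope-versus-authorization distinction above could look roughly like this. The `membershipKey` type, the in-memory map, and the `authorize` function are hypothetical stand-ins for a workspace_memberships lookup, and the role ordering viewer < editor < admin is an assumption that the source does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// membershipKey is a hypothetical lookup key for workspace_memberships.
type membershipKey struct {
	UserID      string
	WorkspaceID string
}

// roleRank encodes an assumed ordering: viewer < editor < admin.
var roleRank = map[string]int{"viewer": 1, "editor": 2, "admin": 3}

// authorize treats the workspace ID (from X-Workspace-ID) as scope only:
// access is granted solely through a stored membership for that workspace.
func authorize(memberships map[membershipKey]string, userID, workspaceID, minRole string) error {
	if workspaceID == "" {
		return errors.New("missing X-Workspace-ID")
	}
	required, known := roleRank[minRole]
	if !known {
		return errors.New("unknown required role")
	}
	role, ok := memberships[membershipKey{UserID: userID, WorkspaceID: workspaceID}]
	if !ok {
		return errors.New("no membership for selected workspace")
	}
	if roleRank[role] < required {
		return errors.New("insufficient role")
	}
	return nil
}

func main() {
	memberships := map[membershipKey]string{
		{UserID: "user-1", WorkspaceID: "ws-a"}: "editor",
	}
	fmt.Println(authorize(memberships, "user-1", "ws-a", "viewer"))
	fmt.Println(authorize(memberships, "user-1", "ws-b", "viewer"))
}
```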
The database enforces that an application's cluster and source definition belong to its workspace, and that an application-scoped ignore rule belongs to the same workspace as its application. A migration will fail if legacy data violates those tenant relationships; repair that data before deploying the constraint.
Identity provisioning is intentionally administrative until a managed enrollment flow exists. After creating a local user and its workspace membership, bind its verified provider identity using an operator-controlled migration or SQL statement:
INSERT INTO user_identities (issuer, subject, user_id)
VALUES ('https://identity.example.com', '<provider-subject>', '<local-user-uuid>');

The worker honors:
- GIT_BIN and GIT_CACHE_DIR for desired-state checkouts.
- KUBECTL_BIN for live-state collection.
- ALLOW_SYNTHETIC_LIVE_STATE, which defaults to false.
When kubectl cannot collect live resources, analysis fails by default. Set ALLOW_SYNTHETIC_LIVE_STATE=true only for local pipeline demonstrations; resulting incidents do not represent observations from a cluster.
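The fail-by-default behavior can be sketched as below. The function names, the `errCollect` value, and the use of `strconv.ParseBool` to read the flag are assumptions for illustration; the worker's actual parsing and error handling may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// errCollect is a hypothetical collection failure used for demonstration.
var errCollect = errors.New("kubectl: unable to reach cluster")

// allowSynthetic interprets the flag value; unset or unparsable means false.
func allowSynthetic(raw string) bool {
	v, err := strconv.ParseBool(raw)
	return err == nil && v
}

// liveStateSource reports where live state came from, or fails the analysis
// when collection failed and the synthetic fallback is not enabled.
func liveStateSource(collectErr error, synthetic bool) (string, error) {
	if collectErr == nil {
		return "kubectl", nil
	}
	if synthetic {
		// Demo-only path: results are not observations from a cluster.
		return "synthetic", nil
	}
	return "", fmt.Errorf("live-state collection failed: %w", collectErr)
}

func main() {
	synthetic := allowSynthetic(os.Getenv("ALLOW_SYNTHETIC_LIVE_STATE"))
	source, err := liveStateSource(errCollect, synthetic)
	fmt.Println(source, err)
}
```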
For each unsuppressed drift incident, the worker persists:
- desired-state provenance identifying the analyzed Git repository, path, and revision;
- live-state provenance identifying whether the value was observed through kubectl or generated by explicit synthetic demo fallback;
- Kubernetes managedFields evidence only when the live object reports ownership of the exact field path compared by the current diff engine.
Git and kubectl records describe where compared values came from and use not-attributed instead of inventing an actor. A managedFields manager is field-ownership evidence, not proof that the manager introduced the drift. Named resource request/limit findings can receive exact ownership evidence; aggregate named-container presence and environment-entry findings intentionally do not. Synthetic live state is recorded as untrusted and does not yield manager attribution.
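One way to picture these rules is the sketch below. The `LiveState` and `Evidence` types are hypothetical, and the flat path-to-manager map is a simplification: real managedFields entries store ownership in a nested fieldsV1 structure rather than as flat field paths.

```go
package main

import "fmt"

const notAttributed = "not-attributed"

// LiveState is a hypothetical view of one collected object.
type LiveState struct {
	Trusted       bool              // false when generated by synthetic fallback
	FieldManagers map[string]string // exact field path -> manager (simplified)
}

// Evidence is a hypothetical record shape for one drift finding.
type Evidence struct {
	Actor        string // stays not-attributed: ownership is not causality
	FieldManager string // empty when no exact ownership evidence exists
}

// buildEvidence attaches a field manager only for trusted live state and
// only when the exact compared path has a recorded owner.
func buildEvidence(ls LiveState, fieldPath string) Evidence {
	ev := Evidence{Actor: notAttributed}
	if ls.Trusted {
		ev.FieldManager = ls.FieldManagers[fieldPath]
	}
	return ev
}

func main() {
	ls := LiveState{
		Trusted:       true,
		FieldManagers: map[string]string{"spec.replicas": "kubectl-client-side-apply"},
	}
	fmt.Printf("%+v\n", buildEvidence(ls, "spec.replicas"))
	fmt.Printf("%+v\n", buildEvidence(LiveState{Trusted: false, FieldManagers: ls.FieldManagers}, "spec.replicas"))
}
```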
An ignore rule's match_expression is one exact drift field path, such as:
- spec.replicas
- spec.template.spec.containers[0].image
- metadata.annotations.example.com/managed-by
- metadata.labels.app.kubernetes.io/name
- spec.selector.app.kubernetes.io/name for a Service
- spec.template.spec.containers[name=ledger-api].env[name=LOG_LEVEL]
- spec.template.spec.containers[name=ledger-api].resources.requests.cpu
Matching is case-sensitive and trims surrounding whitespace. Wildcards and regular expressions are not supported. Rules created through the application API or UI are scoped to that application and can optionally specify an exact resource_ref. Active resource-specific application rules are evaluated before application-wide rules, which are evaluated before inherited workspace rules. Within the same scope, the oldest matching rule is used first.
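The precedence order described above can be sketched as follows. The `Rule` struct and `selectRule` function are hypothetical, the field names only loosely mirror match_expression and resource_ref, and trimming both the rule expression and the candidate path is an assumption about where whitespace is removed.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Rule is a hypothetical in-memory form of an ignore rule.
type Rule struct {
	Name            string
	MatchExpression string
	ApplicationID   string // empty means inherited workspace rule
	ResourceRef     string // empty means not resource-specific
	Active          bool
	CreatedAt       int64 // Unix seconds; older rules win within a scope
}

// scopeRank orders scopes: resource-specific application rule (0),
// application-wide rule (1), inherited workspace rule (2).
func scopeRank(r Rule) int {
	switch {
	case r.ApplicationID != "" && r.ResourceRef != "":
		return 0
	case r.ApplicationID != "":
		return 1
	default:
		return 2
	}
}

// selectRule returns the rule that would suppress the given field path, if any.
func selectRule(rules []Rule, appID, resourceRef, fieldPath string) (Rule, bool) {
	path := strings.TrimSpace(fieldPath)
	var candidates []Rule
	for _, r := range rules {
		// Exact, case-sensitive comparison; no wildcards or regular expressions.
		if !r.Active || strings.TrimSpace(r.MatchExpression) != path {
			continue
		}
		if r.ApplicationID != "" && r.ApplicationID != appID {
			continue
		}
		if r.ResourceRef != "" && r.ResourceRef != resourceRef {
			continue
		}
		candidates = append(candidates, r)
	}
	if len(candidates) == 0 {
		return Rule{}, false
	}
	sort.SliceStable(candidates, func(i, j int) bool {
		ri, rj := scopeRank(candidates[i]), scopeRank(candidates[j])
		if ri != rj {
			return ri < rj
		}
		return candidates[i].CreatedAt < candidates[j].CreatedAt
	})
	return candidates[0], true
}

func main() {
	rules := []Rule{
		{Name: "workspace-replicas", MatchExpression: "spec.replicas", Active: true, CreatedAt: 1},
		{Name: "app-replicas", MatchExpression: "spec.replicas", ApplicationID: "app-1", Active: true, CreatedAt: 5},
	}
	rule, ok := selectRule(rules, "app-1", "Deployment/ledger-api", " spec.replicas ")
	fmt.Println(rule.Name, ok)
}
```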
Existing rows with no application_id remain inherited workspace rules for compatibility. They are displayed on application details as read-only because changing one affects every application in that workspace.
A suppressed candidate does not create a drift incident. The worker stores a suppressed_findings audit record linked to the analysis snapshots, including the matching rule name and reason captured at analysis time. Application details expose the audit history under Suppressed and application-owned rule management under Ignore Rules: operators can create, edit, enable, disable, and delete those rules. Editing or deleting a rule changes future evaluation only; existing suppression audit records retain the captured rule explanation. Inherited workspace-rule administration is not implemented.
The web application is a dense, warm-black Git-oriented investigation console designed around scanning and evidence review. Its compact sidebar and table language are adapted from the local GitOps forensic UI reference while all data and actions remain backed by StateSight APIs. It surfaces compared field values, provenance trust state, absent attribution, and Kubernetes ownership caveats without implying unobserved causality.
The reference prototype contains separate Supabase-backed features, including AI insights, remediation actions, audit views, commit views, sync history, and cluster administration. Those screens are not exposed in StateSight until equivalent backend contracts exist. The shell uses the official orange Git logomark from git-scm.com/community/logos, credited there to Jason Long under CC BY 3.0. The visual and interaction contract is recorded in PRODUCT.md and DESIGN.md.
High-level structure:
- apps/api: HTTP API service
- apps/worker: async job processor
- apps/web: frontend app
- internal/*: service internals and pipeline boundaries
- pkg/*: reusable domain/model utilities
- migrations/: SQL schema migrations
- scripts/migrate, scripts/seed: operational bootstrap commands
Detailed notes:
- Go 1.25.10+
- Node 22.13+
- Docker + Docker Compose
cp .env.example .env
cp apps/web/.env.example apps/web/.env

During npm run dev, Vite proxies /api to the local API by default; keep VITE_API_BASE_URL empty for that workflow. Docker Compose serves the built web application through Nginx on port 5173, with Nginx proxying API requests to the API container.
The checked-in web configuration is for unauthenticated local demonstration. Enabling AUTH_REQUIRED=true requires an OIDC-capable client or gateway that supplies Authorization: Bearer <token>; browser authorization-code/PKCE login remains follow-up work.
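For an integrated client, a request to a workspace-scoped endpoint might be built as in this sketch. The base URL `http://localhost:8080` and the `newOverviewRequest` helper are assumptions for illustration; the token must come from your own identity provider, and the sketch only constructs the request without sending it.

```go
package main

import (
	"fmt"
	"net/http"
)

// newOverviewRequest builds a GET /api/v1/overview request carrying the
// bearer token and the workspace-selection header.
func newOverviewRequest(baseURL, token, workspaceID string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/api/v1/overview", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	// Selects scope only; the API still checks workspace membership.
	req.Header.Set("X-Workspace-ID", workspaceID)
	return req, nil
}

func main() {
	// Placeholder values; the API address is an assumption.
	req, err := newOverviewRequest("http://localhost:8080", "<token>", "<workspace-uuid>")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```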
docker compose up --build -d
make migrate-up
make seed

The seed data provides a prebuilt incident for UI inspection. Its source repository is illustrative, not a runnable analysis input. Running a real analysis requires an accessible manifest repository and Kubernetes credentials available to the worker. A containerized worker also needs any kubeconfig path referenced by a cluster record mounted inside its container.
make api
make worker
make web

- make help
- make setup
- make up
- make down
- make migrate-up
- make seed
- make api
- make worker
- make web
- make fmt
- make test
- make lint
- make test-race
- make security-go
- make verify-web
- make workflow-lint
- make script-lint
- make docs-check
- GET /healthz
- GET /readyz
- GET /api/v1/overview
- GET /api/v1/applications
- POST /api/v1/applications
- GET /api/v1/applications/:id
- POST /api/v1/applications/:id/analyze
- POST /api/v1/applications/:id/ignore-rules
- PUT /api/v1/applications/:id/ignore-rules/:ruleID
- PATCH /api/v1/applications/:id/ignore-rules/:ruleID
- DELETE /api/v1/applications/:id/ignore-rules/:ruleID
- GET /api/v1/incidents/:id
- GET /api/v1/incidents/:id/timeline
- POST /api/v1/github/webhook
Application detail responses include incidents, suppressions, and applicable ignore_rules; suppressed findings include the matching rule name and reason captured at analysis time.
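POST /api/v1/github/webhook validates an HMAC signature independently of OIDC. GitHub signs webhook payloads with HMAC-SHA256 and sends the result as `sha256=<hex>` in the X-Hub-Signature-256 header; a minimal verification sketch follows. Whether StateSight reads exactly this header is an assumption, and `sign` and `verify` are hypothetical helper names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign computes a GitHub-style signature header value for a payload.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify checks a signature header against the raw request body using a
// constant-time comparison.
func verify(secret, body []byte, header string) bool {
	hexDigest, ok := strings.CutPrefix(header, "sha256=")
	if !ok {
		return false
	}
	got, err := hex.DecodeString(hexDigest)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-webhook-secret") // placeholder secret
	body := []byte(`{"ref":"refs/heads/main"}`)
	header := sign(secret, body)
	fmt.Println(verify(secret, body, header))
	fmt.Println(verify(secret, []byte("tampered"), header))
}
```

The signature must be computed over the raw request bytes, before any JSON decoding or re-encoding.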
- Correlate persisted provenance with audit, deployment, or controller signals without treating field ownership as causality.
- Add deliberate workspace-wide rule management with an appropriate authorization and blast-radius review boundary.
- Expand normalization, diff coverage, and incident grouping with focused tests.
- Complete the protected operator access flow with browser OIDC login and managed identity provisioning on top of the verified API bearer boundary.
- Add GitOps rendering/integration support and hardened Kubernetes collection.