feat: Kubernetes ServiceAccount auth for gRPC #424

Draft
wseaton wants to merge 2 commits into ai-dynamo:main from wseaton:weaton/k8s-sa-auth
Conversation

@wseaton wseaton commented Jun 5, 2026

Contributor

Opt-in ServiceAccount auth for the gRPC server. Off by default; existing deployments are unaffected.

  • AuthN: verifies the caller's projected ServiceAccount token via the Kubernetes TokenReview API, in-process (no sidecar or mesh).
  • AuthZ: exact-match namespace:serviceaccount allowlist.
  • Modes: off (default) and enforce. enforce fails config validation unless both a token audience and a non-empty allowlist are set.
  • Health service stays ungated so kubelet probes need no token.
  • Verified tokens and definitive rejections are cached (default 60s TTL).
  • Rust and Python clients attach the token when a projected token file is present and send nothing otherwise, so the same client works off-cluster against a server with auth off.
  • Helm: security.enabled injects the server env and, with serviceAccount.create, binds the SA to system:auth-delegator for TokenReview access.
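
To make the enforce-mode validation and exact-match AuthZ rules above concrete, here is a minimal Python sketch. It is not the code in this PR: `SecurityConfig`, its field names, and `is_allowed` are hypothetical names chosen for illustration. The only external fact it relies on is that TokenReview reports a ServiceAccount caller as `system:serviceaccount:<namespace>:<name>`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityConfig:
    # Hypothetical field names; the real config keys live under security.*
    mode: str = "off"                   # "off" (default) or "enforce"
    audience: str = ""                  # expected token audience
    allowlist: frozenset = frozenset()  # entries of the form "namespace:serviceaccount"

    def validate(self) -> None:
        """Reject an enforce config that lacks an audience or an allowlist."""
        if self.mode not in ("off", "enforce"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.mode == "enforce":
            if not self.audience:
                raise ValueError("enforce mode requires a token audience")
            if not self.allowlist:
                raise ValueError("enforce mode requires a non-empty allowlist")


SA_PREFIX = "system:serviceaccount:"


def is_allowed(cfg: SecurityConfig, username: str) -> bool:
    """Exact-match check of an authenticated TokenReview username."""
    if cfg.mode == "off":
        return True
    if not username.startswith(SA_PREFIX):
        return False  # not a ServiceAccount identity
    identity = username[len(SA_PREFIX):]  # "<namespace>:<name>"
    return identity in cfg.allowlist     # set membership, no prefix or glob matching
```

Because the check is plain set membership, an entry such as `ml:loader` does not admit `ml:loader2` or any other ServiceAccount in the `ml` namespace.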

Config via MODEL_EXPRESS_SECURITY_* env vars or security.* in the config file / Helm values.
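
The client behavior described in the list above (attach the token when a projected token file is present, send nothing otherwise) could look roughly like the following Python sketch. The helper name `auth_metadata`, the `authorization: Bearer` header, and the idea of passing the token path as an argument are assumptions for illustration, not necessarily what the clients in this PR do.

```python
from pathlib import Path


def auth_metadata(token_path):
    """Return gRPC call metadata for the projected token, or [] if none exists.

    The file is re-read on every call because the kubelet rotates projected
    tokens in place, so a value cached at startup would eventually expire.
    """
    try:
        token = Path(token_path).read_text().strip()
    except (FileNotFoundError, NotADirectoryError):
        return []  # off-cluster or no projected volume: send no credentials
    if not token:
        return []
    # Header name and scheme are assumptions; metadata keys must be lowercase.
    return [("authorization", f"Bearer {token}")]
```

The returned list can be passed as the `metadata=` argument of a gRPC stub call. An empty list leaves the request unauthenticated, which is what lets the same client talk to a server running with auth off.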

copy-pr-bot Bot commented Jun 5, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the feat label Jun 5, 2026
@wseaton wseaton changed the title from "feat: Kubernetes ServiceAccount auth behind an off-by-default mode" to "feat: Kubernetes ServiceAccount auth for gRPC" Jun 5, 2026
AuthN verifies the caller's projected ServiceAccount token via the Kubernetes
TokenReview API in-process (no sidecar/mesh). AuthZ is an exact-match
namespace/serviceaccount allowlist. Modes are off (default) and enforce; enforce
fails config validation without a token audience and a non-empty allowlist.

The health service stays ungated so kubelet probes need no token. The Rust and
Python clients attach the token when a projected token file is present and send
nothing otherwise, so the same client works off-cluster against a server with auth off.
The Helm chart binds the server SA to system:auth-delegator when enabled.

Signed-off-by: Will Eaton <weaton@redhat.com>
@wseaton wseaton force-pushed the weaton/k8s-sa-auth branch from 5adb15b to 65c47cd Compare June 12, 2026 19:51
codecov Bot commented Jun 12, 2026

@wseaton wseaton force-pushed the weaton/k8s-sa-auth branch from db23675 to 4f1f2ac Compare June 22, 2026 19:27
Signed-off-by: Will Eaton <weaton@redhat.com>
@wseaton wseaton force-pushed the weaton/k8s-sa-auth branch from 4f1f2ac to f7b00f5 Compare June 22, 2026 19:31