Draft
62 commits
075ba1d
demo(act2): S0 — infra-tier ResourceQuota incident harness for Agent A
Jun 9, 2026
3af6b71
sre(s1): MVP — Helm template + 5 read-only kars-CR tools + CLI + plug…
Jun 9, 2026
5bdd29f
sre(s2): K8s diagnostic toolset — describe_resource, what_changed, en…
Jun 9, 2026
d956594
fix(sre): resolve helm chart path from repo root, not CWD
Jun 9, 2026
91efb4a
fix(sre): use --reset-then-reuse-values for kars sre install
Jun 9, 2026
f93598a
fix(sre): create kars-sre namespace explicitly in the chart
Jun 9, 2026
5718fc4
fix(sre): add --force-conflicts to helm upgrade (helm 4 SSA)
Jun 9, 2026
91accb0
fix(sre): ToolPolicy must live in KarsSandbox's namespace (kars-syste…
Jun 9, 2026
226f303
fix(sre): rename gate env KARS_SRE_ENABLED → SRE_ENABLED + indent fix
Jun 9, 2026
7fd3aa8
fix(sre): default contentSafety.requirePromptShields=false
Jun 9, 2026
c447aa7
sre: default model gpt-4.1 → gpt-5.4
Jun 9, 2026
96e70bb
fix(sre): declare sre_* tools in plugin.yaml provides_tools
Jun 9, 2026
f6e8d0d
sre: wire SRE-mode SOUL.md system prompt + fix register_tool kwargs
Jun 9, 2026
b25f41b
sre: apiserver-bypass for role=sre sandboxes (controller egress-guard)
Jun 9, 2026
ab866ed
fix(sre): correct AGT profile schema (version 1.0 + agent: name + pol…
Jun 9, 2026
c506c54
fix(sre): trailing-colon glob in AGT allow rules — match real action …
Jun 9, 2026
deff899
sre: NetworkPolicy egress allow for apiserver (cluster-portable)
Jun 9, 2026
0a26db4
fix(demo): agent-a-research.yaml passes CRD admission
Jun 9, 2026
72bedb2
fix(demo): break.sh uses kars.azure.com/component selector
Jun 9, 2026
81da63d
kars-sre: Slice 3 (typed apply-fix) + Slice 4 (proactive watcher + Te…
Jun 10, 2026
64cb040
kars-sre: Headlamp SRE Console + Chat (Slice 4 primary UX)
Jun 10, 2026
349901b
headlamp/sre: fix browser-ESM require() crash + add 'SRE not installe…
Jun 10, 2026
b48da89
headlamp/sre: stub Active Incidents — pluginLib.K8s.event isn't host-…
Jun 10, 2026
c3b935f
headlamp: bump plugin to v0.6.0 to bust Headlamp's plugin cache
Jun 10, 2026
a5e001f
cli: kars sre install handles 3 cluster shapes (helm release / kars d…
Jun 10, 2026
704c758
headlamp/sre: derive cluster name from URL for apiserver-proxy chat tab
Jun 10, 2026
4fb8681
controller: expose Hermes gateway port (18789) on per-sandbox Service
Jun 10, 2026
c8f9b74
headlamp/sre: replace iframe Chat tab with terminal-attach instructions
Jun 10, 2026
b588a5f
headlamp/sre: replace internal Link with plain anchor + bump to 0.6.3
Jun 10, 2026
8def50f
headlamp/sre: embed hermes dashboard PTY chat in browser
Jun 10, 2026
aee5a71
headlamp/sre: fix dashboard iframe — fetch HTML + rewrite asset URLs
Jun 10, 2026
59f99ed
headlamp/sre: dashboard wrapper strips proxy prefix to dodge /api/* c…
Jun 10, 2026
b91e4e1
sre: end-to-end embedded Hermes chat in Headlamp plugin
Jun 11, 2026
5f1c2ee
fix: ACR-name typo + workload-aware SRE Cluster Health card
Jun 11, 2026
043ea5e
fix(monitoring): include kars-ops dashboard in Grafana sidecar configmap
Jun 11, 2026
fcce016
hermes: pre-warm AGT mesh registration in idle-gateway mode
Jun 11, 2026
163e1de
hermes: persistent mesh-keepalive (replaces short-lived pre-warm)
Jun 11, 2026
3865b1c
hermes: enable AUTO_RESPONDER on the mesh keepalive process
Jun 11, 2026
94cab91
demo: bump dailyTokens cap to 2M for research + sre
Jun 11, 2026
02fb78d
plugin: workload-aware Phase column on Overview + Sandboxes pages
Jun 11, 2026
2ee6c91
sre-action: workload-aware recovery observer (no false Recovered)
Jun 11, 2026
8e7cb73
demo: 3 commented sandbox CRDs for the Act-I walkthrough
Jun 11, 2026
27802be
controller: stop spamming LimitedSupport event on every McpServer rec…
Jun 11, 2026
c3fc023
sre: phase-changes-only watcher mode (Telegram pager, not event fireh…
Jun 11, 2026
cfce890
sre: overlay workload availability on synthetic phase
Jun 11, 2026
4bf1560
sre-action: bump recovery window 5m→10m + late-recovery healer
Jun 11, 2026
1f556bf
docs(security): audit for kars-sre demo-and-agent slice (Slices 0-4 +…
Jun 11, 2026
f0c18a3
fmt: cargo fmt --all
Jun 11, 2026
244e6ad
ci(loc-budget): bump controller/src/reconciler/mod.rs phase0 cap 3450…
Jun 11, 2026
c006a3f
fix(lint): ruff + no-stubs gates
Jun 11, 2026
f1c1092
fix(clippy): drop dead phase_reporter field after McpServer cleanup
Jun 11, 2026
6ce1916
docs(blog): seed internal blog series — 4 of 7 posts drafted
Jun 11, 2026
ab39d95
docs(blog): drafts 5/6/7 + bump README status table
Jun 11, 2026
e16a483
docs(blog): rewrite post 1 as a position paper
Jun 11, 2026
308d9cf
docs: replace 🔱 emoji with actual logo in README header
Jun 11, 2026
699c124
docs(blog): revise post 1 — corrections after rubber-duck critique
Jun 11, 2026
39a78cb
docs(blog): address the 'sidecars are out of favor' objection
Jun 11, 2026
4e00196
docs(blog): split SIG alignment into explicit overlay + upstream modes
Jun 11, 2026
aae61eb
docs(blog): SIG section — replace fabricated 'upstream-compatible mod…
Jun 11, 2026
faadfdf
docs(blog): SIG section — honest 'governance overlay vs hardening ove…
Jun 11, 2026
1dcc791
docs(blog): SIG section — cite the actual in-flight upstream PRs and …
Jun 11, 2026
521f463
docs(strategy): competitive positioning + leadership plan + corrected…
Jun 14, 2026
4 changes: 3 additions & 1 deletion README.md
@@ -1,6 +1,8 @@
 <div align="center">

-# 🔱 Agent Reference Stack for Kubernetes
+<img src="docs/assets/logo.png" alt="kars logo" width="128" />
+
+# Agent Reference Stack for Kubernetes

 **A secure runtime for AI agents on Azure. Short name: `kars`.**

4 changes: 2 additions & 2 deletions ci/loc-budget.yaml
@@ -44,11 +44,11 @@ files:

   - path: controller/src/reconciler/mod.rs
     baseline_2026_04_24: 2383
-    phase0_cap: 3450
+    phase0_cap: 3700
     phase1_cap: 1500
     phase2_cap: 2000
     allow_grow: true
-    notes: "Phase 0 cap bumped to 3050 in PR #323 to absorb cluster-aware memory scope + policy-quintet wiring (Context.cluster_name, openclaw_env injection, tool-policy mount on openclaw container); bumped to 3300 to land Hermes runtime-kind support in the deployment builder (entrypoint selection, env injection for KARS_RUNTIME_KIND/hermes-specific knobs); bumped to 3450 in Hermes-support PR for tool-surface parity (handoff routing, mesh transfer wiring, foundry native tool propagation, telegram_status divergence handling). Phase 1+ caps unchanged. allow_grow honored only until phase2_cap (2000); enforced strictly. Phase 3 must extract per-CRD reconcilers into controller/src/reconcilers/{sandbox,mcp_server,...}.rs and shrink mod.rs back to ≤800 (drop allow_grow at that point)."
+    notes: "Phase 0 cap bumped to 3050 in PR #323 to absorb cluster-aware memory scope + policy-quintet wiring (Context.cluster_name, openclaw_env injection, tool-policy mount on openclaw container); bumped to 3300 to land Hermes runtime-kind support in the deployment builder (entrypoint selection, env injection for KARS_RUNTIME_KIND/hermes-specific knobs); bumped to 3450 in Hermes-support PR for tool-surface parity (handoff routing, mesh transfer wiring, foundry native tool propagation, telegram_status divergence handling); bumped to 3700 in PR #397 (kars-sre demo-and-agent) to absorb cluster-portable apiserver egress-guard bypass (KUBERNETES_SERVICE_HOST/PORT lookup + ACCEPT/RETURN iptables rules for role=sre sandboxes), Hermes gateway port (18789) exposure on per-sandbox Service, SANDBOX_NAME+CLUSTER_NAME env on openclaw container for ClawMemory scope, mesh-keepalive entrypoint plumbing, and Telegram-channel + SRE_WATCHER_MODE env wiring for the proactive watcher. Phase 1+ caps unchanged. allow_grow honored only until phase2_cap (2000); enforced strictly. Phase 3 must extract per-CRD reconcilers into controller/src/reconcilers/{sandbox,mcp_server,...}.rs and shrink mod.rs back to ≤800 (drop allow_grow at that point)."

   - path: controller/src/mesh_peer/mod.rs
     baseline_2026_04_24: 1970
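
Reviewer note on the cap bump above: the gate's own code is not part of this diff, so the following is only a minimal sketch of how a per-phase cap could be checked. It assumes the gate compares a file's line count against the cap for the active phase. `BudgetEntry`, `checkLocBudget`, and the line count 3650 are hypothetical, and `allow_grow` is deliberately not modelled because its exact semantics are not visible here.

```typescript
// Hypothetical sketch of a LOC-budget check; not the project's actual gate.
type Phase = "phase0" | "phase1" | "phase2";

interface BudgetEntry {
  path: string;
  phase0_cap: number;
  phase1_cap: number;
  phase2_cap: number;
}

interface BudgetResult {
  ok: boolean;
  cap: number;
  headroom: number; // negative when the file is over its cap
}

function checkLocBudget(entry: BudgetEntry, phase: Phase, loc: number): BudgetResult {
  const caps: Record<Phase, number> = {
    phase0: entry.phase0_cap,
    phase1: entry.phase1_cap,
    phase2: entry.phase2_cap,
  };
  const cap = caps[phase];
  return { ok: loc <= cap, cap, headroom: cap - loc };
}

// Caps copied from the reconciler entry in this hunk.
const reconciler: BudgetEntry = {
  path: "controller/src/reconciler/mod.rs",
  phase0_cap: 3700,
  phase1_cap: 1500,
  phase2_cap: 2000,
};

// 3650 is an illustrative line count, not a measured one.
const before = checkLocBudget({ ...reconciler, phase0_cap: 3450 }, "phase0", 3650);
const after = checkLocBudget(reconciler, "phase0", 3650);
console.log(before.ok, after.ok, after.headroom); // false true 50
```

Under that assumption, a file at 3650 lines fails the old cap of 3450 and passes the new cap of 3700 with 50 lines of headroom, which is the kind of growth the updated `notes` field justifies.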
2 changes: 2 additions & 0 deletions cli/src/cli.ts
@@ -33,6 +33,7 @@ import { memoryCommand } from "./commands/memory.js";
 import { inspectCommand } from "./commands/inspect.js";
 import { auditCommand } from "./commands/audit.js";
 import { headlampCommand } from "./commands/headlamp.js";
+import { sreCommand } from "./commands/sre.js";

 export function createCli(): Command {
   const program = new Command();
@@ -57,6 +58,7 @@ export function createCli(): Command {
   program.addCommand(listCommand());
   program.addCommand(logsCommand());
   program.addCommand(inspectCommand());
+  program.addCommand(sreCommand());

   // Configuration
   program.addCommand(credentialsCommand());
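
Reviewer note: `cli/src/commands/sre.js` itself is not shown in this excerpt. Going only by the commit titles (d956594, 91efb4a, f93598a, 5718fc4), the Helm invocation behind `kars sre install` might be assembled roughly as below. `SreInstallOptions`, `buildHelmArgs`, the release name, and the chart path are all hypothetical, and using `helm upgrade --install` is an assumption. Only the two flags marked in the comments come from the commit messages.

```typescript
// Hypothetical sketch; not the PR's actual sre.ts.
interface SreInstallOptions {
  release: string;   // assumed release name
  chartPath: string; // chart path resolved from the repo root, not the CWD (per d956594)
  namespace: string;
}

function buildHelmArgs(opts: SreInstallOptions): string[] {
  return [
    "upgrade",
    "--install", // assumption: install-or-upgrade in a single call
    opts.release,
    opts.chartPath,
    "--namespace",
    opts.namespace,
    "--reset-then-reuse-values", // named in commit 91efb4a
    "--force-conflicts", // named in commit 5718fc4 (Helm 4 server-side apply)
  ];
}

const helmArgs = buildHelmArgs({
  release: "kars-sre",
  chartPath: "/repo/charts/kars-sre", // placeholder path for illustration
  namespace: "kars-sre",
});
console.log(["helm", ...helmArgs].join(" "));
```

The sketch omits `--create-namespace` because commit f93598a says the chart creates the `kars-sre` namespace itself.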
34 changes: 25 additions & 9 deletions cli/src/commands/dev/local-k8s.ts
@@ -1304,26 +1304,42 @@ export async function runLocalK8s(opts: LocalK8sOptions): Promise<void> {
   if (opts.noBuild) {
     stepper.done("skipped image load (--no-build)");
   } else {
+    // `target` = the canonical image name the controller looks for
+    // INSIDE kind. `aliases` = local docker tags we accept as a SOURCE
+    // for re-tagging. `loadImageIfPresent` re-tags the matched local
+    // image AS the target before kind-loading, so the kind containerd
+    // ends up with the canonical name in `crictl images` and the
+    // controller's IfNotPresent pull succeeds without ever touching
+    // the network.
+    //
+    // Why we DON'T list `kars.azurecr.io/...`: that ACR doesn't exist.
+    // The legacy typo crept in from the 2026-05-27 rename
+    // (azureclaw→kars) before anyone noticed the real ACR is
+    // `karsjpdyyv.azurecr.io` (azd-suffixed) — the `karsacr` alias
+    // here is the canonical name the operator's deploy script
+    // re-publishes to. Keep only `karsacr.azurecr.io/...` so the
+    // controller env stays correct on AKS too.
     const images: { target: string; aliases: string[] }[] = [
       {
-        target: opts.image,
+        target: "karsacr.azurecr.io/openclaw-sandbox:latest",
         aliases: [
-          "karsacr.azurecr.io/openclaw-sandbox:latest",
-          "kars.azurecr.io/openclaw-sandbox:latest",
+          opts.image, // e.g. "kars-sandbox:dev" (the local build)
+          "openclaw-sandbox:latest",
+          "openclaw-sandbox:dev",
         ],
       },
       {
-        target: "kars-controller:dev",
+        target: "karsacr.azurecr.io/kars-controller:latest",
         aliases: [
-          "karsacr.azurecr.io/kars-controller:latest",
-          "kars.azurecr.io/kars-controller:latest",
+          "kars-controller:dev",
+          "kars-controller:latest",
         ],
       },
       {
-        target: "kars-inference-router:dev",
+        target: "karsacr.azurecr.io/kars-inference-router:latest",
         aliases: [
-          "karsacr.azurecr.io/kars-inference-router:latest",
-          "kars.azurecr.io/kars-inference-router:latest",
+          "kars-inference-router:dev",
+          "kars-inference-router:latest",
         ],
       },
     ];
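
Reviewer note: the comment block in this hunk describes `loadImageIfPresent`, whose body is not part of the diff. The re-tag-then-load behaviour it describes could look roughly like the sketch below. It assumes the helper takes the first locally present tag (the target first, then the aliases in order) and shells out to `docker tag` and `kind load docker-image`. `ImageSpec`, `LoadPlan`, `planImageLoad`, and the cluster name `kars-dev` are hypothetical. The function only plans the commands, so it can be checked without Docker or kind installed.

```typescript
// Hypothetical sketch; not the PR's loadImageIfPresent implementation.
interface ImageSpec {
  target: string;    // canonical name the controller expects inside kind
  aliases: string[]; // local tags accepted as a source for re-tagging
}

interface LoadPlan {
  source: string;       // the local tag that was found
  commands: string[][]; // argv arrays to run, in order
}

function planImageLoad(
  spec: ImageSpec,
  localTags: ReadonlySet<string>,
  clusterName: string,
): LoadPlan | null {
  // Prefer an image already tagged as the target, then aliases in listed order.
  const candidates = [spec.target, ...spec.aliases];
  const source = candidates.find((tag) => localTags.has(tag));
  if (source === undefined) return null; // nothing local to load

  const commands: string[][] = [];
  if (source !== spec.target) {
    // Re-tag so kind's containerd stores the image under the canonical name.
    commands.push(["docker", "tag", source, spec.target]);
  }
  commands.push(["kind", "load", "docker-image", spec.target, "--name", clusterName]);
  return { source, commands };
}

const controllerSpec: ImageSpec = {
  target: "karsacr.azurecr.io/kars-controller:latest",
  aliases: ["kars-controller:dev", "kars-controller:latest"],
};

// Simulated output of a local `docker images` listing.
const local = new Set(["kars-controller:dev", "unrelated:1.0"]);
const plan = planImageLoad(controllerSpec, local, "kars-dev");
console.log(plan?.commands.map((c) => c.join(" ")).join("\n"));
```

With only `kars-controller:dev` present locally, the plan is a `docker tag` to the `karsacr.azurecr.io/...` name followed by a `kind load docker-image` of that name, matching the intent stated in the comment: the controller's `IfNotPresent` pull then resolves inside the cluster without a registry round trip.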