Summary
Tooling to help a user understand their agents and keep them healthy: a guided way to see what an agent is actually doing, how its projected content has drifted from its package/managed sources, and how much context it loads on a fresh session. Brainstorm-level; details TBD.
(Filed off the back of a debugging session where it was hard to tell why an agent ignored a tool: its AGENTS.md body was stale seed-once content, its managed block had un-applied drift, and a stale workflow-skill shadowed the package's. None of this surfaced anywhere obvious.)
Possible capabilities
- Drift report per agent: compare the agent's live runtime files (workspace AGENTS.md/SOUL.md/skills) against (a) what its package would project and (b) what bakin doctor managed blocks expect; flag seed-once staleness, un-applied managed-block drift, and shadowed/colliding skills.
- Context budget per fresh session: estimate/measure how many tokens the agent's seeded context (SOUL/IDENTITY/AGENTS/TOOLS + enabled lessons + managed blocks) consumes at session start, and flag bloat.
- "What is this agent doing" timeline: recent tasks, tool calls (from the audit/usage feed), and outcomes for an agent, in one view, e.g. "Pixel ran N image tasks; M used bakin_exec_images_generate, K used native".
- Tool-usage sanity: surface when an agent is bypassing the preferred tool path (e.g. native image gen vs bakin_exec_images_generate).
- Lesson/skill effectiveness: which lessons were retrieved, scores, whether they're enabled.
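A minimal sketch of the drift-report idea, assuming the live files, the package's current projection, and the expected managed blocks have already been loaded into memory. Every name here (DriftInput, driftReport, the finding kinds) is hypothetical and not taken from the bakin codebase, and managed-block detection is simplified to a verbatim substring check.

```typescript
// Hypothetical types for illustration only; not part of any existing bakin API.
type DriftKind = "seed-once-stale" | "managed-block-drift" | "shadowed-skill";

interface DriftFinding {
  kind: DriftKind;
  target: string; // file path or skill name
}

interface DriftInput {
  liveFiles: Record<string, string>;      // workspace path -> current content
  projectedFiles: Record<string, string>; // path -> what the package would project today
  managedBlocks: Record<string, string>;  // path -> managed block text expected in that file
  workspaceSkills: string[];              // skill names found in the agent workspace
  packageSkills: string[];                // skill names shipped by the package
}

function driftReport(input: DriftInput): DriftFinding[] {
  const findings: DriftFinding[] = [];

  for (const [path, projected] of Object.entries(input.projectedFiles)) {
    const live = input.liveFiles[path];
    if (live === undefined) continue; // missing files are out of scope for this sketch

    const block = input.managedBlocks[path];
    if (block !== undefined && !live.includes(block)) {
      // Expected block is absent or outdated; the body cannot be isolated reliably,
      // so only the managed-block finding is reported for this file.
      findings.push({ kind: "managed-block-drift", target: path });
      continue;
    }

    // Compare the body (live content minus the managed block) with the projection.
    const body = block !== undefined ? live.replace(block, "") : live;
    if (body.trim() !== projected.trim()) {
      findings.push({ kind: "seed-once-stale", target: path });
    }
  }

  // A workspace skill with the same name as a package skill is treated as shadowing it.
  const packaged = new Set(input.packageSkills);
  for (const skill of input.workspaceSkills) {
    if (packaged.has(skill)) {
      findings.push({ kind: "shadowed-skill", target: skill });
    }
  }

  return findings;
}
```

A body mismatch could also be a deliberate user edit, so a real report would probably need to present these as hints rather than errors.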
Likely a bakin agents doctor <id> CLI + a UI panel on the agent detail page. Could reuse existing primitives: the audit/usage feed (src/core/usage.ts), managed-block check (bakin agent-rules --check), and the agent-package lockfile/projection records.
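As a rough illustration of the timeline and tool-usage ideas, the "N tasks; M used the preferred tool, K did not" summary could be a simple tally over usage events. The event shape below is an assumption made for this sketch; the actual types in src/core/usage.ts may look quite different, and tallyToolUsage is a hypothetical name.

```typescript
// Hypothetical event shape; the real audit/usage feed may expose different fields.
interface UsageEvent {
  agentId: string;
  taskKind: string; // e.g. "image"
  tool: string;     // tool that handled the task
}

interface ToolTally {
  total: number;                  // matching tasks for this agent
  byTool: Record<string, number>; // count per tool name
  bypassed: number;               // tasks that did not use the preferred tool
}

function tallyToolUsage(
  events: UsageEvent[],
  agentId: string,
  taskKind: string,
  preferredTool: string,
): ToolTally {
  const byTool: Record<string, number> = {};
  let total = 0;

  for (const event of events) {
    if (event.agentId !== agentId || event.taskKind !== taskKind) continue;
    total += 1;
    byTool[event.tool] = (byTool[event.tool] ?? 0) + 1;
  }

  const bypassed = total - (byTool[preferredTool] ?? 0);
  return { total, byTool, bypassed };
}
```

A non-zero bypassed count is what the tool-usage sanity check would surface.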
Acceptance (rough)
- A single command/view that tells a user "here's what your agent looks like vs what it should be, here's what it's been doing, and here's its session context cost."
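The "session context cost" part of that view could start as a rough estimate. The sketch below assumes a heuristic of about 4 characters per token, which is only an approximation and not a real tokenizer; the type names, contextBudget, and the budget threshold are all hypothetical.

```typescript
// Hypothetical types for illustration; not an existing bakin API.
interface ContextSource {
  name: string;    // e.g. "SOUL.md", "AGENTS.md", "managed blocks"
  content: string; // text seeded into a fresh session
}

interface BudgetLine {
  name: string;
  estimatedTokens: number;
}

interface BudgetReport {
  lines: BudgetLine[]; // largest contributors first
  totalTokens: number;
  overBudget: boolean;
}

// Assumption: roughly 4 characters per token. Swap in a real tokenizer for accuracy.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function contextBudget(sources: ContextSource[], budgetTokens: number): BudgetReport {
  const lines = sources
    .map((source) => ({ name: source.name, estimatedTokens: estimateTokens(source.content) }))
    .sort((a, b) => b.estimatedTokens - a.estimatedTokens);

  const totalTokens = lines.reduce((sum, line) => sum + line.estimatedTokens, 0);
  return { lines, totalTokens, overBudget: totalTokens > budgetTokens };
}
```

Sorting by size puts the biggest contributors at the top, which is probably what a user wants when looking for bloat.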