Prompt optimization, version control & profiling
Problem
Prompts in agent_core/core/prompts/ have grown to ~1,600 lines with
duplicated instructions and dead fragments. We pay for this bloat on every
LLM call, prompt edits are buried in unrelated code diffs, and we have no
way to measure whether a change helped.
Goals
- Chunk prompts into reusable fragments (role, policy, action-space, etc.)
so shared text is defined once.
- Dedupe & prune redundant instructions and unused constants.
- Version control each prompt as a first-class unit: every prompt gets a
name + version, and changes are independently diffable, revertible, and
trackable (not lost inside larger code commits). This builds on the
existing PromptRegistry.
- Profile token cost per prompt and add a basic quality eval so changes
report a delta, not vibes.
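A minimal sketch of how the goals above could fit together, assuming nothing about the real PromptRegistry API. `FRAGMENTS`, `PromptSpec`, `estimate_tokens`, and `token_delta` are hypothetical names, the fragment text is placeholder content, and the 4-characters-per-token estimate is only a rough stand-in for the model's actual tokenizer.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical fragment store: each shared piece of text is defined once.
# The text below is placeholder content, not taken from the real prompts.
FRAGMENTS = {
    "role": "You are a careful coding agent.",
    "policy": "Never run destructive commands without confirmation.",
    "action_space": "Available actions: read_file, write_file, run_tests.",
}


@dataclass(frozen=True)
class PromptSpec:
    """A named, versioned prompt assembled from fragment keys (illustrative)."""

    name: str
    version: str
    fragment_keys: tuple

    def render(self) -> str:
        # Join the referenced fragments in order, separated by blank lines.
        return "\n\n".join(FRAGMENTS[key] for key in self.fragment_keys)

    def digest(self) -> str:
        # Short content hash, so a text change without a version bump is detectable.
        return hashlib.sha256(self.render().encode("utf-8")).hexdigest()[:12]


def estimate_tokens(text: str) -> int:
    # Crude proxy (about 4 characters per token); swap in the real tokenizer.
    return max(1, len(text) // 4)


def token_delta(old: PromptSpec, new: PromptSpec) -> int:
    # Negative result means the new version is cheaper per call.
    return estimate_tokens(new.render()) - estimate_tokens(old.render())


planner_v1 = PromptSpec("planner", "1.0.0", ("role", "policy", "action_space"))
planner_v2 = PromptSpec("planner", "1.1.0", ("role", "action_space"))
print(planner_v2.name, planner_v2.version, planner_v2.digest(),
      token_delta(planner_v1, planner_v2))
```

Keeping fragments as data and prompts as small specs means a prompt change shows up as a one-line diff to `fragment_keys` or to a single fragment, and the token delta can be reported alongside it.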
Acceptance criteria
app/prompt.py re-exports.