This repository was archived by the owner on Nov 15, 2025. It is now read-only.

[Epic] Prompt Optimization Framework – bbeval opt #15

@christso


Goal

Transform bbeval into a complete, production-grade prompt optimization platform that:

  • Generates high-quality, structured prompts in Markdown, YAML, or SudoLang
  • Enforces company guidelines (attached via --guidelines)
  • Works across all agent types with ≤10 LLM calls per run
  • Outputs Copilot-ready .prompt.md files

Scope: 3 Modes + 1 Engine

| Mode | Use Case | Input | Output | Optimizer |
| --- | --- | --- | --- | --- |
| direct-llm | Non-agentic reviews (SQL, schema, docs) | Base .prompt.md + guidelines | Optimized .prompt.md | BootstrapRS |
| dspy-agent | ReAct, tool-calling agents | DSPy module + AgentGateway | YAML config + MCP routes | BootstrapRS |
| external-wrapper | Copilot, Codex, Claude Desktop | Mock output + refiner | Post-process .py + .prompt.md | COPRO / SIMBA |
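
The mode table above could be represented as a small lookup that the `opt` subcommand consults before choosing an optimizer. This is only a sketch: `ModeSpec`, `MODES`, and `resolve_mode` are hypothetical names, not part of bbeval, and the optimizer entries are plain strings rather than real optimizer objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeSpec:
    """Hypothetical description of one optimization mode."""
    use_case: str
    optimizers: tuple[str, ...]  # candidate optimizer names, in table order


# Hypothetical registry mirroring the mode table; the keys are the --mode values.
MODES = {
    "direct-llm": ModeSpec("Non-agentic reviews (SQL, schema, docs)", ("BootstrapRS",)),
    "dspy-agent": ModeSpec("ReAct, tool-calling agents", ("BootstrapRS",)),
    "external-wrapper": ModeSpec("Copilot, Codex, Claude Desktop", ("COPRO", "SIMBA")),
}


def resolve_mode(name: str) -> ModeSpec:
    """Return the spec for a mode name, or raise ValueError for an unknown mode."""
    try:
        return MODES[name]
    except KeyError:
        raise ValueError(
            f"unknown mode {name!r}; expected one of {sorted(MODES)}"
        ) from None
```

Keeping the mapping in data rather than in branching code would make it cheap to add a fourth mode later.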

Core Requirements

  1. bbeval opt subcommand

    bbeval opt tests.yaml base.prompt.md \
      --mode direct-llm \
      --guidelines constraints.yaml \
      --format markdown \
      --output optimized.prompt.md
  2. Guideline Reinforcement Engine (Child Issue #101)

    • Enforces rules during optimization
    • Supports Markdown, YAML, SudoLang
    • Injects attachments ({{guidelines}} → PDF text)
  3. Universal Metric

    • bbeval run --jsonl as scoring oracle
    • Supports code execution, SQL linting, semantic match
  4. Output Format (Auto-Selected)

    | Format | When | Example |
    | --- | --- | --- |
    | Markdown (default) | Single workflow | `.prompt.md` with headers, tools |
    | YAML | Multi-workflow | `workflow: [step: analyze, tools: [...] ]` |
    | SudoLang | Loops, state, branching | `for each file, lint(file)` |
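
The auto-selection rule in requirement 4 might look roughly like the sketch below. `PromptShape` and `select_format` are hypothetical names, and the precedence (control flow wins over workflow count) is an assumption, since the issue does not say which rule applies when both conditions hold.

```python
from dataclasses import dataclass


@dataclass
class PromptShape:
    """Hypothetical summary of what the optimized prompt has to express."""
    workflow_count: int = 1
    needs_control_flow: bool = False  # loops, state, or branching


def select_format(shape: PromptShape) -> str:
    """Pick an output format following the Format/When table.

    Assumed precedence: SudoLang, then YAML, then the Markdown default.
    """
    if shape.needs_control_flow:
        return "sudolang"
    if shape.workflow_count > 1:
        return "yaml"
    return "markdown"
```

An explicit `--format` flag, as in the command shown in requirement 1, would presumably override whatever this heuristic returns.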

Success Metrics

| Metric | Target |
| --- | --- |
| Manual tuning time | 90% reduction |
| Guideline compliance | 100% |
| LLM calls per run | ≤10 |
| Copilot-ready output | Drop-in .prompt.md |
| Format correctness | Zero invalid outputs |
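
One way to make the "LLM calls per run ≤10" target enforceable rather than aspirational is to route every model call through a counter that fails fast once the budget is spent. The sketch below assumes the optimizer calls the model through a single Python callable; `CallBudget` and `CallBudgetExceeded` are hypothetical names, not bbeval APIs.

```python
import functools


class CallBudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its LLM call budget."""


class CallBudget:
    """Counts calls made through wrapped functions and enforces a hard limit."""

    def __init__(self, limit: int = 10):
        self.limit = limit
        self.used = 0

    def wrap(self, fn):
        """Return a version of fn that consumes one unit of budget per call."""
        @functools.wraps(fn)
        def guarded(*args, **kwargs):
            if self.used >= self.limit:
                raise CallBudgetExceeded(
                    f"LLM call budget of {self.limit} exhausted"
                )
            self.used += 1
            return fn(*args, **kwargs)
        return guarded
```

A run would create one `CallBudget`, wrap its model client once, and report `budget.used` alongside the score so the metric can be tracked per run.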

Example: Optimized .prompt.md

---
description: 'Review SQL schema changes'
mode: 'agent'
tools: ['runInTerminal', 'getTerminalOutput', 'edit']
---

# SQL Schema Change Reviewer

You are a fintech DBA with 15 years in high-frequency trading systems.

{{guidelines}}

## Task
Review `${selection}` for:
- Index coverage on WHERE/JOIN columns
- No `SELECT *` in production views
- Partitioning on date columns

## Instructions
1. Run `EXPLAIN ANALYZE` on critical queries
2. Check for missing indexes
3. If issue found: emit fix with `edit`
4. Persist until all checks pass

## Output

    ```diff
    - -- Missing index
    + CREATE INDEX idx_orders_date ON orders(order_date);
    ```
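
The `{{guidelines}}` placeholder in the example above has to be filled with the attached guideline text before the prompt is used. A minimal sketch of that substitution follows, assuming the attachment has already been converted to plain text (PDF extraction is out of scope here); `inject_guidelines` is a hypothetical helper, not an existing bbeval function.

```python
def inject_guidelines(
    template: str,
    guidelines_text: str,
    placeholder: str = "{{guidelines}}",
) -> str:
    """Replace the guidelines placeholder in a prompt template.

    Raises ValueError when the placeholder is missing, so a prompt cannot
    silently ship without its guidelines.
    """
    if placeholder not in template:
        raise ValueError(f"template has no {placeholder} placeholder")
    return template.replace(placeholder, guidelines_text.strip())
```

Failing loudly on a missing placeholder is a design choice in service of the 100% guideline-compliance target; a more lenient variant could append the guidelines at the end instead.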
