perf(meta-tools): optimized system prompt with inline tool schemas #135
Open
justrach wants to merge 12 commits into
…ge layer)
Lands the storage + SDK surface for graff-memd's out-of-process system /
user-message injection queue. Hermes does this inline because it's a
single Python process; we need a queue because graff-memd is a sidecar.
This PR is the **storage layer**. The conversation-loop drain hook is a
separate follow-up so this can ship + be reviewed in isolation; the
acceptance criterion that's still open is "Enqueue → next user turn
includes the nudge → consumed flag flips" (drain integration).
New surface:
- `forge_domain::PendingNudge` — `(id, conversation_id, role, content,
created_at, consumed_at?)` + `NudgeRole` enum (`system`, `user_visible`,
`user_hidden`) with wire-stable `as_str` / `from_str` round-trip + JSON
rename matching SQL value.
- `forge_app::NudgeRepo` — async trait: `enqueue`, `next_unconsumed`,
`mark_consumed`, `list_for_conversation`.
- `forge_repo::NudgeRepositoryImpl` — diesel-backed; FIFO drain ordered
by `(created_at asc, id asc)` so same-ms enqueues are still totally
ordered. Atomic INSERT + `last_insert_rowid()` in a single transaction
so a concurrent enqueue can't slot a row between insert and id read.
- Migration `2026-05-21-180000_create_pending_nudges_table` with a
composite drain index on `(conversation_id, consumed_at, created_at, id)`
so the unconsumed-FIFO query covers the whole filter without a sort.
- `forge_api::API`: `enqueue_nudge`, `list_nudges`. The drain path
(`next_unconsumed`, `mark_consumed`) is intentionally NOT in the
public API — it's an internal orchestrator concern.
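To make the drain contract above concrete, here is a minimal, synchronous, in-memory sketch of the same semantics. It is not the diesel-backed `NudgeRepositoryImpl`: the real `NudgeRepo` trait is async and stores rows in SQLite, while `InMemoryNudgeQueue`, the `u64` millisecond timestamps, and the `String` parse error are assumptions made for illustration. It shows the `NudgeRole` wire round trip, FIFO selection by `(created_at, id)`, and a `mark_consumed` that reports `false` on a repeat or an unknown id.

```rust
use std::str::FromStr;

/// Visibility role of a queued nudge; `as_str` yields the wire/SQL value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NudgeRole {
    System,
    UserVisible,
    UserHidden,
}

impl NudgeRole {
    pub fn as_str(self) -> &'static str {
        match self {
            NudgeRole::System => "system",
            NudgeRole::UserVisible => "user_visible",
            NudgeRole::UserHidden => "user_hidden",
        }
    }
}

impl FromStr for NudgeRole {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "system" => Ok(NudgeRole::System),
            "user_visible" => Ok(NudgeRole::UserVisible),
            "user_hidden" => Ok(NudgeRole::UserHidden),
            other => Err(format!("unknown nudge role: {other}")),
        }
    }
}

/// One queued row; timestamps are assumed to be epoch milliseconds.
#[derive(Debug, Clone)]
pub struct PendingNudge {
    pub id: i64,
    pub conversation_id: String,
    pub role: NudgeRole,
    pub content: String,
    pub created_at: u64,
    pub consumed_at: Option<u64>,
}

/// Hypothetical in-memory stand-in for the SQLite-backed repository.
#[derive(Default)]
pub struct InMemoryNudgeQueue {
    rows: Vec<PendingNudge>,
    last_id: i64,
}

impl InMemoryNudgeQueue {
    /// Appends a nudge and returns its monotonically increasing id.
    pub fn enqueue(&mut self, conversation_id: &str, role: NudgeRole, content: &str, now_ms: u64) -> i64 {
        self.last_id += 1;
        self.rows.push(PendingNudge {
            id: self.last_id,
            conversation_id: conversation_id.to_string(),
            role,
            content: content.to_string(),
            created_at: now_ms,
            consumed_at: None,
        });
        self.last_id
    }

    /// Oldest unconsumed nudge for one conversation, ordered by (created_at, id).
    pub fn next_unconsumed(&self, conversation_id: &str) -> Option<&PendingNudge> {
        self.rows
            .iter()
            .filter(|n| n.conversation_id == conversation_id && n.consumed_at.is_none())
            .min_by_key(|n| (n.created_at, n.id))
    }

    /// Returns true only if this call flipped the row; repeats and unknown ids return false.
    pub fn mark_consumed(&mut self, id: i64, now_ms: u64) -> bool {
        match self.rows.iter_mut().find(|n| n.id == id && n.consumed_at.is_none()) {
            Some(n) => {
                n.consumed_at = Some(now_ms);
                true
            }
            None => false,
        }
    }
}
```

Ordering by the `(created_at, id)` tuple is what keeps two enqueues in the same millisecond totally ordered; in the SQL version, the composite drain index lets the database serve that order without a separate sort.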
8 new tests:
- 3 domain tests for `NudgeRole` round-trip + visibility helpers
- 5 repo-level integration tests against in-memory SQLite:
  - `enqueue_then_next_unconsumed_returns_in_fifo_order` — FIFO order +
    monotonic ids
  - `mark_consumed_is_idempotent_and_drops_from_unconsumed_set` — second
    `mark_consumed` returns `Ok(false)`
  - `next_unconsumed_is_scoped_by_conversation` — isolation across
    conversations
  - `list_for_conversation_returns_consumed_and_unconsumed` — debug path
    sees both states, fresh-first
  - `mark_consumed_for_missing_id_returns_false` — idempotent for
    unknown ids
Disambiguation: both `TrajectoryRepo` and `NudgeRepo` define
`list_for_conversation` with the same signature, so the
`forge_api::ForgeAPI::list_trajectory` call site now uses the explicit
`TrajectoryRepo::list_for_conversation(...)` form. Same pattern as the
user-facts PR.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: blackfloofie-a codegraff agent <265516171+blackfloofie@users.noreply.github.com>
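The disambiguation is ordinary Rust method resolution: when one type implements two traits that both declare `list_for_conversation`, the method-call form `repo.list_for_conversation(..)` is rejected with E0034 (multiple applicable items in scope), so the call must name the trait. A minimal sketch with simplified, synchronous stand-ins for the two traits; the `Repo` type and the `Vec<String>` return type are hypothetical, and the real traits are async.

```rust
/// Simplified stand-in for the trajectory repository trait.
pub trait TrajectoryRepo {
    fn list_for_conversation(&self, conversation_id: &str) -> Vec<String>;
}

/// Simplified stand-in for the nudge repository trait (same method name and signature).
pub trait NudgeRepo {
    fn list_for_conversation(&self, conversation_id: &str) -> Vec<String>;
}

/// One hypothetical backing type that implements both traits.
pub struct Repo;

impl TrajectoryRepo for Repo {
    fn list_for_conversation(&self, conversation_id: &str) -> Vec<String> {
        vec![format!("trajectory:{conversation_id}")]
    }
}

impl NudgeRepo for Repo {
    fn list_for_conversation(&self, conversation_id: &str) -> Vec<String> {
        vec![format!("nudge:{conversation_id}")]
    }
}

pub fn list_trajectory(repo: &Repo, conversation_id: &str) -> Vec<String> {
    // `repo.list_for_conversation(conversation_id)` would not compile here (E0034);
    // the trait-qualified path selects the TrajectoryRepo implementation explicitly.
    TrajectoryRepo::list_for_conversation(repo, conversation_id)
}
```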
…provider requests

Introduces a meta-tool protocol that replaces sending all tool definitions to the LLM provider with just 3 small meta-tool definitions:
- tools_list: discover available tool names and descriptions
- tools_info: inspect the full schema for a specific tool
- call_tool: invoke a tool by name with arguments

This saves significant tokens on every request because the full tool schemas are no longer sent repeatedly.

Key changes:
- Add CallToolInput, ToolsListInput, ToolsInfoInput domain types
- Add CallTool, ToolsList, ToolsInfo variants to ToolCatalog enum
- Implement meta-tool dispatch in ToolRegistry (tools_list returns names, tools_info returns schema, call_tool delegates to the real tool)
- Modify ApplyTunableParameters to pass only meta-tool definitions to providers
- Update system prompt with meta-tool protocol instructions
- Add SummaryTool::MetaTools and Operation::MetaTool to compat layers
- Add 8 unit tests + 2 integration tests for parsing, dispatch, and tool filtering

Co-Authored-By: blackfloofie-a codegraff agent <265516171+blackfloofie@users.noreply.github.com>
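The three-call dispatch described above can be sketched as follows. This is a hypothetical, dependency-free model, not the actual `ToolRegistry`: `MetaCall`, `ToolDef`, and `Registry` are illustrative names, arguments and schemas are plain strings rather than JSON values, and handlers are synchronous function pointers.

```rust
use std::collections::BTreeMap;

/// Hypothetical description of one real tool.
pub struct ToolDef {
    pub description: &'static str,
    /// Full input schema, returned only on request via tools_info.
    pub schema: &'static str,
    /// Synchronous handler over the raw argument string.
    pub handler: fn(&str) -> String,
}

/// The three meta-tool calls; variants mirror tools_list / tools_info / call_tool.
pub enum MetaCall<'a> {
    ToolsList,
    ToolsInfo { name: &'a str },
    CallTool { name: &'a str, arguments: &'a str },
}

/// Name-ordered registry so tools_list output is deterministic.
#[derive(Default)]
pub struct Registry {
    tools: BTreeMap<&'static str, ToolDef>,
}

impl Registry {
    pub fn register(&mut self, name: &'static str, def: ToolDef) {
        self.tools.insert(name, def);
    }

    fn lookup(&self, name: &str) -> Result<&ToolDef, String> {
        self.tools.get(name).ok_or_else(|| format!("unknown tool: {name}"))
    }

    pub fn dispatch(&self, call: MetaCall<'_>) -> Result<String, String> {
        match call {
            // Discovery: names and one-line descriptions only, no schemas.
            MetaCall::ToolsList => Ok(self
                .tools
                .iter()
                .map(|(name, def)| format!("{}: {}", name, def.description))
                .collect::<Vec<_>>()
                .join("\n")),
            // Inspection: the full schema for a single tool.
            MetaCall::ToolsInfo { name } => self.lookup(name).map(|def| def.schema.to_string()),
            // Invocation: delegate to the real tool's handler.
            MetaCall::CallTool { name, arguments } => {
                self.lookup(name).map(|def| (def.handler)(arguments))
            }
        }
    }
}
```

In this model only the three meta-tool definitions would be sent to the provider; each per-tool schema stays local until a `tools_info` call asks for it, which is also why every `tools_info` lookup costs an extra round trip.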
Co-authored-by: ForgeCode <noreply@forgecode.dev>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Tushar Mathur <tusharmath@gmail.com>
Co-authored-by: Amit Singh <amitksingh1490@gmail.com>
Co-authored-by: Amit Singh <amitksingh1490@gmail.com>
…itle) in agent and tool_definition from merge resolution

Co-Authored-By: blackfloofie-a codegraff agent <265516171+blackfloofie@users.noreply.github.com>
Co-Authored-By: blackfloofie-a codegraff agent <265516171+blackfloofie@users.noreply.github.com>
Evolved the meta-tool system prompt through a Darwinian tournament (7 variants × 5 tasks × 2 runs each = 70 runs on deepseek-v4-pro). The winning variant (v6_blend_tight) provides compact inline schemas for the 5 core tools (read, shell, fs_search, write, patch), so the model skips unnecessary tools_info lookups.

Key results vs. the full-tool-definitions baseline:
- 48% fewer total tokens (61K avg vs 118K)
- 0.2 avg errors vs 0.0 (negligible)
- 23s avg wall time vs 31s (26% faster)
- Won every task category (trivial through multi-step)

The previous meta-tool prompt (v1) was actually 8% worse than sending full tool definitions because of excessive tools_info round trips. The new prompt eliminates those by giving the model the schemas it needs upfront in a dense format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
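As a sanity check, the headline percentages follow from the averages quoted above (the "26% faster" figure is the relative reduction in average wall time):

```math
\begin{aligned}
\text{token reduction} &= \frac{118\text{K} - 61\text{K}}{118\text{K}} \approx 0.483 \approx 48\% \\
\text{wall-time reduction} &= \frac{31\,\text{s} - 23\,\text{s}}{31\,\text{s}} \approx 0.258 \approx 26\% \\
\text{tournament size} &= 7 \times 5 \times 2 = 70\ \text{runs}
\end{aligned}
```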
Builds from source report version 0.1.5 (from the workspace Cargo.toml), which doesn't match any GitHub release tag. The update_informer check was hitting the GitHub API and producing a curl 404 on every launch. Skip the check for 0.1.5, as we already do for 0.1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
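A minimal sketch of the version gate, assuming the check reduces to a lookup against a fixed list of source-build versions; `should_check_for_update` and `UNRELEASED_SOURCE_VERSIONS` are hypothetical names, not the project's actual code.

```rust
/// Versions reported by source builds that have no matching GitHub release tag
/// (values taken from the commit message above).
pub const UNRELEASED_SOURCE_VERSIONS: &[&str] = &["0.1.0", "0.1.5"];

/// Hypothetical gate: only query the release API for versions that could have a tag.
pub fn should_check_for_update(current_version: &str) -> bool {
    !UNRELEASED_SOURCE_VERSIONS
        .iter()
        .any(|&v| v == current_version)
}
```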
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cherry-picked from tailcallhq/forgecode (adapted for our branch):

1. **Tool call argument validation** (from PR #3356)
   - Adds `parse_json()` to `ToolCallArguments`, which validates JSON
     upfront instead of silently wrapping malformed input
   - Malformed args now surface as retryable errors
2. **Live context token counter** (from PR #3351)
   - Emits "Context ~45.2k / 900.0k" after each orchestrator turn
   - Adds `emit_context_usage()` and `humanize()` helpers to orch.rs
3. **Multi-signal auto-continue** (from PR #3357)
   - 5-signal confidence scoring detects when the model stopped mid-task
   - Auto-resumes up to 3 times when confidence >= 60
   - Fixes the "stuck agent" problem with models that return a stop mid-task

Skipped unrelated bundled changes (pool.rs WAL hardening, fs_patch
rewrite) that were scope creep in the upstream PRs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
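A hedged sketch of two of the pieces above. This `humanize` is a guess at the formatting inferred only from the "Context ~45.2k / 900.0k" example (one decimal with a `k` suffix; printing raw numbers below 1,000 is an assumption), and `should_auto_continue` models only the published thresholds (confidence >= 60, at most 3 resumes). The five signals that produce the confidence score are not described here, so they are left out; all names except `humanize` are hypothetical.

```rust
/// Minimum confidence score (0-100) before auto-resuming, per the commit message.
pub const MIN_CONFIDENCE: u8 = 60;
/// Maximum number of automatic resumes per task, per the commit message.
pub const MAX_AUTO_RESUMES: u8 = 3;

/// Formats a token count as thousands with one decimal place, e.g. 45_200 -> "45.2k".
pub fn humanize(tokens: u64) -> String {
    if tokens >= 1_000 {
        format!("{:.1}k", tokens as f64 / 1_000.0)
    } else {
        tokens.to_string()
    }
}

/// Builds the status line emitted after an orchestrator turn (hypothetical helper).
pub fn context_usage_line(used: u64, limit: u64) -> String {
    format!("Context ~{} / {}", humanize(used), humanize(limit))
}

/// Gate only: resume when the (externally computed) confidence is high enough
/// and the resume budget is not yet exhausted.
pub fn should_auto_continue(confidence: u8, resumes_so_far: u8) -> bool {
    confidence >= MIN_CONFIDENCE && resumes_so_far < MAX_AUTO_RESUMES
}
```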
Summary
… `tools_info` round trips

Performance
Benchmarked on deepseek-v4-pro across 5 task categories (trivial, file read, grep, reasoning, multi-step), 2 runs each:
Per-task breakdown
The previous meta-tool prompt was actually 8-19% worse than full tool definitions because the model called `tools_info` before every `call_tool`, wasting a round trip each time. The new prompt gives the model the 5 most common tool schemas inline so it can call them directly.

Why it works
The token savings come from two sources:
- … `tools_info` lookups for common tools, cutting 1-3 turns per task.

The wall time improvement (26% faster) follows directly from fewer turns.
Test plan
- `cargo build` clean

🤖 Generated with Claude Code