Distillation: tighten prompt, fix metric, model-dependent Tier-1 limit by gnovak · Pull Request #626 · gnovak/remote-dev-bot

gnovak · 2026-05-23T06:06:17Z

Three independent fixes that came out of investigating bridge-analysis #392, where distillation compressed `leaderboard.py` down to 972 tokens (entire file → snippets only) and the status log reported "5.6K → 972 tokens, ~$0.05 saved". Both the 5.6K input size and the ~$0.05 savings figure were misleading.

1. Tighten the distill prompt against over-excerpting

When an issue references specific line numbers ("lines 312-313, 459-460, 534-535"), the distill LLM was extracting tiny snippets and discarding the surrounding file. The agent then kept saying "I need to read the full leaderboard.py" across iterations.

The prompt now has a hard rule: files to be modified are returned in FULL, regardless of how specific the task is. The user-message instructions and system prompt are consistent on this (no more "[full content or relevant excerpt]" escape hatch).
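A minimal sketch of what a "prompt content lock" check could look like, assuming the prompt is available as a plain string. `EXAMPLE_PROMPT` and `prompt_violations` are hypothetical names for illustration; the real prompt text and test code are not shown in this PR. Only the escape-hatch phrase is taken from the description above.

```python
# Phrase quoted in the PR description as the old escape hatch.
ESCAPE_HATCH = "[full content or relevant excerpt]"

# Hypothetical stand-in for the distill prompt (NOT the real prompt text).
EXAMPLE_PROMPT = (
    "Return every file that will be modified in FULL, "
    "even when the task cites specific line numbers.\n"
    "Do not excerpt files that will be modified."
)


def prompt_violations(prompt: str) -> list[str]:
    """Return a list of problems found in a distill prompt (empty if none)."""
    problems = []
    if ESCAPE_HATCH in prompt:
        problems.append("escape hatch still present")
    if "in FULL" not in prompt:
        problems.append("full-file rule missing")
    return problems


print(prompt_violations(EXAMPLE_PROMPT))
```

A check like this fails loudly if a later prompt edit reintroduces the excerpt wording or drops the full-file rule.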

2. Fix the metric

Status log read:
```
Distillation: 5.6K → 972 tokens (4.6K saved/iter × 30 iters = ~$0.05 saved)
```

That 5.6K was the size of EXTRA_FILES (README, AGENTS.md, etc.), NOT what distillation processed. The codebase that distillation actually scanned was 368K tokens. `maybe_distill()` now returns the codebase total as a 6th tuple element so `resolve.py` can pass the correct value into `build_distillation_summary()`.

For bridge-analysis #392:

| | Pre-distillation tokens | Reported savings |
|---|---|---|
| Before | 5.6K (wrong) | ~$0.05 |
| After | 368K (correct) | ~$4.30 |
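The arithmetic behind those two rows can be sketched as follows. The function name and the per-token price are assumptions: the PR does not state the rate, and the $0.39 per million tokens used here is back-derived from the reported figures, not taken from the code.

```python
def distillation_savings_usd(pre_tokens, post_tokens, iterations, usd_per_million_tokens):
    """Estimate the input cost avoided by sending the distilled context each iteration."""
    saved_per_iter = max(pre_tokens - post_tokens, 0)
    return saved_per_iter * iterations * usd_per_million_tokens / 1_000_000


# ASSUMPTION: rate inferred from the PR's reported numbers, not from the source code.
ASSUMED_RATE = 0.39

before = distillation_savings_usd(5_600, 972, 30, ASSUMED_RATE)    # EXTRA_FILES size (wrong input)
after = distillation_savings_usd(368_000, 972, 30, ASSUMED_RATE)   # full codebase size (correct input)
print(f"before: ${before:.2f}  after: ${after:.2f}")
```

Under that assumed rate the two cases come out at about $0.05 and $4.29, in line with the ~$0.05 and ~$4.30 in the table. The only thing that changes between them is the pre-distillation token count.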

3. Model-dependent Tier-1-vs-Tier-2 cutoff

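`DISTILL_SMALL_REPO_LIMIT` was a hardcoded 100K, set when 200K-window models were typical. The modern models we configure (claude-sonnet-4-6, claude-opus-4-7, gpt-5.5, gemini-2.5-flash) all have roughly 1M-token input windows, so much larger codebases fit comfortably in a single Tier-1 LLM call. Added `_small_repo_limit(model)`, which scales the cutoff to 25% of the model's input window, clamped to [50K, 500K]. The old 100K constant remains the fallback for models LiteLLM doesn't know.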

Effective Tier-1 ceiling per model:

| Model | Input window | Tier-1 cutoff |
|---|---|---|
| claude-sonnet-4-6 / opus-4-7 | 1M | 250K |
| gpt-5.5 | 1.05M | 262K |
| gemini-2.5-flash | 1.05M | 262K |
| claude-sonnet-4-5 | 200K | 50K (floor) |
| unknown / not in LiteLLM | n/a | 100K (legacy default) |
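The scaling rule that produces this table can be sketched as below. This is not the actual `_small_repo_limit` implementation: the real function takes only the model name and presumably looks up the window through LiteLLM, whereas this sketch takes a plain dict (`input_windows`, a hypothetical parameter) so it runs standalone. The window sizes are the rounded values from the table.

```python
LEGACY_SMALL_REPO_LIMIT = 100_000  # old hardcoded value, used when the window is unknown
MIN_LIMIT = 50_000                 # floor of the clamp
MAX_LIMIT = 500_000                # ceiling of the clamp
WINDOW_FRACTION = 0.25             # cutoff is 25% of the model's input window


def small_repo_limit(model, input_windows):
    """Return the Tier-1 cutoff in tokens for `model`.

    `input_windows` maps model name -> input window size in tokens
    (a stand-in for whatever lookup the real code performs).
    """
    window = input_windows.get(model)
    if not window:
        return LEGACY_SMALL_REPO_LIMIT
    scaled = int(window * WINDOW_FRACTION)
    return max(MIN_LIMIT, min(MAX_LIMIT, scaled))


# Illustrative window sizes, rounded as in the table above.
windows = {
    "claude-sonnet-4-6": 1_000_000,
    "gemini-2.5-flash": 1_050_000,
    "claude-sonnet-4-5": 200_000,
}

for name in ["claude-sonnet-4-6", "gemini-2.5-flash", "claude-sonnet-4-5", "some-unknown-model"]:
    print(name, small_repo_limit(name, windows))
```

Note that 25% of a 200K window is exactly 50K, so the claude-sonnet-4-5 row sits right at the floor; any smaller window would be clamped up to it.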

Test plan

- 662 unit tests pass (was 655; +7 for `_small_repo_limit` + prompt content locks)
- Verified by hand against the bridge-analysis #392 numbers (issue title: "refactor: split build into its own workflow job (consistent with workshop/design)"): "368K → 972 tokens, ~$4.30 saved"
- Next `/agent-resolve` on a bigger repo should show realistic distillation metrics

🤖 Generated with Claude Code

Three independent fixes investigated for bridge-analysis #392, where distillation compressed leaderboard.py down to 972 tokens (entire file → ~snippets) and the status log misleadingly reported "5.6K → 972 tokens, ~$0.05 saved".

## 1. Tighten the distill prompt against over-excerpting

When an issue references specific line numbers (e.g., "lines 312-313") the distill LLM was extracting tiny snippets and discarding the surrounding file. Prompt now has a hard rule: files to be modified are included in FULL, regardless of how specific the task is. Also addressed the specific line-number-excerpting failure mode in the prompt text. The user-message instructions and system prompt are now consistent on this (no more "[full content or relevant excerpt]" escape hatch).

## 2. Fix the metric (pre_tokens reflects what distillation saw)

The status log read: "Distillation: 5.6K → 972 tokens (4.6K saved/iter × 30 iters)"

That 5.6K was the size of EXTRA_FILES (README, AGENTS.md, etc.), NOT what distillation processed. The codebase distillation actually scanned was 368K tokens. maybe_distill() now returns the codebase total as a 6th tuple element so resolve.py can pass the correct value into build_distillation_summary().

Bridge-analysis #392 same numbers, before/after this fix:

- OLD: 5.6K → 972 tokens, ~$0.05 saved
- NEW: 368K → 972 tokens, ~$4.30 saved

## 3. Model-dependent Tier-1-vs-Tier-2 cutoff

DISTILL_SMALL_REPO_LIMIT was a hardcoded 100K, set when 200K-window models were typical. Modern models we configure (claude-sonnet-4-6, claude-opus-4-7, gpt-5.5, gemini-2.5-flash) all have 1M-token input windows, so we can comfortably send much larger codebases to a single Tier-1 LLM call. Added _small_repo_limit(model) which scales the cutoff to 25% of the model's input window, clamped to [50K, 500K]. Old 100K constant is the fallback for models LiteLLM doesn't know.

Effective Tier-1 ceiling per model:

- claude-sonnet-4-6, opus-4-7: 250K (1M × 25%)
- gpt-5.5: 262K (1.05M × 25%)
- gemini-2.5-flash: 262K
- claude-sonnet-4-5 (200K win): 50K (floored)
- unknown / not in LiteLLM: 100K (legacy default)

662 tests pass (was 655; +7 for _small_repo_limit + prompt content).

gnovak merged commit a2374ba into dev May 23, 2026

gnovak deleted the fix-distillation-prompt-and-metrics branch June 13, 2026 03:43
