feat: compact TOON encoder, response caching, optional Semgrep pre-pass #10

Merged: tusharshah21 merged 1 commit into main from feat/toon-struct on May 27, 2026
@tusharshah21 (Owner)

  • Rewrote TOON as a tabular format (F:/C[N]{op,ln,code}: schema + CSV rows). On a representative two-file diff it is ~45% smaller than the old per-chunk JSON and ~20% smaller than the raw unified diff. Context lines around changes are trimmed (configurable via CONTEXT_LINES, default 2).
  • Encoder now operates at the file level, so the F: header is emitted once per file.
  • Added an LLM response cache keyed by SHA256(model, messages). It skips re-billing for duplicate chunks within a job; toggle via ENABLE_CACHE (default on).
  • Optional Semgrep pre-pass: when SEMGREP_RULES is set and semgrep is on PATH, findings are passed to Agent 1 as priors. Agent 1 still verifies each finding before flagging it, which filters out false positives.
  • Fixer (Agent 2) now reads +/-15 lines of surrounding file context from the checked-out workspace when available, giving it imports and enclosing signatures for better fix accuracy.
  • README: documented Ollama and GitHub Models as new free-provider options, added Semgrep section, replaced aspirational "50-70%" line with benchmarked numbers, documented new inputs.
  • Bumped to 1.1.0 with CHANGELOG.
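
The response cache described above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: `cache_key`, `ResponseCache`, and the `call` callback are hypothetical names, the cache is assumed to be an in-memory dict that lives for one job, and the exact way `model` and `messages` are serialized before hashing is an assumption (canonical JSON here).

```python
import hashlib
import json
from typing import Callable


def cache_key(model: str, messages: list[dict]) -> str:
    """Hash (model, messages) into a stable hex digest.

    Assumption: the pair is serialized as canonical JSON (sorted keys,
    compact separators) so equal inputs always produce the same key.
    """
    payload = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """Hypothetical per-job cache; `enabled` stands in for ENABLE_CACHE."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self.hits = 0
        self._store: dict[str, str] = {}

    def get_or_call(
        self,
        model: str,
        messages: list[dict],
        call: Callable[[str, list[dict]], str],
    ) -> str:
        # With the cache disabled, always go to the provider.
        if not self.enabled:
            return call(model, messages)
        key = cache_key(model, messages)
        if key in self._store:
            # Duplicate chunk: reuse the earlier response, no new request.
            self.hits += 1
            return self._store[key]
        response = call(model, messages)
        self._store[key] = response
        return response
```

Because the key covers both the model name and the full message list, the same chunk sent to a different model, or with a different system prompt, is treated as a distinct request.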

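
The +/-15-line context window that the fixer reads can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `read_context` is a hypothetical helper, line numbers are assumed to be 1-based, and returning `None` when the file is missing is one possible way to model the "when available" condition.

```python
from pathlib import Path
from typing import Optional


def read_context(path: str, line: int, radius: int = 15) -> Optional[str]:
    """Return numbered lines within `radius` of `line`, or None if the
    file is not present in the checked-out workspace."""
    file = Path(path)
    if not file.is_file():
        return None
    lines = file.read_text(encoding="utf-8", errors="replace").splitlines()
    # Clamp the window to the bounds of the file.
    start = max(1, line - radius)
    end = min(len(lines), line + radius)
    return "\n".join(f"{n}: {lines[n - 1]}" for n in range(start, end + 1))
```

Prefixing each line with its number is a design choice of this sketch; it lets a suggested fix refer back to exact positions in the file.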
@tusharshah21 tusharshah21 merged commit e346380 into main May 27, 2026
1 check passed
@tusharshah21 tusharshah21 deleted the feat/toon-struct branch May 27, 2026 16:02