bootstrap: Engage Claude Code auto-compaction for third-party models #70
grep -Fxq 'auth_token=deepseek-test-key' "$TEST_HOME/claude-launcher-env" ||
grep -Fxq 'model=deepseek-reasoner' "$TEST_HOME/claude-launcher-env" ||
grep -Fxq 'debug=1' "$TEST_HOME/claude-launcher-env" ||
grep -Fxq 'auto_compact_window=1000000' "$TEST_HOME/claude-launcher-env"
This is wrong. If DeepSeek V4 Pro's context window is 1 million, we shouldn't compact when we reach 1 million, right? We should do it just beforehand.
For 1 million context window Opus models, at what point does Claude Code autocompact? I just want the answer to be consistent.
Good catch — you're right that compacting at the window would be too late, and chasing this down turned up a second problem with the original values. I verified the behavior against the installed Claude Code 2.1.168 binary (the cn → wh$/hE4 path).
For a 1M-context Opus model, auto-compaction fires at ~967,000 tokens, not at 1,000,000. The trigger is window − ~20K (output reserve) − ~13K (buffer). So the value is the window size, not the compaction point — there's automatic headroom. (The docs mention a "~95%" trigger via CLAUDE_AUTOCOMPACT_PCT_OVERRIDE, but that percentage path is off by default; the real default is the fixed ~33K headroom.) Even taken at face value, 1000000 would have compacted at ~967K, not at 1M.
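The arithmetic above can be sketched in shell. This is only an illustration: the 20,000 and 13,000 constants are the approximate figures quoted in this comment, not exact values read from the binary, and compaction_point is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Approximate headroom figures quoted in the comment above (assumptions,
# not exact constants taken from the Claude Code binary).
OUTPUT_RESERVE=20000
BUFFER=13000

# compaction_point WINDOW
# Prints the approximate token count at which auto-compaction would fire
# for a given effective context window.
compaction_point() {
  local window=$1
  echo $(( window - OUTPUT_RESERVE - BUFFER ))
}

compaction_point 1000000   # prints 967000
compaction_point 200000    # prints 167000
```

With these figures, the trigger always sits about 33K tokens below whatever window Claude Code ends up using.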
But the bigger issue: the original values never took effect. Claude Code clamps the window to min(tv(model), value), where tv(model) returns 1M only for names matching /\[1m\]/i (a literal [1m] tag), the beta-header path, or first-party opus-4-7/4-8. deepseek-reasoner and nvidia/nvidia/nemotron-3-ultra match none of those, so they fall to the 200K default, meaning 1000000 and 240000 were both silently clamped to a 200K window already.
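A minimal sketch of the clamp described above, assuming a simplified stand-in for tv() that only checks for a literal [1m] tag (the beta-header and first-party Opus paths are left out). The names model_window and effective_window are hypothetical, not Claude Code functions.

```shell
#!/usr/bin/env bash
# model_window NAME
# Simplified stand-in for tv(): 1M if the name carries a literal "[1m]"
# tag (either case), otherwise the 200K default.
model_window() {
  case "$1" in
    *'[1m]'*|*'[1M]'*) echo 1000000 ;;
    *)                 echo 200000 ;;
  esac
}

# effective_window NAME VALUE
# Prints min(model_window(NAME), VALUE), i.e. the window after clamping.
effective_window() {
  local name=$1 value=$2 cap
  cap=$(model_window "$name")
  if [ "$value" -lt "$cap" ]; then
    echo "$value"
  else
    echo "$cap"
  fi
}

effective_window 'deepseek-reasoner' 1000000              # prints 200000
effective_window 'nvidia/nvidia/nemotron-3-ultra' 240000  # prints 200000
effective_window 'deepseek-reasoner[1m]' 1000000          # prints 1000000
```

Under this model, any configured value above 200K has no effect unless the name resolves to the larger window.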
So the real job of this env var here isn't to set 1M — it's to give the window an explicit env source, which is what flips the local-session trigger gate from "skip" to "engage." That's the actual fix for the unbounded-growth failure.
I've changed both arms to 200000 — the honest effective ceiling. Compaction now fires at ~167K, well under Nemotron's 262,144 limit, using the same window − reserve − buffer formula a 1M Opus model uses. To genuinely give DeepSeek a ~967K compact point we'd need Claude Code to recognize its true window, which there's no third-party knob for (the [1m] tag is a first-party alias; appending it to a gateway model id would just send a bogus name). Full breakdown is in the updated PR description.
Yes — and I've raised it to Nemotron's true limit. Pushed an update that changes the approach so the configured window actually takes effect.
The catch I hit before: CLAUDE_CODE_AUTO_COMPACT_WINDOW is clamped to min(model_window, value), and an unrecognized third-party model name resolves to the 200K default — so any value above 200K was silently capped. The fix is to tag the model name with a [1m] suffix, which makes Claude Code resolve the full window. I verified against a mock endpoint that Claude Code strips [1m] before the request, so the gateway still receives the real id (deepseek-reasoner, nvidia/nvidia/nemotron-3-ultra).
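The suffix stripping can be illustrated with a small shell helper. It mirrors the behaviour observed against the mock endpoint, not Claude Code's actual implementation; wire_model_id is a hypothetical name, and only the lowercase tag is handled here.

```shell
#!/usr/bin/env bash
# wire_model_id NAME
# Prints NAME with one trailing "[1m]" tag removed, which is the model id
# the gateway was observed to receive.
wire_model_id() {
  local name=$1 stripped
  # The quoted pattern is matched literally, so "[1m]" is not a glob class.
  stripped=${name%'[1m]'}
  printf '%s\n' "$stripped"
}

wire_model_id 'deepseek-reasoner[1m]'               # prints deepseek-reasoner
wire_model_id 'nvidia/nvidia/nemotron-3-ultra[1m]'  # prints nvidia/nvidia/nemotron-3-ultra
wire_model_id 'deepseek-reasoner'                   # untagged names pass through unchanged
```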
With the clamp lifted, I set each window to the model's true limit:
- DeepSeek → 1000000 → compaction at ~967K
- Nemotron → 262144 (its real limit, up from the earlier 200K) → compaction at ~229K
262144 is the principled maximum for Nemotron: compaction fires at window − ~33K (a ~20K output reserve + ~13K buffer, the same margin a first-party model uses), so 262144 lands the trigger at ~229K with a full 33K of headroom under the 262,144 hard limit. Going higher would eat into that headroom (and, beyond roughly 295K, push the trigger past the limit), risking the ContextWindowExceededError this is meant to prevent.
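As a rough sketch, the two settings could be expressed per launcher arm as follows. The function name launcher_env and its key=value output are illustrative assumptions; the real launcher in this PR may be structured differently. Only the arm names, model ids, environment variable names, and window values come from this thread.

```shell
#!/usr/bin/env bash
# launcher_env ARM
# Prints the model and auto-compact window settings for one launcher arm.
# Hypothetical helper for illustration only.
launcher_env() {
  local arm=$1 model window
  case "$arm" in
    third-party-deepseek)
      model='deepseek-reasoner[1m]'
      window=1000000
      ;;
    third-party-nemotron)
      model='nvidia/nvidia/nemotron-3-ultra[1m]'
      window=262144
      ;;
    *)
      echo "unknown arm: $arm" >&2
      return 1
      ;;
  esac
  printf 'ANTHROPIC_MODEL=%s\n' "$model"
  printf 'CLAUDE_CODE_AUTO_COMPACT_WINDOW=%s\n' "$window"
}

launcher_env third-party-nemotron
```

Keeping the tagged model name and the window together in one place makes it harder to change one without the other, which matters because the window only takes effect when the tag is present.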
aa20c8d to 320930b
A local (non-remote) Claude Code session only fires the auto-compact trigger when the model's context window resolves to a known size, and for a model name Claude Code does not recognize it falls back to a 200K default whose source the trigger gate rejects. So for third-party models (deepseek-reasoner, nvidia/nvidia/nemotron-3-ultra) auto-compaction never runs and the conversation grows until the provider rejects the request with a fatal ContextWindowExceededError.

Observed in a real run: a Nemotron 3 Ultra session grew monotonically across 533 turns with zero compaction, then died at ~268K tokens against its 262,144 limit.

Tag the model name with a "[1m]" suffix in the deepseek and nemotron launcher arms and pin CLAUDE_CODE_AUTO_COMPACT_WINDOW to the model's true context window. The "[1m]" suffix makes Claude Code resolve the full window (without it the window, and therefore the auto-compact ceiling, is capped at the 200K default) and engages the trigger; Claude Code strips the suffix from the model name before the request, so the gateway still receives the real model id. Verified against a mock Anthropic endpoint: with ANTHROPIC_MODEL=deepseek-reasoner[1m] the wire request carries model=deepseek-reasoner.

- DeepSeek V4 Pro -> window 1000000, compaction at ~967K.
- Nemotron 3 Ultra -> window 262144 (its true limit), compaction at ~229K.

Compaction fires ~33K below the window (a ~20K output reserve plus a ~13K buffer), the same window-minus-reserve margin a first-party model uses, so it triggers before either provider's hard limit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
320930b to ec4127b
What
Tag the model name with a [1m] suffix and pin CLAUDE_CODE_AUTO_COMPACT_WINDOW to the model's true context window, in the third-party-deepseek and third-party-nemotron launcher arms:
- DeepSeek V4 Pro: 1000000, auto-compaction at ~967K
- Nemotron 3 Ultra: 262144 (its true limit), auto-compaction at ~229K

Why
In a local (non-remote) session, Claude Code only fires its auto-compact trigger when the model's context window resolves to a known size. For a model name it doesn't recognize, the window falls back to a 200K default whose source the trigger gate rejects, so auto-compaction never runs and the conversation grows unbounded until the provider rejects the request with a fatal ContextWindowExceededError.

Observed in a real run: a Nemotron 3 Ultra session grew monotonically across 533 turns with zero compaction, then died with:
How
Two coupled levers, verified against the installed Claude Code 2.1.168 binary and a mock Anthropic endpoint:
- [1m] suffix on the model name makes Claude Code resolve the model's window to 1M (tv() matches /\[1m\]/i). Without it, the window, and therefore the auto-compact ceiling, is clamped to the 200K default regardless of CLAUDE_CODE_AUTO_COMPACT_WINDOW, and the trigger stays disabled. Critically, Claude Code strips [1m] from the model name before the request, so the gateway still receives the real model id. Confirmed against a mock endpoint: ANTHROPIC_MODEL=deepseek-reasoner[1m] sends model=deepseek-reasoner on the wire (and nvidia/nvidia/nemotron-3-ultra[1m] sends nvidia/nvidia/nemotron-3-ultra).
- CLAUDE_CODE_AUTO_COMPACT_WINDOW pinned to the true window then sets where compaction fires: window − ~20K (output reserve) − ~13K (buffer), the same formula a first-party model uses. So DeepSeek (1M) compacts at ~967K and Nemotron (262,144) at ~229K, each ~33K below its hard limit.

This addresses @brycelelbach's review: compaction fires before the window, not at it, with the same margin a 1M-context Opus model gets (which auto-compacts at ~967K, not at 1M). The [1m] tag also attaches the context-1m-2025-08-07 anthropic-beta header; these gateways already receive a stack of beta headers and ignore unknown ones, so it's additive.

Test plan
- ./test.bash --lint: bash -n + shellcheck clean.
- ./test.bash --unit: all pass. The deepseek/nemotron launcher tests assert the [1m]-tagged ANTHROPIC_MODEL (model=deepseek-reasoner[1m], model=nvidia/nvidia/nemotron-3-ultra[1m]) and auto_compact_window=1000000 / 262144 in the generated launcher env.
- ./test.bash --docker: full bootstrap.bash in a fresh ubuntu:22.04; All e2e assertions passed. / === docker e2e passed ===.
- claude binary with the launcher's env; captured request shows model=deepseek-reasoner (the [1m] suffix stripped) while Claude Code resolves the 1M window. Confirms the gateway receives a valid model id.
- ./test.bash --secrets: gitleaks, no leaks found (working tree + history).
- ./test.bash --e2e: covered by --docker (same assertions, isolated container); no host-only behavior in this change.

🤖 Generated with Claude Code