A local HTTP router for Claude Code. Routes each request to either api.anthropic.com or a local LiteLLM gateway based on the model name and current Anthropic quota state. As a side effect, captures anthropic-ratelimit-* response headers and writes a one-line status file at ~/.claude/usage-status.md so Claude can read its own quota state.
Forked from InertiaUK/claude-quota-proxy, which provided the original transparent-proxy implementation and usage-file design. LiteLLM fallback, multi-model routing, and the rename are downstream additions.
Works with Claude Code (CLI) only. The web chat and browser extension talk to Anthropic's infrastructure directly — they don't route through a local proxy.
The router has four capabilities that can be used independently:
- Quota visibility: always on. Forwards every request to Anthropic unchanged, scrapes the `anthropic-ratelimit-*` response headers, and writes a one-line status file at `~/.claude/usage-status.md`. Claude reads that file to know how close it is to the 5-hour, 7-day, and overage limits. Multiple accounts sharing one router can be tracked independently by setting `CLAUDE_CONFIG_DIRS` (see Multiple accounts).
- LiteLLM fallback: opt-in via `LITELLM_URL`. When any utilization window hits a configured threshold, `claude-*` requests are redirected to a local LiteLLM instance with the body's `model` field rewritten to a tier-matched substitute (`opus`, `sonnet`, `haiku`). Non-Anthropic models (`gpt-*`, `gemini-*`, etc.) always go to LiteLLM regardless of quota, with the body forwarded as-is.
- Composer 2.5: opt-in via `CURSOR_API_KEY`. Routes requests to Cursor's Composer 2.5 model via native Anthropic↔OpenAI translation, so Claude Code can use Composer as an alternative model alongside the Anthropic and LiteLLM options. Switch by naming any model matching `composer-*` (e.g. `composer-2.5`).
- Model-list aggregation: automatic whenever LiteLLM and/or Composer is enabled. Intercepts `GET /v1/models` (the Anthropic Models API endpoint that backs Claude Code's model-selection dialog) and returns a single merged list: Anthropic's own models, plus every LiteLLM model (translated to Anthropic shape), plus synthetic Composer entries. Without this, the dialog only ever sees Anthropic models.
If both LITELLM_URL and CURSOR_API_KEY are unset, claude-router is byte-identical to a pure passthrough: no body buffering, no model inspection, no background probe, and GET /v1/models forwards straight to Anthropic.
Claude Code
│
│ ANTHROPIC_BASE_URL=http://127.0.0.1:4080
▼
┌─────────────────────────────────────────────────────┐
│ proxy.js (single Node.js script, zero npm deps) │
│ │
│ 1. classify request │
│ claude-* → Anthropic (or LiteLLM if over) │
│ other → LiteLLM (always) │
│ │
│ 2. forward, read response headers │
│ 3. write ~/.claude/usage-status.md │
│ 4. probe Anthropic every 5min while redirected │
└─────────────────────────────────────────────────────┘
│ │
▼ ▼
api.anthropic.com (HTTPS) localhost:4000 (LiteLLM, optional)
Claude Code honors the ANTHROPIC_BASE_URL environment variable. Point it at the router (http://127.0.0.1:4080) and all API traffic flows through it. claude-router is a single Node.js script with zero npm dependencies. It runs as a Windows service (via NSSM), launchd agent (macOS), or systemd user unit (Linux).
- Set `ANTHROPIC_BASE_URL=http://127.0.0.1:4080` in your environment.
- Install claude-router as a background service (see Installation).
- (Optional) Set `LITELLM_URL` and `LITELLM_API_KEY` to enable fallback to a LiteLLM gateway.
- Restart Claude Code. Ask "what's my current quota usage?" and Claude will read `~/.claude/usage-status.md`.
Requires Node.js and an admin PowerShell session. NSSM is downloaded automatically.
# Run as Administrator
& "path\to\claude-router\install-service.ps1"

The script:
- Validates Node.js is on PATH.
- Resolves your user profile path.
- Downloads NSSM to `./tools/nssm.exe`.
- Registers `ClaudeRouter` as an auto-starting Windows service running under `LocalSystem`.
- Sets `ANTHROPIC_BASE_URL=http://127.0.0.1:4080` as a user environment variable.
Close all Claude Code windows and open a fresh one — existing sessions inherited their env before the install.
To uninstall:
# Run as Administrator
& "path\to\claude-router\uninstall-service.ps1"

Create `~/Library/LaunchAgents/com.claude-router.plist`:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.claude-router</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/bin/node</string>
<string>/path/to/claude-router/proxy.js</string>
</array>
<key>RunAtLoad</key><true/>
<key>KeepAlive</key><true/>
<key>StandardOutPath</key>
<string>/path/to/claude-router/proxy.log</string>
<key>StandardErrorPath</key>
<string>/path/to/claude-router/proxy-error.log</string>
<!-- Add <key>EnvironmentVariables</key><dict>…</dict> for LITELLM_URL etc. -->
</dict>
</plist>

launchctl load ~/Library/LaunchAgents/com.claude-router.plist

Add to `~/.zshrc` or `~/.bash_profile`:

export ANTHROPIC_BASE_URL=http://127.0.0.1:4080

Create `~/.config/systemd/user/claude-router.service`:
[Unit]
Description=Claude Router
[Service]
ExecStart=/usr/bin/node /path/to/claude-router/proxy.js
Environment=CLAUDE_USAGE_FILE=%h/.claude/usage-status.md
# Optional — track quota per account when multiple Claude logins share this router:
# Environment=CLAUDE_CONFIG_DIRS=%h/.claude,%h/.claude2
# Optional — enable LiteLLM fallback:
# Environment=LITELLM_URL=http://localhost:4000
# Environment=LITELLM_API_KEY=sk-...
Restart=always
StandardOutput=append:/path/to/claude-router/proxy.log
StandardError=append:/path/to/claude-router/proxy-error.log
[Install]
WantedBy=default.target

systemctl --user enable claude-router
systemctl --user start claude-router

Add to `~/.bashrc` or `~/.zshrc`:

export ANTHROPIC_BASE_URL=http://127.0.0.1:4080

All claude-router settings are environment variables. Only `ANTHROPIC_BASE_URL` (on the Claude Code side) is required.
| Variable | Purpose | Default |
|---|---|---|
| `CLAUDE_USAGE_FILE` | Path to the usage status file Claude reads (single-account / fallback). | `~/.claude/usage-status.md` |
| `CLAUDE_CONFIG_DIRS` | Comma-separated list of Claude Code config dirs to track per account (e.g. `~/.claude,~/.claude2`). See Multiple accounts. | unset (single-account) |
| `PORT` | TCP port the proxy listens on. | `4080` |
| `BIND` | Bind address. | `127.0.0.1` |
If you run more than one Claude account through the same router — e.g. several Claude Code sessions where some set CLAUDE_CONFIG_DIR=~/.claude and others CLAUDE_CONFIG_DIR=~/.claude2 — set CLAUDE_CONFIG_DIRS to the list of those dirs:
export CLAUDE_CONFIG_DIRS=~/.claude,~/.claude2

With this set, the router identifies each request by its auth token, reads each dir's `.credentials.json` to map that token to its config dir, and writes a separate `usage-status.md` into each dir. Each account's 5h/7d/overage quota, along with its LiteLLM-fallback routing, is tracked independently, so one account hitting its limit never overwrites the other's usage file or redirects the other's traffic. Token rotation (OAuth refresh) is handled automatically: the credentials file is re-read on the next request.
Without CLAUDE_CONFIG_DIRS, the router stays single-account: all traffic shares one quotaState and one CLAUDE_USAGE_FILE, and concurrent accounts will pollute each other's numbers.
Remove any symlink first. If `~/.claude2/usage-status.md` is a symlink to `~/.claude/usage-status.md`, both accounts alias one file. The router defensively replaces a symlinked target with a real file on first write, but it's cleanest to `rm ~/.claude2/usage-status.md` once so each dir owns a real file. (`.credentials.json` must stay a separate real file per dir; that's what distinguishes the accounts.)
The fallback feature activates only when LITELLM_URL is set. All other variables in this table are no-ops while it is unset.
| Variable | Purpose | Default |
|---|---|---|
| `LITELLM_URL` | LiteLLM base URL, e.g. `http://localhost:4000`. Feature gate: unset disables the LiteLLM half of the router. | unset |
| `LITELLM_API_KEY` | Bearer token sent on every LiteLLM-bound request. Required when `LITELLM_URL` is set. | unset |
| `LITELLM_FALLBACK_OPUS` | Model name substituted into the body when a `claude-opus-*` request is redirected. Empty string disables (causes 500 on opus redirects). | `claude-opus-4-7` |
| `LITELLM_FALLBACK_SONNET` | Same, for `claude-sonnet-*`. | `claude-sonnet-4-6` |
| `LITELLM_FALLBACK_HAIKU` | Same, for `claude-haiku-*`. | `claude-haiku-4-5` |
| `REDIRECT_AT_5H_PCT` | 5-hour utilization threshold (integer 1–100). | `90` |
| `REDIRECT_AT_7D_PCT` | 7-day utilization threshold. | `90` |
| `REDIRECT_AT_OVERAGE_PCT` | Overage utilization threshold. | `80` |
| `HYSTERESIS_PCT` | How far below the threshold every window must drop before switching back to Anthropic. Prevents oscillation. | `5` |
| `PROBE_INTERVAL_MS` | Background probe period while in redirect mode (ms). | `300000` (5 min) |
| `PROBE_MODEL` | Model used in probe `count_tokens` requests. | `claude-haiku-4-5` |
| `ANTHROPIC_API_KEY_FOR_PROBES` | Dedicated Anthropic key for probe requests. When set, the cached client bearer is never used for probes. | unset |
| `MAX_BUFFER_BYTES` | Maximum body size buffered on `/v1/messages` and `/v1/messages/count_tokens`. Requests exceeding this return 413. | `10485760` (10 MB) |
| `ANTHROPIC_HOST_OVERRIDE` | Override Anthropic target as `host[:port]`. Test seam, not for production. | `api.anthropic.com:443` |
The Composer feature activates only when CURSOR_API_KEY is set. All other variables in this table are no-ops while it is unset.
| Variable | Purpose | Default |
|---|---|---|
| `CURSOR_API_KEY` | Bearer token sent on every Composer-bound request. Feature gate: unset disables Composer. Obtain from Cursor Dashboard → Integrations. | unset |
| `COMPOSER_API_URL` | Composer API base URL. Host-only: scheme, host, and port are used; any path component in the URL is ignored, since the fixed route `/opencodev2/v1/chat/completions` is always appended. | `https://api-for-cursor.standardagents.ai` |
| `COMPOSER_MODELS` | Comma-separated Composer model ids surfaced in the merged `GET /v1/models` list (see below). Each id must match the `composer-*` routing pattern so a selected id round-trips back to Composer. | `composer-2.5` |
How to use: Set your CURSOR_API_KEY from the Cursor Dashboard (Integrations section), then in Claude Code select a model name starting with composer — e.g. type composer-2.5 when prompted for a model. The router translates your Anthropic Messages API request to OpenAI chat-completions format, forwards it to Composer, and translates the response back. Streaming is fully supported.
Known limitations (best-effort):
- `tool_result` errors (`is_error: true`) are forwarded as plain tool-role content with no structured error marker, since OpenAI's tool role has no error channel.
- Image input is best-effort; some image formats may not round-trip perfectly.
- Token usage is estimated by composer-api and displayed for reference only; it does not update the `~/.claude/usage-status.md` quota file.
- Composer is explicit-only: it is never used as a quota fallback target when `LITELLM_URL` is configured. Name `composer-*` explicitly to route to Composer.
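The request-side translation described above can be pictured with a deliberately small, text-only sketch. The function name `toOpenAIChat` is hypothetical, the handling of `system` as a plain string is a simplification, and passing `max_tokens` through unchanged is an assumption about what the Composer endpoint accepts. Tools, images, and streaming response translation are left out entirely, so this is not the router's actual translator.

```javascript
// Text-only sketch: Anthropic Messages request -> OpenAI chat-completions request.
// toOpenAIChat is a hypothetical name; tool and image blocks are ignored here.
function toOpenAIChat(body) {
  const messages = [];

  // Anthropic carries the system prompt at the top level; OpenAI expects it
  // as the first message with role "system".
  if (typeof body.system === 'string' && body.system !== '') {
    messages.push({ role: 'system', content: body.system });
  }

  for (const msg of body.messages ?? []) {
    // Anthropic content is either a string or an array of typed blocks.
    const content = typeof msg.content === 'string'
      ? msg.content
      : msg.content
          .filter((block) => block.type === 'text')
          .map((block) => block.text)
          .join('');
    messages.push({ role: msg.role, content });
  }

  return {
    model: body.model,
    messages,
    max_tokens: body.max_tokens, // assumption: accepted as-is by the backend
    stream: Boolean(body.stream),
  };
}
```

Dropping non-text blocks silently is only acceptable in a sketch; a real translator has to map tool calls and tool results as well, which is where the limitations listed above come from.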
Claude Code's model-selection dialog is backed by the Anthropic Models API — GET /v1/models, which returns { data: [{ type, id, display_name, created_at, … }], has_more, first_id, last_id }. By default the router forwards that request straight to Anthropic, so the dialog only lists Anthropic's own models.
Whenever LiteLLM and/or Composer is enabled, the router instead intercepts GET /v1/models and answers with a merged list:
- Anthropic is fetched first and is the source of truth. It is also the auth gate: a non-2xx Anthropic response (e.g. `401`) is passed through verbatim, so credential errors still surface correctly. The response's `anthropic-ratelimit-*` headers are scraped exactly as on any other Anthropic call, so quota tracking keeps working off model-list traffic.
- LiteLLM models are fetched from `GET {LITELLM_URL}/v1/models` (OpenAI-shaped) and translated to Anthropic `ModelInfo` objects. A LiteLLM outage is non-fatal: those models are simply omitted.
- Composer models are appended synthetically from `COMPOSER_MODELS` (Composer exposes no model list of its own).
Entries are concatenated in that priority order and de-duplicated by id (so a LiteLLM-exposed claude-* won't shadow the native Anthropic entry). The merged response always sets has_more: false — pagination is collapsed into a single page.
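A minimal sketch of the merge step, assuming the three lists have already been fetched. The names `toAnthropicModel` and `mergeModels` are hypothetical, and the OpenAI-side fields used (`id`, `created` as Unix seconds) are the usual shape of an OpenAI model entry rather than something confirmed for every LiteLLM deployment.

```javascript
// Translate one OpenAI-shaped model entry into an Anthropic-style entry.
// Hypothetical helper; field mapping is an assumption.
function toAnthropicModel(m) {
  return {
    type: 'model',
    id: m.id,
    display_name: m.id,
    created_at: typeof m.created === 'number'
      ? new Date(m.created * 1000).toISOString()
      : null,
  };
}

// Concatenate in priority order and keep the first entry seen for each id,
// so a native Anthropic entry wins over a LiteLLM entry with the same id.
function mergeModels(anthropic, litellm, composer) {
  const seen = new Set();
  const data = [];
  for (const model of [...anthropic, ...litellm, ...composer]) {
    if (seen.has(model.id)) continue;
    seen.add(model.id);
    data.push(model);
  }
  return {
    data,
    has_more: false, // pagination collapsed into one page
    first_id: data.length > 0 ? data[0].id : null,
    last_id: data.length > 0 ? data[data.length - 1].id : null,
  };
}
```

Keeping the first occurrence is what makes concatenation order act as priority order.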
Claude Code's model-selection dialog only accepts model ids matching ^(claude|anthropic) — so a raw gemini-3.1-pro-preview or composer-2.5 would be filtered out. To get them accepted, the router remaps every foreign id (any id not already starting with claude/anthropic) into a reserved namespace before listing it:
gemini-3.1-pro-preview → claude-router-gemini-3.1-pro-preview
composer-2.5 → claude-router-composer-2.5
claude-opus-4-7 (litellm)→ claude-opus-4-7 (already accepted — left as-is)
Only the id is wrapped; display_name keeps the real model name, so the picker stays readable. When a request later arrives with a claude-router-* model, the router demaps it back to the real id and rewrites the request body before routing, so the underlying backend (Composer / LiteLLM) receives the genuine model name and the dispatch rules below classify it correctly. Demapping is a no-op for ordinary claude-* requests and for real foreign ids you send directly (e.g. claude --model gemini-3.1-pro-preview still works unchanged). claude-router- is a reserved prefix — Anthropic ships no model under it.
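The remap/demap round trip can be sketched as two small functions. The names `remapId` and `demapId` are hypothetical; only the prefix and the `^(claude|anthropic)` rule come from the description above.

```javascript
const ROUTER_PREFIX = 'claude-router-';

// Wrap any id the picker would reject; ids it already accepts are left alone.
function remapId(id) {
  return /^(claude|anthropic)/.test(id) ? id : ROUTER_PREFIX + id;
}

// Undo the wrapping. Ids without the reserved prefix pass through untouched,
// which covers ordinary claude-* requests and foreign ids sent directly.
function demapId(id) {
  return id.startsWith(ROUTER_PREFIX) ? id.slice(ROUTER_PREFIX.length) : id;
}
```

Because `claude-router-` itself starts with `claude`, demapping has to run before any rule that matches on the `claude-` prefix.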
Claude Code reads a model's context window from a literal [1m] suffix in the id (/\[1m\]/i → 1,000,000 tokens) and, when it sees one, also adds the context-1m-2025-08-07 beta header to the request. The router can offer a 1M variant of a foreign model by listing a second [1m]-suffixed entry — set MODELS_1M to a comma-separated list of real ids (post-demap). Each token is either an exact id or a prefix* glob, so "all gemini models" is just gemini*:
MODELS_1M=gemini* # every gemini-* model gets a 1M variant
MODELS_1M=gemini*,gpt-5 # plus an exact id
For each listed model the merged list gains an extra entry — e.g. claude-router-gemini-3.1-pro-preview[1m], shown as gemini-3.1-pro-preview (1M context) — alongside the default 200K base entry. When such a variant is selected:
- The `[1m]` suffix is stripped on demap (along with the prefix), so the backend gets the plain real id: `claude-router-gemini-3.1-pro-preview[1m]` → `gemini-3.1-pro-preview`.
- The `context-1m-2025-08-07` beta header is dropped before forwarding to LiteLLM/Composer, since those backends don't understand it (unrelated betas are preserved).
MODELS_1M is opt-in and defaults to empty — only add models whose backend genuinely serves a large window, since the suffix makes Claude Code treat the model as 1M-token locally (affecting compaction/usage math). This applies only to the router's claude-router-* namespace: a native claude-opus-4-8[1m] request has no prefix, so its id and its 1M beta header pass through to Anthropic completely untouched.
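The `MODELS_1M` matching and the 1M-specific cleanup can be sketched as follows. All function names are hypothetical. The sketch assumes the beta list travels as a comma-separated `anthropic-beta` header value, and it strips `[1m]` only from ids under the router's prefix, in line with the rule that native ids pass through untouched.

```javascript
const ROUTER_PREFIX = 'claude-router-';
const CONTEXT_1M_BETA = 'context-1m-2025-08-07';

// Split the env value into tokens, ignoring blanks.
function parseModels1M(value) {
  return (value || '').split(',').map((s) => s.trim()).filter(Boolean);
}

// A token is either an exact id or a trailing-* prefix glob.
function wants1M(realId, patterns) {
  return patterns.some((p) =>
    p.endsWith('*') ? realId.startsWith(p.slice(0, -1)) : realId === p);
}

// Demap a picked id: strip the prefix and any [1m] suffix, router namespace only.
function demap1M(id) {
  if (!id.startsWith(ROUTER_PREFIX)) return id;
  return id.slice(ROUTER_PREFIX.length).replace(/\[1m\]$/i, '');
}

// Remove only the 1M beta from a comma-separated beta list.
function dropContext1MBeta(headerValue) {
  return headerValue
    .split(',')
    .map((s) => s.trim())
    .filter((beta) => beta !== '' && beta !== CONTEXT_1M_BETA)
    .join(',');
}
```

If `dropContext1MBeta` returns an empty string, a caller would presumably omit the header altogether rather than send it empty.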
Serving a correct merged /v1/models is only half of it. Getting the models into the picker takes two client-side things, plus a cache seed:
1. Enable gateway model discovery in Claude Code's own environment (not the proxy):
CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1
Put it wherever Claude Code reads env from — e.g. the env block of ~/.claude/settings.json, or the shell profile that exports ANTHROPIC_BASE_URL.
2. Seed the discovery cache. With discovery enabled, the picker is populated from <CLAUDE_CONFIG_DIR>/cache/gateway-models.json (default ~/.claude/cache/gateway-models.json), not from a live GET /v1/models — that live fetch only runs for an enterprise gateway-auth setup, which a personal login doesn't have. Worse, Claude Code discards the whole cache unless its baseUrl exactly matches the current ANTHROPIC_BASE_URL. So the router ships a seeder that writes this file with the right baseUrl and the same merged/remapped/[1m] model set the endpoint serves:
ANTHROPIC_BASE_URL=http://127.0.0.1:4080 \
LITELLM_URL=… LITELLM_API_KEY=… CURSOR_API_KEY=… MODELS_1M='gemini-3*' \
node seed-gateway-cache.js

Best run automatically on every proxy start. Add it to the service unit (it already has the env):

ExecStartPost=-/usr/bin/node /path/to/claude-router/seed-gateway-cache.js

The seeder reuses the proxy's own helpers, fetches LiteLLM's model list, adds Composer + a stock-Claude baseline, applies the same remap + `[1m]` variants, and writes the cache. A transient LiteLLM outage won't clobber an existing good cache. Knobs: `CLAUDE_CONFIG_DIR`, `SEED_STOCK_MODELS`, `STOCK_1M_MODELS` (see below). Restart Claude Code after seeding; reopen `/model` if a stale picker is cached.
Native Claude 1M under a custom base URL. Unless it is talking to an official endpoint, Claude Code serves the 1M window only when the model id carries a `[1m]` suffix. Pointing it at this router counts as custom, so its automatic "(1M context)" variants for Opus/Sonnet are suppressed. The seeder therefore lists each 1M-capable Claude model twice: Claude Opus 4.8 (200K), and Claude Opus 4.8 (1M context) → `claude-opus-4-8[1m]` (1M, routed natively to Anthropic with the `context-1m` beta). The set defaults to `claude-opus-4-8,claude-opus-4-7,claude-opus-4-6,claude-sonnet-4-6`; override with `STOCK_1M_MODELS`. Pick the "(1M context)" entry in `/model` to actually get 1M.
| Request body `model` | Quota state | Upstream | Body rewritten? |
|---|---|---|---|
| `claude-router-*` (remapped dialog pick) | any | demapped to real id, then re-classified by the rows below | yes → real id |
| `claude-opus-*` | below threshold | Anthropic | no |
| `claude-sonnet-*` | below threshold | Anthropic | no |
| `claude-haiku-*` | below threshold | Anthropic | no |
| `claude-opus-*` | at/above threshold | LiteLLM | yes → `LITELLM_FALLBACK_OPUS` |
| `claude-sonnet-*` | at/above threshold | LiteLLM | yes → `LITELLM_FALLBACK_SONNET` |
| `claude-haiku-*` | at/above threshold | LiteLLM | yes → `LITELLM_FALLBACK_HAIKU` |
| `claude-*` (unknown tier) | at/above threshold | LiteLLM | no (forwarded as-is) |
| `composer-*` | any | Composer | no (translated to OpenAI format) |
| anything else (`gpt-*`, `gemini-*`, …) | any | LiteLLM | no |
| body unparseable / missing `model` | any | Anthropic (fail-safe) | no |
Tier classification is by case-insensitive substring match: a model name containing opus is opus-tier, sonnet is sonnet-tier, haiku is haiku-tier. Anything else starting with claude- is "unknown tier".
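The dispatch table and the tier rule can be condensed into a short sketch. The names `classifyTier` and `dispatch` are hypothetical, the `fallbacks` object stands in for the three `LITELLM_FALLBACK_*` values, and the model id is assumed to be demapped already. It assumes both LiteLLM and Composer are enabled, treats the `claude-` and `composer-` prefixes as case-sensitive (an assumption), and omits the empty-fallback 500 case.

```javascript
// Case-insensitive substring match, checked in a fixed order.
function classifyTier(model) {
  const m = model.toLowerCase();
  if (m.includes('opus')) return 'opus';
  if (m.includes('sonnet')) return 'sonnet';
  if (m.includes('haiku')) return 'haiku';
  return 'unknown';
}

// Decide the upstream and the rewritten model name (null means "do not rewrite").
function dispatch(model, overThreshold, fallbacks) {
  if (typeof model !== 'string' || model === '') {
    return { upstream: 'anthropic', rewrite: null }; // fail-safe row
  }
  if (model.startsWith('composer-')) return { upstream: 'composer', rewrite: null };
  if (!model.startsWith('claude-')) return { upstream: 'litellm', rewrite: null };
  if (!overThreshold) return { upstream: 'anthropic', rewrite: null };

  const tier = classifyTier(model);
  return {
    upstream: 'litellm',
    rewrite: tier === 'unknown' ? null : fallbacks[tier],
  };
}
```

For example, `dispatch('claude-opus-4-7', true, { opus: 'my-opus' })` yields `{ upstream: 'litellm', rewrite: 'my-opus' }`, where `my-opus` is a made-up fallback name.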
GET /v1/models is intercepted and answered with a merged model list (see Model-list aggregation) whenever LiteLLM and/or Composer is enabled; otherwise, and for every other endpoint, requests go to Anthropic without body inspection.
- Redirect engages when any one of the three utilization windows reaches its threshold.
- Switch-back requires all three windows to drop below `threshold − HYSTERESIS_PCT` (default 5 points).
- Each mode transition logs a line: `[proxy] mode transition: anthropic -> litellm (5h=…%, 7d=…%, overage=…%)`.
- Each dispatched request logs a line: `[proxy] dispatch: anthropic|litellm reason=… model=… [rewrite=…]`.
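The engage and switch-back rules amount to a two-state machine with hysteresis. A minimal sketch, with the hypothetical name `nextMode` and utilization expressed in percent:

```javascript
const WINDOWS = ['5h', '7d', 'overage'];

// mode: 'anthropic' or 'litellm'
// util / thresholds: objects keyed by window name, values in percent
function nextMode(mode, util, thresholds, hysteresisPct) {
  if (mode === 'anthropic') {
    // Engage as soon as any single window reaches its threshold.
    const anyOver = WINDOWS.some((w) => util[w] >= thresholds[w]);
    return anyOver ? 'litellm' : 'anthropic';
  }
  // Switch back only when every window is below threshold minus hysteresis.
  const allClear = WINDOWS.every((w) => util[w] < thresholds[w] - hysteresisPct);
  return allClear ? 'anthropic' : 'litellm';
}
```

With the default thresholds (90/90/80) and a 5-point hysteresis, a 7d reading of 87% keeps the router in redirect mode even though it is below 90%. The 7d window has to fall under 85%, with 5h also under 85% and overage under 75%, before traffic returns to Anthropic.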
While in redirect mode no Anthropic responses are arriving, so quota state cannot update from live traffic. claude-router fires a minimal POST /v1/messages/count_tokens against Anthropic every PROBE_INTERVAL_MS:
- Uses `ANTHROPIC_API_KEY_FOR_PROBES` if set; otherwise the most recently captured client `authorization` / `x-api-key` header.
- If no client auth has been captured yet and no probe key is configured, the tick is skipped.
- After three consecutive probe failures (e.g. 401), the interval doubles up to a 1-hour cap.
- When a client request arrives with a different auth value (key rotation), backoff is reset and the probe fires at the original cadence.
The probe response carries fresh anthropic-ratelimit-* headers, which update in-memory quota state and may trigger a switch back to Anthropic.
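One way to read the backoff rule is sketched below. The text does not say whether the interval doubles once or keeps doubling, so this sketch assumes it doubles on the third consecutive failure and again on each failure after that, capped at one hour. The name `probeDelayMs` is hypothetical.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000;

// baseMs: PROBE_INTERVAL_MS; consecutiveFailures: failures since the last success
// or since the last auth change (which resets the counter to 0).
function probeDelayMs(baseMs, consecutiveFailures) {
  if (consecutiveFailures < 3) return baseMs;
  const doublings = consecutiveFailures - 2; // assumption: keeps doubling
  return Math.min(baseMs * 2 ** doublings, ONE_HOUR_MS);
}
```

Under that reading and the default 5-minute interval, the delays after failures 3, 4, and 5 are 10, 20, and 40 minutes, and every later failure waits the full hour.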
~/.claude/usage-status.md is overwritten on every Anthropic response (and every probe response while redirected):
5h=9% 7d=99%! overage=0% bottleneck=seven_day (10/05/2026, 16:19:04)
- 5h: rolling 5-hour window utilization
- 7d: rolling 7-day window utilization
- overage: paid burst pool (available once 7d is exhausted)
- `!` suffix: that window returned an `allowed_warning` status
- bottleneck: which window Anthropic currently considers binding
One unified pool covers all models. There is no separate Sonnet or Opus pool despite the Claude Code UI showing individual bars.
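A sketch of how such a line could be derived from the rate-limit headers. The name `formatStatusLine` is hypothetical, rounding to the nearest whole percent is an assumption, and the overage status header name is inferred by analogy with the 5h and 7d ones rather than observed. The timestamp is passed in as a preformatted string to keep the function deterministic.

```javascript
const HEADER_PREFIX = 'anthropic-ratelimit-unified-';

// headers: plain object with lowercase header names and string values
function formatStatusLine(headers, timestamp) {
  const parts = ['5h', '7d', 'overage'].map((w) => {
    // Utilization arrives as a decimal fraction, e.g. "0.09" for 9%.
    const fraction = Number(headers[`${HEADER_PREFIX}${w}-utilization`] ?? 0);
    const pct = Math.round(fraction * 100);
    // Assumption: an overage "-status" header exists with the same naming scheme.
    const warn = headers[`${HEADER_PREFIX}${w}-status`] === 'allowed_warning' ? '!' : '';
    return `${w}=${pct}%${warn}`;
  });
  const bottleneck = headers[`${HEADER_PREFIX}representative-claim`] ?? 'unknown';
  return `${parts.join(' ')} bottleneck=${bottleneck} (${timestamp})`;
}
```

Fed the example values from the header table in this document, it reproduces the sample line shown above.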
Once the proxy is running, ask Claude:
What's my current quota usage?
Claude reads `~/.claude/usage-status.md` and reports the values. The file is rewritten on every Anthropic response, so it reflects the most recent quota headers the router has seen.
Add a rule to your global ~/.claude/CLAUDE.md so Claude adjusts behavior based on quota state. Examples ranked from light to strict:
Report at session start
## Quota awareness
At the start of each session, read `~/.claude/usage-status.md` and report the 5h and 7d usage.
If either shows `!`, flag it.

Warn before large tasks
## Quota awareness
Before any task involving more than ~10 tool calls or significant code generation,
read `~/.claude/usage-status.md`. If 7d usage is above 80%, say so and confirm before proceeding.
If 7d is above 95%, ask whether to continue or defer until the window resets.

Adjust approach by usage level
## Quota awareness
Read `~/.claude/usage-status.md` at the start of each session.
- Below 70% on both windows: normal operation
- 70–90% on 7d: prefer concise responses, avoid spawning multiple subagents unless necessary
- Above 90% on 7d: lightweight mode — short responses, no subagents, note that quota is low
- `!` on any window: mention it before starting multi-step tasks

Hard stop near limit
## Quota awareness
Read `~/.claude/usage-status.md` at the start of each session. If 7d usage is above 98%,
do not start new implementation tasks. Explain the quota state and suggest resuming tomorrow
or switching to a lighter approach.

Instead of asking Claude to read the file, inject it into every prompt via a `UserPromptSubmit` hook. Add to your Claude Code `settings.json`:
{
"hooks": {
"UserPromptSubmit": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "pwsh -NoProfile -Command \"$f='$env:USERPROFILE\\.claude\\usage-status.md'; if (Test-Path $f) { $c=Get-Content $f -Raw; Write-Output (ConvertTo-Json @{hookSpecificOutput=@{hookEventName='UserPromptSubmit';additionalContext=$c}}) }\""
}
]
}
]
}
}

On macOS/Linux, swap the command for a shell equivalent reading `~/.claude/usage-status.md`.
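Since Node.js is already required for the router, one portable alternative to a shell one-liner is a small Node script that emits the same JSON shape as the PowerShell command. The file name `usage-hook.js` and the function name `buildHookOutput` are hypothetical, and the default path assumes the single-account layout.

```javascript
// usage-hook.js (hypothetical file name): prints the hook JSON on stdout.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Same structure the PowerShell command builds with ConvertTo-Json.
function buildHookOutput(statusText) {
  return JSON.stringify({
    hookSpecificOutput: {
      hookEventName: 'UserPromptSubmit',
      additionalContext: statusText,
    },
  });
}

const statusFile = path.join(os.homedir(), '.claude', 'usage-status.md');
try {
  console.log(buildHookOutput(fs.readFileSync(statusFile, 'utf8')));
} catch {
  // File missing or unreadable: print nothing so the prompt goes through unchanged.
}
```

The hook entry would then use something like `"command": "node /path/to/usage-hook.js"`, with the path adjusted to wherever the script is saved.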
When LITELLM_URL is set, /v1/messages and /v1/messages/count_tokens request bodies are buffered up to MAX_BUFFER_BYTES (10 MB default) so the model field can be inspected. Oversized bodies return 413 before any upstream call. Streaming-response bodies (from either upstream) are not buffered — they are piped straight back to the client.
If LITELLM_URL is unset, no body buffering occurs at all and the router is a pure pipe.
If /v1/messages is called with a malformed JSON body, the request is forwarded to Anthropic with the original bytes intact and logged as dispatch: anthropic reason=parse-failed-fail-safe. This is a deliberate fail-safe: routing a parse failure to LiteLLM would change semantics for a request claude-router doesn't understand.
When ANTHROPIC_API_KEY_FOR_PROBES is unset, the background probe reuses the cached client authorization / x-api-key header. This cache is:
- In-memory only — never written to disk.
- Lives for the router process lifetime.
- Used exclusively for probe calls to `api.anthropic.com:443` over TLS.
- Never logged. Never sent to LiteLLM.
For security-conscious deployments, set ANTHROPIC_API_KEY_FOR_PROBES to a dedicated probe-only Anthropic key. This eliminates the cached-bearer scope entirely.
Without `CLAUDE_CONFIG_DIRS`, claude-router assumes one Anthropic account per running instance. Sharing one router across multiple Anthropic accounts in that mode causes:
- Quota-state pollution (utilization is aggregated across accounts).
- Probe-credential cross-contamination (the cached auth may not belong to the account being probed).
Either set `CLAUDE_CONFIG_DIRS` so quota state and fallback routing are tracked per account (see Multiple accounts), or run a separate router on a separate port for each Anthropic account.
All headers observed on a Claude Max plan account (confirmed 2026-05-10):
| Header | Example value | Notes |
|---|---|---|
| `anthropic-ratelimit-unified-5h-utilization` | `0.09` | Decimal fraction; multiply by 100 for % |
| `anthropic-ratelimit-unified-7d-utilization` | `0.99` | |
| `anthropic-ratelimit-unified-overage-utilization` | `0.0` | Paid burst pool |
| `anthropic-ratelimit-unified-representative-claim` | `seven_day` | Which window is the bottleneck |
| `anthropic-ratelimit-unified-5h-status` | `allowed` / `allowed_warning` | |
| `anthropic-ratelimit-unified-7d-status` | `allowed_warning` | |
| `anthropic-ratelimit-unified-fallback-percentage` | `0.5` | Throttle applied if over limit |
| `anthropic-ratelimit-unified-upgrade-paths` | `overage` | Available options when at limit |
Why HTTP for the local connection? ANTHROPIC_BASE_URL=http://… makes the SDK speak plain HTTP to claude-router — no localhost certificate management. The router makes a separate HTTPS connection to the real API.
Why LocalSystem for the Windows service? Avoids storing user credentials in the service config. The install script bakes your actual home path into CLAUDE_USAGE_FILE at install time instead.
Why NSSM? One binary, no npm dependencies, handles log rotation, clean install/uninstall. node-windows downloads its own binary at install time and requires npm — same outcome with more moving parts.
Why fail-safe on parse failure? A malformed body claude-router can't read might still be valid to Anthropic (e.g. SDK version skew) but is unlikely to make sense to a LiteLLM gateway with rewritten routing. Defaulting to Anthropic preserves Claude Code's expected behavior for requests the router doesn't understand.
Why probe with count_tokens? It's the cheapest Anthropic endpoint that still returns the unified rate-limit headers. Minimal token cost while redirected.
npm test
# or directly:
node --test tests/smoke.test.js

Tests use Node's built-in `node:test` (Node ≥ 18). Zero npm dependencies. Local mock servers on ephemeral ports; no real Anthropic or LiteLLM calls.
claude-router/
proxy.js zero-dependency Node.js router
install-service.ps1 Windows service installer (run as admin)
uninstall-service.ps1 Windows service uninstaller (run as admin)
tests/
smoke.test.js smoke test suite (node:test, no npm deps)
tools/
nssm.exe downloaded by install-service.ps1
proxy.log stdout (1 MB rotation via NSSM)
proxy-error.log stderr
MIT — see LICENSE. Originally forked from InertiaUK/claude-quota-proxy.