feat(frontend): show AI chat context-size usage and fix trim gate #9490

Draft
Guilhem-lm wants to merge 1 commit into main from glm/ai-chat-token

@Guilhem-lm

Contributor

Summary

Surfaces the AI chat's current context-window usage as a 45k / 200k badge next to the model selector, and makes the conversation-trim gate accurate.

The provider-reported token usage was already computed end-to-end (per provider, then accumulated in runChatLoop) but was discarded afterwards. This PR captures the final loop iteration's usage (prompt + completion) as an accurate "current context size" anchor, persists it per chat, and displays it.
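The difference between the summed usage and the final iteration's usage can be sketched as follows. This is only an illustration: the `TokenUsage` shape, `LoopUsageSummary`, and `summarizeUsage` are hypothetical names, not the actual types in chatLoop.ts. The point is that each iteration's prompt already contains the whole prior conversation, so the sum reflects what was billed while the last iteration reflects the current context size.

```typescript
// Hypothetical shapes for illustration; the real types in chatLoop.ts may differ.
interface TokenUsage {
  prompt: number
  completion: number
  total: number
}

interface LoopUsageSummary {
  tokenUsage: TokenUsage // summed over every iteration (total billed)
  lastIterationUsage?: TokenUsage // final iteration only (current context size)
}

function summarizeUsage(perIteration: TokenUsage[]): LoopUsageSummary {
  const tokenUsage: TokenUsage = { prompt: 0, completion: 0, total: 0 }
  let lastIterationUsage: TokenUsage | undefined

  for (const usage of perIteration) {
    tokenUsage.prompt += usage.prompt
    tokenUsage.completion += usage.completion
    tokenUsage.total += usage.total
    // Last write wins: only the most recent iteration is kept.
    lastIterationUsage = usage
  }

  return { tokenUsage, lastIterationUsage }
}

// Two loop iterations: the second prompt re-sends the first turn plus tool output.
const summary = summarizeUsage([
  { prompt: 1000, completion: 200, total: 1200 },
  { prompt: 1300, completion: 150, total: 1450 }
])
console.log(summary.tokenUsage.total, summary.lastIterationUsage?.total)
```

Using the summed value for the badge would double-count earlier turns, which is why the two figures are kept apart.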

It also fixes a real overflow bug. The existing trim gate estimated tokens as chars÷4 over message content only, ignoring the system message and tool schemas (tens of thousands of tokens in agentic mode). It therefore under-counted and could let the context overflow the model window, which made providers return 400 errors. The gate now counts the system message and tool schemas, and calibrates the crude estimate against the accurate anchor.
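A minimal sketch of a chars÷4 estimate that also counts the system message and tool schemas is shown below. The `ChatMessage` and `ToolSchema` shapes and the standalone `crudeEstimate` function are assumptions made for illustration (the PR's #crudeEstimate is a private method whose exact signature is not shown here), and message content is assumed to be a plain string.

```typescript
// Hypothetical, simplified shapes; real messages may carry structured content.
interface ChatMessage {
  role: string
  content: string
}

interface ToolSchema {
  name: string
  description: string
  parameters: unknown
}

const CHARS_PER_TOKEN = 4 // the crude chars÷4 heuristic described above

function crudeEstimate(
  systemMessage: string,
  messages: ChatMessage[],
  tools: ToolSchema[]
): number {
  const messageChars = messages.reduce((sum, m) => sum + m.content.length, 0)
  // Tool schemas are sent to the provider as JSON, so count their serialized size.
  const toolChars = tools.reduce((sum, t) => sum + JSON.stringify(t).length, 0)
  return Math.ceil((systemMessage.length + messageChars + toolChars) / CHARS_PER_TOKEN)
}

// 400 system chars + 40 message chars, no tools: (400 + 40) / 4 = 110
const withoutTools = crudeEstimate('a'.repeat(400), [{ role: 'user', content: 'b'.repeat(40) }], [])
console.log(withoutTools)
```

Counting only `messages` here would report 10 tokens for the same request, which illustrates how far an estimate that skips the system message can drift.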

Changes

  • chatLoop.ts: add lastIterationUsage to ChatLoopResult — the final iteration's usage (last-write-wins across all 4 provider branches). Distinct from the existing summed tokenUsage (= total billed, unchanged).
  • AIChatManager.svelte.ts: new contextTokens state (= lastIterationUsage.total), captured after runChatLoop. Rewrite the trim gate: #crudeEstimate now includes the system message + tool schemas; #estimateContextTokens scales the crude estimate by contextTokens / anchorCrude when an anchor exists, else falls back to crude. Export getTrimThreshold, MAX_TOKENS_THRESHOLD_PERCENTAGE, MAX_TOKENS_HARD_LIMIT. Reset on new chat, restore on loadPastChat.
  • HistoryManager.svelte.ts: persist optional contextTokens per chat in IndexedDB (no version bump — optional field, old chats read back undefined).
  • AIChatDisplay.svelte: 45k / 200k badge beside ProviderModelSelector, hidden until a real count exists, colour shifts neutral→amber→red near the trim threshold.
  • AIChatManager.test.ts: 4 new gate tests.
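The anchor calibration described for #estimateContextTokens can be sketched as a pure function. This is an assumption-laden illustration: the standalone `estimateContextTokens` signature and the guard against a non-positive `anchorCrude` are hypothetical, and only the scaling rule (crude estimate multiplied by contextTokens / anchorCrude, with a fallback to the crude estimate) comes from the description above.

```typescript
// crudeNow:      chars÷4 estimate of the conversation as it stands now
// contextTokens: provider-reported total from the last completed loop iteration
// anchorCrude:   chars÷4 estimate taken at the moment contextTokens was captured
function estimateContextTokens(
  crudeNow: number,
  contextTokens?: number,
  anchorCrude?: number
): number {
  // No usable anchor yet (new chat, or a chat saved before the feature existed):
  // fall back to the crude estimate.
  if (contextTokens === undefined || anchorCrude === undefined || anchorCrude <= 0) {
    return crudeNow
  }
  // Scale the crude estimate by how far off it was when the anchor was measured.
  return Math.round(crudeNow * (contextTokens / anchorCrude))
}

// The crude estimate said 1000 when the provider reported 45000, so a crude
// estimate of 1200 for the grown conversation is scaled by the same factor of 45.
console.log(estimateContextTokens(1200, 45000, 1000))
```

The ratio corrects for whatever the character heuristic misses systematically, such as tokenizer differences between providers.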

Test plan

  • npm run check:fast — clean for all changed files
  • npx vitest run AIChatManager.test.ts — 11/11 pass (system-message counting, tool-schema counting, paired assistant+tool eviction on trim, anchor calibration)
  • svelte-autofixer — zero issues on the badge code
  • Open the AI chat on a real model, run a multi-turn agentic conversation; confirm the badge appears after the first turn, climbs across turns, and shifts amber→red near the trim threshold
  • Restore a chat saved before this feature (no stored contextTokens) — badge stays hidden, chat still sends
  • Confirm a 1M-window model (gpt-4.1/gemini) renders the denominator as 1M
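The badge behaviour checked in the manual steps above could look roughly like the sketch below. All names (`formatTokenCount`, `badgeLabel`, `badgeTone`) and both ratio constants are hypothetical; the PR does not state at which fraction of the trim threshold the colour changes, so 0.7 and 0.9 are placeholder assumptions.

```typescript
type BadgeTone = 'neutral' | 'amber' | 'red'

// Placeholder ratios relative to the trim threshold (assumptions, not from the PR).
const AMBER_RATIO = 0.7
const RED_RATIO = 0.9

function formatTokenCount(tokens: number): string {
  if (tokens < 1_000) return String(tokens)
  const thousands = Math.round(tokens / 1_000)
  if (thousands >= 1_000) {
    // One decimal place; a whole number prints without ".0" (1_000_000 gives "1M").
    const millions = Math.round(tokens / 100_000) / 10
    return `${millions}M`
  }
  return `${thousands}k`
}

function badgeLabel(contextTokens: number | undefined, modelWindow: number): string | undefined {
  // Hidden until a real provider-reported count exists.
  if (contextTokens === undefined) return undefined
  return `${formatTokenCount(contextTokens)} / ${formatTokenCount(modelWindow)}`
}

function badgeTone(contextTokens: number, trimThreshold: number): BadgeTone {
  if (contextTokens >= trimThreshold * RED_RATIO) return 'red'
  if (contextTokens >= trimThreshold * AMBER_RATIO) return 'amber'
  return 'neutral'
}

console.log(badgeLabel(45_000, 200_000)) // "45k / 200k"
console.log(badgeLabel(45_000, 1_000_000)) // "45k / 1M"
```

A chat restored without a stored contextTokens yields `undefined` from `badgeLabel`, which matches the "badge stays hidden" expectation.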

🤖 Generated with Claude Code

Surface the AI chat's current context-window usage as a "45k / 200k" badge
next to the model selector, and make the conversation-trim gate accurate.

The provider-reported token usage was already computed end-to-end but dropped.
Capture the final loop iteration's usage (prompt + completion) as an accurate
"current context size" anchor, persist it per chat in IndexedDB, and display it.

Re-base the trim gate on the same anchor: the crude chars-per-4 estimate now
includes the system message and tool schemas (previously ignored, which let
real context overflow the model window), and is calibrated against the accurate
anchor when one exists. The badge colour shifts neutral->amber->red as usage
approaches the threshold where older messages start being dropped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages


Deploying windmill with Cloudflare Pages

Latest commit: 1ccf2b7
Status: ✅  Deploy successful!
Preview URL: https://0974c385.windmill.pages.dev
Branch Preview URL: https://glm-ai-chat-token.windmill.pages.dev
