cnjack · cnjack · Jun 22, 2026 · Jun 22, 2026 · coderabbitai · Jun 22, 2026
diff --git a/docs/usage-stats.md b/docs/usage-stats.md
@@ -0,0 +1,114 @@
+# Usage statistics
+
+jcode records token usage across every surface (TUI, web, ACP) and exposes two
+views in the web UI:
+
+- **Global stats** — a "Usage" tab in Settings: tokens used, sessions, turns,
+  active days, current streak, most-used model, an activity heatmap, a daily
+  token trend, and per-model / per-project breakdowns.
+- **Per-task context capacity** — a popover on the composer's token count: how
+  the current context window is split across Messages / System tools / MCP tools
+  / Skills / System prompt, plus the KV cache hit rate.
+
+## Data model
+
+### Token tracking (`internal/model`)
+
+`model.TokenUsage` accumulates per-call usage. Each call is recorded via
+`Add(AddParams{...})`, capturing:
+
+| field         | source (go-openai `Usage`)                       |
+|---------------|--------------------------------------------------|
+| Prompt        | `PromptTokens`                                   |
+| Completion    | `CompletionTokens`                               |
+| Total         | `TotalTokens`                                    |
+| Cached        | `PromptTokensDetails.CachedTokens` (cache-read)  |
+| Reasoning     | `CompletionTokensDetails.ReasoningTokens`        |
+| CacheWrite    | always 0 — see below                             |
+
+All providers go through one go-openai client. go-openai's `Usage` exposes
+**cache-read** (`cached_tokens`) and **reasoning** tokens, but **not**
+`cache_creation_input_tokens`. So `CacheWriteTokens` is reserved for a future
+native transport and stays 0 today.
+
+### Cache hit rate
+
+```
+cache hit rate = Σ cached / Σ prompt        (clamped to [0,1])
+```
+
+i.e. the fraction of prompt tokens served from the provider's KV cache. This is
+the only provider-portable definition given the wire constraint above.
+`CacheObserved()` (any cached tokens seen) drives a "—" placeholder so 0% is not
+confused with "this provider doesn't report caching".
+
+### Event log (`internal/usage`)
+
+Global stats are persisted to an **append-only JSON-lines log** at
+`~/.jcode/usage/events.jsonl`, one line per agent turn:
+
+```json
+{"ts":1750531200,"date":"2026-06-21","project":"/path","session":"<uuid>","model":"glm-5.2","prompt":1500,"completion":300,"cached":1300,"reasoning":60,"total":1800,"calls":2}
+```
+
+Each event is a single small `O_APPEND` write. Local POSIX filesystems append
+such writes without interleaving in practice (NFS does not guarantee this), so
+multiple jcode processes (TUI + web + ACP) can record concurrently without a
+read-modify-write race. All derived metrics (streak, active days, heatmap, per-model/project, cache rate) are computed at read time by `usage.Aggregate`.
+
+Token fields are per-turn **deltas**: the runner snapshots the cumulative
+tracker at the start of a turn and records the difference at the end. Subagent
+and teammate tokens are rolled into the same log under the **leader** session's
+UUID so multi-agent work isn't undercounted.
+
+The session **count** is sourced from the session index
+(`session.ListAllSessions`), which is authoritative; the event log owns
+token/day metrics.
+
+## API
+
+| endpoint                     | returns                                           |
+|------------------------------|---------------------------------------------------|
+| `GET /api/usage/stats?days=N`| global totals, streaks, heatmap (365d), trend (Nd), by-model, by-project |
+| `GET /api/tasks/{id}/stats`  | per-task context breakdown (active) or token rollup (historical) |
+| `GET /api/status`            | live token snapshot (extended with cache fields)  |
+
+The `token_update` WebSocket event pushes the same token fields to the browser:
+the last call's total, the cumulative counters, and the cache hit rate.
+
+## Per-task context breakdown
+
+The five buckets are estimated at **~4 bytes/token** (`usage.Estimate`) — there
+is no bundled tokenizer, and a relative breakdown only needs a consistent
+heuristic (the UI labels it "estimated"):
+
+1. **System prompt** = estimate(systemPrompt) − estimate(skill descriptions)
+2. **System tools** = Σ estimate(tool JSON) over built-in tools
+3. **MCP tools** = Σ estimate(tool JSON) over MCP tools
+4. **Skills** = estimate(skill descriptions)
+5. **Messages** = max(0, lastPromptTokens − buckets 1-4)
+
+The four static buckets are computed on demand from the live agent assembly
+(`command/web.go`'s `breakdownFn`), which reads the captured `systemPrompt` /
+`mcpTools` / `currentCM` / `skillLoader` by reference — so project switches and
+MCP reloads are reflected with no cache to invalidate. The breakdown is only
+meaningful for the **active** task; historical tasks return token totals + the
+aggregate hit rate only (`is_active:false`).
+
+## Known limitations / future work
+
+- **No `cache_creation` accounting** — blocked by the shared go-openai transport.
+  A native Anthropic transport could populate `CacheWriteTokens`.
+- **Cost is not yet derived** — `registry.go`'s `ModelCost`
+  (Input/Output/CacheRead/CacheWrite) is not multiplied into the stats. A future
+  pass could price each event for a spend view.
+- **Per-turn delta across process restart** — a turn that resumes in a new
+  process loses the in-memory start snapshot and may mis-count once.
+
+## Testing
+
+Per the sandbox constraints (live servers can't bind sockets), the backend is
+covered by in-process `httptest` (`internal/web/usage_test.go`) and unit tests
+for aggregation/streaks (`internal/usage/usage_test.go`) and the token struct
+(`internal/model/token_usage_test.go`). The frontend is verified via
+`vue-tsc` + `vite build`.
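
The cache hit rate defined in `docs/usage-stats.md` can be sketched as follows. This is a minimal illustration, not the code in `internal/model`: the `usageTotals` type and the `cacheHitRate` function are hypothetical names, and the real `TokenUsage` and `CacheObserved()` implementations are not shown in this diff.

```go
package main

import "fmt"

// usageTotals is a hypothetical stand-in for the cumulative counters that
// model.TokenUsage is described as accumulating.
type usageTotals struct {
	Prompt int64 // sum of prompt tokens across calls
	Cached int64 // sum of cache-read tokens across calls
}

// cacheHitRate returns sum(cached) / sum(prompt) clamped to [0,1], plus a flag
// reporting whether any cached tokens were seen at all. The flag lets a caller
// tell "0% hit rate" apart from "provider never reported caching".
func cacheHitRate(u usageTotals) (rate float64, observed bool) {
	observed = u.Cached > 0
	if u.Prompt <= 0 {
		return 0, observed
	}
	rate = float64(u.Cached) / float64(u.Prompt)
	if rate < 0 {
		rate = 0
	}
	if rate > 1 {
		rate = 1
	}
	return rate, observed
}

func main() {
	// Values taken from the example event line in the docs.
	rate, observed := cacheHitRate(usageTotals{Prompt: 1500, Cached: 1300})
	fmt.Printf("rate=%.2f observed=%v\n", rate, observed)
}
```

Returning the observed flag separately from the rate keeps the placeholder decision in the UI layer instead of encoding it as a sentinel value.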
diff --git a/internal/command/web.go b/internal/command/web.go
@@ -2,6 +2,7 @@ package command
 
 import (
 	"context"
+	"encoding/json"
 	"fmt"
 	"os/signal"
 	"path/filepath"
@@ -29,10 +30,29 @@ import (
 	"github.com/cnjack/jcode/internal/skills"
 	"github.com/cnjack/jcode/internal/telemetry"
 	"github.com/cnjack/jcode/internal/tools"
+	"github.com/cnjack/jcode/internal/usage"
 	util "github.com/cnjack/jcode/internal/util"
 	"github.com/cnjack/jcode/internal/web"
 )
 
+// estimateToolTokens approximates a tool's contribution to the context window
+// from its serialized schema (name + description + parameters). ToolInfo's
+// MarshalJSON includes the JSON-schema params, so one marshal captures it all.
+func estimateToolTokens(ctx context.Context, t tool.BaseTool) int {
+	if t == nil {
+		return 0
+	}
+	info, err := t.Info(ctx)
+	if err != nil || info == nil {
+		return 0
+	}
+	raw, err := json.Marshal(info)
+	if err != nil {
+		return usage.EstimateBytes(len(info.Name) + len(info.Desc))
+	}
+	return usage.EstimateBytes(len(raw))
+}
+
 func NewWebCmd() *cobra.Command {
 	var port int
 	var host string
@@ -421,6 +441,37 @@ func runWebServer(port int, host string, openBrowser bool) error {
 		return newAg, newRec, nil
 	}
 
+	// breakdownFn estimates how the live agent's context window is partitioned
+	// across system prompt / built-in tools / MCP tools / skills. It reads the
+	// captured assembly variables (systemPrompt, mcpTools, currentCM, skillLoader)
+	// by reference, so project switches and MCP reloads are reflected without any
+	// cache to invalidate. Built-in tools = all tools minus MCP tools.
+	breakdownFn := func() usage.ContextBreakdown {
+		var b usage.ContextBreakdown
+		skillDesc := skillLoader.Descriptions()
+		b.SkillsTokens = usage.Estimate(skillDesc)
+		// Skills are injected into the system prompt, so subtract to avoid
+		// double-counting them in the system-prompt bucket.
+		b.SystemPromptTokens = usage.Estimate(systemPrompt) - b.SkillsTokens
+		if b.SystemPromptTokens < 0 {
+			b.SystemPromptTokens = 0
+		}
+		for _, mt := range mcpTools {
+			b.MCPToolsTokens += estimateToolTokens(ctx, mt)
+		}
+		if currentCM != nil {
+			total := 0
+			for _, at := range buildAllTools(currentCM) {
+				total += estimateToolTokens(ctx, at)
+			}
+			b.SystemToolsTokens = total - b.MCPToolsTokens
+			if b.SystemToolsTokens < 0 {
+				b.SystemToolsTokens = 0
+			}
+		}
-	breakdownFn := func() usage.ContextBreakdown {
-		var b usage.ContextBreakdown
-		skillDesc := skillLoader.Descriptions()
-		b.SkillsTokens = usage.Estimate(skillDesc)
-		// Skills are injected into the system prompt, so subtract to avoid
-		// double-counting them in the system-prompt bucket.
-		b.SystemPromptTokens = usage.Estimate(systemPrompt) - b.SkillsTokens
-		if b.SystemPromptTokens < 0 {
-			b.SystemPromptTokens = 0
-		}
-		for _, mt := range mcpTools {
-			b.MCPToolsTokens += estimateToolTokens(ctx, mt)
-		}
-		if currentCM != nil {
-			total := 0
-			for _, at := range buildAllTools(currentCM) {
-				total += estimateToolTokens(ctx, at)
-			}
-			b.SystemToolsTokens = total - b.MCPToolsTokens
-			if b.SystemToolsTokens < 0 {
-				b.SystemToolsTokens = 0
-			}
-		}
+	breakdownFn := func() usage.ContextBreakdown {
+		var b usage.ContextBreakdown
+		prompt := systemPrompt
+		var toolsForMode []tool.BaseTool
+		if currentCM != nil {
+			if currentPlanMode {
+				prompt = planPrompt
+				toolsForMode = buildPlanTools()
+			} else {
+				toolsForMode = buildAllTools(currentCM)
+			}
+		}
+
+		skillDesc := skillLoader.Descriptions()
+		if !currentPlanMode {
+			b.SkillsTokens = usage.Estimate(skillDesc)
+		}
+
+		// Skills are injected into the system prompt, so subtract to avoid
+		// double-counting them in the system-prompt bucket.
+		b.SystemPromptTokens = usage.Estimate(prompt) - b.SkillsTokens
+		if b.SystemPromptTokens < 0 {
+			b.SystemPromptTokens = 0
+		}
+		if !currentPlanMode {
+			for _, mt := range mcpTools {
+				b.MCPToolsTokens += estimateToolTokens(ctx, mt)
+			}
+		}
+		if len(toolsForMode) > 0 {
+			total := 0
+			for _, at := range toolsForMode {
+				total += estimateToolTokens(ctx, at)
+			}
+			b.SystemToolsTokens = total - b.MCPToolsTokens
+			if b.SystemToolsTokens < 0 {
+				b.SystemToolsTokens = 0
+			}
+		}
+		return b
+	}
+		return b
+	}
+
 	srv := web.NewServer(&web.ServerConfig{
 		Port:               port,
 		Host:               host,
@@ -450,6 +501,7 @@ func runWebServer(port int, host string, openBrowser bool) error {
 		EventHandler:       finalHandler,
 		NeedsSetup:         needsSetup,
 		TokenUsage:         agentTokenUsage,
+		ContextBreakdownFn: breakdownFn,
 	})
 
 	// Set handler for approval routing.
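
The bucket arithmetic behind `breakdownFn` and the "Messages" remainder from the docs can be sketched in isolation. This is a rough illustration under stated assumptions: `estimateBytes` rounds up at 4 bytes per token (the actual rounding in `usage.EstimateBytes` is not shown in this diff), and the `breakdown` type and helper names are hypothetical.

```go
package main

import "fmt"

// bytesPerToken is the ~4 bytes/token heuristic from the docs.
const bytesPerToken = 4

// estimateBytes converts a byte count to an estimated token count.
// Rounding up is an assumption made for this sketch.
func estimateBytes(n int) int {
	if n <= 0 {
		return 0
	}
	return (n + bytesPerToken - 1) / bytesPerToken
}

// breakdown is a hypothetical mirror of the five documented buckets.
type breakdown struct {
	SystemPrompt, SystemTools, MCPTools, Skills, Messages int
}

// nonNegative clamps a derived bucket at zero, as the diff does after each
// subtraction.
func nonNegative(n int) int {
	if n < 0 {
		return 0
	}
	return n
}

// build derives the four static buckets from byte sizes, then assigns
// whatever remains of the last prompt's token count to Messages.
func build(promptBytes, skillBytes, allToolBytes, mcpToolBytes, lastPromptTokens int) breakdown {
	var b breakdown
	b.Skills = estimateBytes(skillBytes)
	// Skills are part of the system prompt, so subtract them to avoid
	// counting them twice.
	b.SystemPrompt = nonNegative(estimateBytes(promptBytes) - b.Skills)
	b.MCPTools = estimateBytes(mcpToolBytes)
	b.SystemTools = nonNegative(estimateBytes(allToolBytes) - b.MCPTools)
	static := b.SystemPrompt + b.SystemTools + b.MCPTools + b.Skills
	b.Messages = nonNegative(lastPromptTokens - static)
	return b
}

func main() {
	fmt.Printf("%+v\n", build(4000, 400, 2400, 800, 5000))
}
```

Because the static buckets are estimates while `lastPromptTokens` is a provider-reported count, the clamp on Messages matters: an overestimate of the static parts would otherwise produce a negative remainder.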

diff --git a/internal/config/config.go b/internal/config/config.go
@@ -449,3 +449,22 @@ func SessionsIndexPath() (string, error) {
 	}
 	return filepath.Join(dir, "session.json"), nil
 }
+
+// UsageDir returns the path to the usage-statistics directory (~/.jcode/usage).
+func UsageDir() (string, error) {
+	home, err := os.UserHomeDir()
+	if err != nil {
+		return "", fmt.Errorf("failed to get home directory: %w", err)
+	}
-	home, err := os.UserHomeDir()
-	if err != nil {
-		return "", fmt.Errorf("failed to get home directory: %w", err)
-	}
+	home, err := os.UserHomeDir()
+	if err != nil {
+		return "", fmt.Errorf("usage_dir: %w", err)
+	}
+	return filepath.Join(home, configDir, "usage"), nil
+}
+
+// UsageEventsPath returns the path to the append-only usage event log
+// (~/.jcode/usage/events.jsonl), one JSON line per recorded agent turn.
+func UsageEventsPath() (string, error) {
+	dir, err := UsageDir()
+	if err != nil {
+		return "", err
+	}
+	return filepath.Join(dir, "events.jsonl"), nil
+}
diff --git a/internal/handler/handler.go b/internal/handler/handler.go
@@ -52,9 +52,28 @@ type AgentEventHandler interface {
 	RequestApproval(ctx context.Context, req ApprovalRequest) (ApprovalResponse, error)
 }
 
-// TokenUsage carries token usage info.
+// TokenUsage carries token usage info to the UI surfaces.
+//
+// TotalTokens is the LAST call's total — i.e. current context-window
+// occupancy, used to drive the context-usage bar. The remaining token counters
+// (Prompt/Completion/Cached/Reasoning/CacheWrite/CallCount) are CUMULATIVE for
+// the run's tracker, and CacheHitRate is the cumulative cached/prompt ratio.
+// CacheSupported is false when the provider never reported any cached tokens,
+// so the UI can show "—" instead of a misleading 0%.
+//
+// NOTE: the field order/types here must stay identical to WebTokenData
+// (internal/handler/web.go) so OnTokenUpdate's direct struct conversion keeps
+// compiling.
 type TokenUsage struct {
 	TotalTokens       int64
+	PromptTokens      int64
+	CompletionTokens  int64
+	CachedTokens      int64
+	ReasoningTokens   int64
+	CacheWriteTokens  int64
+	CallCount         int64
+	CacheHitRate      float64
+	CacheSupported    bool
 	ModelContextLimit int // 0 if unknown
 }
 

diff --git a/internal/handler/web.go b/internal/handler/web.go
@@ -224,10 +224,21 @@ type WebToolResultData struct {
 	ToolCallID    string `json:"tool_call_id,omitempty"`
 }
 
-// WebTokenData carries token usage.
+// WebTokenData carries token usage to the browser. Field order/types MUST match
+// handler.TokenUsage so OnTokenUpdate's WebTokenData(info) conversion compiles.
+// total_tokens is current context occupancy (last call); the rest are
+// cumulative for the session.
 type WebTokenData struct {
-	TotalTokens       int64 `json:"total_tokens"`
-	ModelContextLimit int   `json:"model_context_limit"`
+	TotalTokens       int64   `json:"total_tokens"`
+	PromptTokens      int64   `json:"prompt_tokens"`
+	CompletionTokens  int64   `json:"completion_tokens"`
+	CachedTokens      int64   `json:"cached_tokens"`
+	ReasoningTokens   int64   `json:"reasoning_tokens"`
+	CacheWriteTokens  int64   `json:"cache_write_tokens"`
+	CallCount         int64   `json:"call_count"`
+	CacheHitRate      float64 `json:"cache_hit_rate"`
+	CacheSupported    bool    `json:"cache_supported"`
+	ModelContextLimit int     `json:"model_context_limit"`
 }
 
 // WebSubagentData carries subagent lifecycle events.