Tokenometer — LLM cost calculator, token counter, latency benchmark, and CI cost guardrail for Claude, GPT-4o, Gemini, Mistral, and Cohere. CLI + GitHub Action + VS Code extension + Claude Code skill. Live: https://tokenometer.dev
Created and maintained by Faraazuddin Mohammed · LinkedIn · HackerNoon
**Warning:** tokenometer.cloud is not affiliated with this project or its maintainer. Do not enter credentials, API keys, or provider tokens there. Official Tokenometer surfaces are this GitHub repository, the npm packages linked above, the VS Code/Open VSX marketplace listings, and https://tokenometer.dev.
Tokenometer answers a simple, expensive question: does it actually cost less to send your prompt as YAML, JSON, XML, or Markdown — across Claude, GPT-4o, Gemini, Mistral, and Cohere — and how fast does each provider actually respond? It started as a $23 question. Today it's the only LLM cost CLI that also tells you latency, ships a PR-blocking GitHub Action, lights up your editor's status bar, and teaches Claude Code agents to think in dollars.
| Feature | Tokenometer | tokencost (AgentOps) | tiktoken (OpenAI) | gpt-tokenizer | promptfoo | gpt-token-counter-live (VS Code) |
|---|---|---|---|---|---|---|
| Multi-provider (Anthropic / OpenAI / Google) | ✓ | ✓ | – | – | ✓ | – |
| Mistral support | ✓ | – | – | – | partial | – |
| Cohere support | ✓ | – | – | – | partial | – |
| Multi-format compare (JSON / YAML / XML / MD / text) | ✓ | – | – | – | – | – |
| Empirical mode (real provider `countTokens`) | ✓ | – | – | – | partial | – |
| Latency (TTFT + tokens/sec, p50/p95) | ✓ | – | – | – | partial | – |
| Vision-token cost (image inputs) | ✓ | – | – | – | – | – |
| Cost (USD), not just tokens | ✓ | ✓ | – | – | partial | – |
| Honest "approximate" flag when offline is a proxy | ✓ | – | – | – | – | – |
| CLI | ✓ | ✓ | – | – | ✓ | – |
| GitHub Action (PR cost-diff guardrail) | ✓ | – | – | – | partial | – |
| Per-file attribution in CI | ✓ | – | – | – | – | – |
| SARIF output (GitHub code scanning) | ✓ | – | – | – | – | – |
| VS Code / Cursor extension | ✓ | – | – | – | – | ✓ |
| Claude Code skill | ✓ | – | – | – | – | – |
Tokenometer is the only tool in this list that combines multi-provider support (5 providers, 63 models), multi-format comparison, empirical mode, latency benchmarking, USD cost, a PR-blocking GitHub Action, an editor extension, a Claude Code skill, and an honest approximate-vs-exact flag. tokencost is the closest match for cost-in-USD across providers, but it doesn't compare formats, measure latency, or run as a CI guardrail. tiktoken and gpt-tokenizer are great single-provider primitives; Tokenometer uses gpt-tokenizer under the hood for the offline path. promptfoo is the broadest evaluator overall, but cost is one input among many, so it isn't a dedicated cost guardrail. gpt-token-counter-live is a real-time, in-editor counter only.
- `claude-opus-4-7` real `messages.countTokens` counts are +62% denser (median) than the popular `cl100k_base` proxy. If you budget Claude cost from `tiktoken`, the real count is ~1.6× your estimate, so the budget falls short by roughly 38%.
- `claude-sonnet-4-6` and `claude-haiku-4-5` are within ~17% of `cl100k_base` (and identical to each other, since they share a tokenizer family).
- Format choice (JSON / YAML / XML / Markdown / text) barely moves these gaps: the median delta stays within ~1pp across formats. Picking a cheaper model saves 7-12×; reformatting saves ~10%.
- `gpt-4o` empirical (the OpenAI counterpart to Anthropic's `countTokens`: `tiktoken` `o200k_base`) matches the offline Tokenometer counts exactly on 100/100 cells. This is the sanity anchor.
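As a rough budgeting aid, a `cl100k_base` estimate can be scaled by the median density ratio from the benchmark findings. This is a minimal sketch, assuming the +62% median applies to your prompt (individual prompts vary around the median); `adjustClaudeEstimate` and `budgetCoverage` are hypothetical helper names, not part of Tokenometer's API. It also shows the arithmetic behind the shortfall: an unadjusted budget covers only 1 / 1.62 of the real count.

```typescript
// Median ratio of claude-opus-4-7 countTokens to cl100k_base (+62%), per the
// benchmark findings. ASSUMPTION: treating the median as a per-prompt factor.
const OPUS_47_MEDIAN_DENSITY = 1.62;

// Hypothetical helper: scale a cl100k_base token estimate to an expected
// claude-opus-4-7 token count.
function adjustClaudeEstimate(cl100kTokens: number, density: number = OPUS_47_MEDIAN_DENSITY): number {
  return Math.round(cl100kTokens * density);
}

// Fraction of the real count that an unadjusted cl100k_base budget covers.
function budgetCoverage(density: number = OPUS_47_MEDIAN_DENSITY): number {
  return 1 / density;
}

const cl100kEstimate = 1000;
console.log(adjustClaudeEstimate(cl100kEstimate)); // 1620
console.log((1 - budgetCoverage()).toFixed(2)); // "0.38" (share of the real count left unbudgeted)
```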
Reproduce with `npm install && npm run benchmarks:empirical` and `ANTHROPIC_API_KEY` set. The full sweep costs nothing because `countTokens` is free.
```text
$ tokenometer ./prompt.md --model claude-opus-4-7 --format json,yaml,markdown
Model              Format     Tokens   USD       Approx
────────────────── ────────── ──────── ───────── ──────
claude-opus-4-7    json       1,243    $0.0186   ✓
claude-opus-4-7    yaml       1,189    $0.0178   ✓
claude-opus-4-7    markdown   1,156    $0.0173   ✓
Cheapest: claude-opus-4-7 as markdown ($0.0173)
Priciest: claude-opus-4-7 as json ($0.0186, 1.08x more)
```
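The Approx column shows ✓ when the count is a proxy (Anthropic / Google / Mistral-Tekken / Cohere offline) and is empty when it's an exact match (OpenAI offline, Mistral SentencePiece-family offline, or any provider with an exact count API under `--empirical`). Mistral has no public token-count endpoint, so Tekken-family counts stay approximate even with `--empirical`.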
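The `Cheapest:` / `Priciest:` summary lines in the sample output follow from the per-row costs. A minimal sketch that reproduces them from the rows shown, assuming the ratio is priciest cost divided by cheapest cost; `Cell` and `summarize` are hypothetical names, not Tokenometer's internal types.

```typescript
// One row of the per-cell table: model, format, and USD input cost.
interface Cell {
  model: string;
  format: string;
  usd: number;
}

// Hypothetical helper: pick the cheapest and priciest cells and the cost ratio
// between them, mirroring the summary lines under the table.
function summarize(cells: Cell[]): { cheapest: Cell; priciest: Cell; ratio: number } {
  if (cells.length === 0) throw new Error("no cells to summarize");
  let cheapest = cells[0];
  let priciest = cells[0];
  for (const c of cells) {
    if (c.usd < cheapest.usd) cheapest = c;
    if (c.usd > priciest.usd) priciest = c;
  }
  return { cheapest, priciest, ratio: priciest.usd / cheapest.usd };
}

const cells: Cell[] = [
  { model: "claude-opus-4-7", format: "json", usd: 0.0186 },
  { model: "claude-opus-4-7", format: "yaml", usd: 0.0178 },
  { model: "claude-opus-4-7", format: "markdown", usd: 0.0173 },
];
const s = summarize(cells);
console.log(`Cheapest: ${s.cheapest.model} as ${s.cheapest.format} ($${s.cheapest.usd.toFixed(4)})`);
console.log(`Priciest: ${s.priciest.model} as ${s.priciest.format} ($${s.priciest.usd.toFixed(4)}, ${s.ratio.toFixed(2)}x more)`);
```

With the three rows above, this prints the same two summary lines as the sample output, including the `1.08x` ratio.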
Real demo (with empirical mode + GIF) at https://tokenometer.dev.
Cost AND latency in one CLI — the only tool that does both. tiktoken and @anthropic-ai/tokenizer give you a token count for one provider. They don't tell you:
- What the same prompt costs across multiple providers and models (Claude, GPT-4o, Gemini, Mistral, Cohere)
- How fast each provider actually responds (TTFT + tokens/sec, p50/p95/mean) — a real generation, not a synthetic benchmark
- Whether format conversion (YAML ↔ JSON ↔ XML ↔ MD) actually moves the needle
- The empirical cost — what your provider actually charged on a real call, after prompt caching
- Whether a PR introduced a prompt-cost regression
- The vision-token cost when your prompt includes images
Tokenometer is dev-time, multi-provider, multi-format, optionally empirical, latency-aware, vision-aware, and CI-native. And the same core powers the CLI, the GitHub Action, the VS Code / Cursor status bar, and the Claude Code skill — counts, pricing, and tokenizer choices stay identical across surfaces.
One-shot:

```bash
npx tokenometer ./prompt.md --model claude-opus-4-7
```

Global:

```bash
npm i -g tokenometer
tokenometer ./prompt.md --format yaml,json,xml,markdown,text --model claude-opus-4-7,gpt-4o,mistral-large-latest,command-r-plus-08-2024
```

Stdin works too:

```bash
echo "prompt body" | tokenometer - --model claude-sonnet-4-6
```

Run `tokenometer --help` for the full flag list and the current set of known model ids (63 across 5 providers).
Use docs/ADOPTION.md for copy-paste integration paths:
GitHub Actions cost gates, VS Code/Cursor rollout, MCP clients, LangChain,
Vercel AI SDK, OpenAI SDK, Anthropic SDK, and a PR cost-regression case study.
Canonical tutorial: Add an LLM Prompt Cost Gate to GitHub Actions in 10 Minutes.
```bash
tokenometer ./prompt.md --model claude-opus-4-7
```

Prints estimated tokens + USD for each format × the chosen model(s). The default model is `claude-opus-4-7` (or auto-detected from `*_API_KEY` env vars); the default formats are all of `json,markdown,text,xml,yaml`.
```bash
ANTHROPIC_API_KEY=… tokenometer ./prompt.md --empirical --max-spend 0.05
```

For each (model × format) cell, Tokenometer calls the provider's exact token-count API:

- Anthropic → `messages.countTokens` (free)
- Google → `model.countTokens` (free)
- OpenAI → `tiktoken` `o200k_base` (matches OpenAI's production count exactly, no API call)
- Cohere → `POST /v1/tokenize` (free, requires `COHERE_API_KEY`)
- Mistral → unsupported (no public token-count endpoint); the offline `mistral-tokenizer-js` path is exact for SentencePiece-family models and approximate (chars/4) for Tekken-family models.
Set GOOGLE_API_KEY (or GEMINI_API_KEY) for Gemini, MISTRAL_API_KEY for Mistral, COHERE_API_KEY for Cohere. --offline forces the offline path even if --empirical is also passed.
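Together, the flags and env vars decide per provider whether a count takes the empirical or the offline path. A minimal sketch of that decision, assuming a provider with a missing key falls back to offline (this README does not say whether Tokenometer errors instead); `resolvePath` and `EMPIRICAL_KEYS` are hypothetical names for illustration.

```typescript
type Provider = "anthropic" | "openai" | "google" | "mistral" | "cohere";
type CountPath = "empirical" | "offline";

interface Flags {
  empirical: boolean; // --empirical
  offline: boolean; // --offline
}

// Env vars that unlock an exact count (either key works for Google).
// OpenAI needs none: its exact count is o200k_base, computed locally.
// Mistral is null: it has no public token-count endpoint.
const EMPIRICAL_KEYS: Record<Provider, string[] | null> = {
  anthropic: ["ANTHROPIC_API_KEY"],
  openai: [],
  google: ["GOOGLE_API_KEY", "GEMINI_API_KEY"],
  mistral: null,
  cohere: ["COHERE_API_KEY"],
};

// Hypothetical resolver. ASSUMPTION: a missing key falls back to the offline
// path instead of raising an error.
function resolvePath(
  provider: Provider,
  flags: Flags,
  env: Record<string, string | undefined>,
): CountPath {
  if (flags.offline || !flags.empirical) return "offline"; // --offline always wins
  const keys = EMPIRICAL_KEYS[provider];
  if (keys === null) return "offline"; // no token-count endpoint
  if (keys.length === 0) return "empirical"; // exact locally, no key needed
  return keys.some((k) => Boolean(env[k])) ? "empirical" : "offline";
}

console.log(resolvePath("google", { empirical: true, offline: false }, { GEMINI_API_KEY: "k" })); // "empirical"
```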
```yaml
- uses: faraa2m/tokenometer/packages/action@v1
  with:
    paths: prompts/**/*.md,prompts/**/*.json
    models: claude-opus-4-7,claude-sonnet-4-6,gpt-4o
    formats: json,yaml,markdown
    budget: '0.50'   # USD; omit to disable the gate
    top-n-files: 5   # rows shown in the per-file Δ table; the rest fold into <details>
```

The Action posts a sticky PR comment with the cost diff vs the base branch, including a per-file Δ table and a collapsible "all files" block. It fails the check when the total Δ exceeds `budget`. See packages/action/README.md for all inputs and outputs.
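The gate compares per-file cost between the base branch and the PR head, shows the largest changes, and fails when the total Δ is over budget. A minimal sketch of that logic, not the Action's actual source: `costGate` and `FileCost` are hypothetical, and sorting rows by absolute Δ is an assumption; `budget` and `topN` correspond to the `budget` and `top-n-files` inputs.

```typescript
// Per-file USD cost on the base branch and on the PR head.
interface FileCost {
  path: string;
  baseUsd: number;
  headUsd: number;
}

interface GateResult {
  totalDelta: number;
  top: { path: string; delta: number }[]; // rows for the per-file Δ table
  folded: number; // rows that would go into the collapsed "all files" block
  pass: boolean;
}

// Hypothetical gate. ASSUMPTION: rows are ranked by |Δ|. An undefined budget
// disables the gate, matching "omit to disable the gate".
function costGate(files: FileCost[], budget: number | undefined, topN: number): GateResult {
  const deltas = files
    .map((f) => ({ path: f.path, delta: f.headUsd - f.baseUsd }))
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta));
  const totalDelta = deltas.reduce((sum, d) => sum + d.delta, 0);
  return {
    totalDelta,
    top: deltas.slice(0, topN),
    folded: Math.max(0, deltas.length - topN),
    pass: budget === undefined || totalDelta <= budget,
  };
}

// Example with made-up costs: +0.5, +0.125, and -0.25 USD.
const files: FileCost[] = [
  { path: "prompts/a.md", baseUsd: 0.25, headUsd: 0.75 },
  { path: "prompts/b.md", baseUsd: 0.5, headUsd: 0.625 },
  { path: "prompts/c.md", baseUsd: 0.5, headUsd: 0.25 },
];
console.log(costGate(files, 0.5, 2)); // totalDelta 0.375, passes a 0.50 budget
```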
From VS Code Quick Open (Ctrl+P / Cmd+P):

```text
ext install faraa2m.tokenometer-vscode
```
Or install directly from the VS Code Marketplace or Open VSX (Cursor / VSCodium).
The status bar shows model · tokens · USD for the active prompt file, updates on every keystroke (debounced), and turns warning-colored when the cost exceeds `tokenometer.warnOnCostAbove`. It uses the same `@tokenometer/core` as the CLI, so what you see in the editor matches what CI computes. See packages/vscode/README.md.
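The status-bar behavior comes down to two small pieces: formatting the `model · tokens · USD` text with a threshold check, and debouncing recomputation while the user types. A minimal sketch in plain TypeScript with no `vscode` API calls, so it covers only that logic; `statusBar`, `StatusInput`, `debounce`, and the 300 ms delay are hypothetical choices, not the extension's code.

```typescript
// Input for the status bar. warnOnCostAbove mirrors the
// tokenometer.warnOnCostAbove setting; undefined means "never warn".
interface StatusInput {
  model: string;
  tokens: number;
  usd: number;
  warnOnCostAbove?: number;
}

// Hypothetical formatter for the "model · tokens · USD" text, plus the flag
// that would switch the item to a warning color.
function statusBar(s: StatusInput): { text: string; warn: boolean } {
  const text = `${s.model} · ${s.tokens.toLocaleString("en-US")} tokens · $${s.usd.toFixed(4)}`;
  const warn = s.warnOnCostAbove !== undefined && s.usd > s.warnOnCostAbove;
  return { text, warn };
}

// Trailing-edge debounce: only the last call in a burst runs, once `ms`
// milliseconds pass without another call.
function debounce<A extends unknown[]>(fn: (...args: A) => void, ms: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// Example: recompute after a 300 ms pause in typing (an arbitrary delay).
const refresh = debounce((input: StatusInput) => console.log(statusBar(input)), 300);
refresh({ model: "claude-opus-4-7", tokens: 1243, usd: 0.0186, warnOnCostAbove: 0.01 });
```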
```bash
cp -R packages/claude-code-skill ~/.claude/skills/tokenometer
```

This installs the `tokenometer-cost-check` skill so Claude Code agents can answer "what does this prompt cost?" with a real number: they shell out to `npx tokenometer` instead of guessing from `tiktoken`. See packages/claude-code-skill/README.md.
Tokenometer picks a tokenizer per provider and flags the count as approximate (approximate: true in the API result) when the offline path is a proxy:
| Provider | Offline tokenizer | Exactness | Empirical (`--empirical`) |
|---|---|---|---|
| OpenAI | `gpt-tokenizer` `o200k_base` | exact | same `o200k_base` (matches OpenAI production count) |
| Anthropic | `gpt-tokenizer` `cl100k_base` | approximate | `messages.countTokens` (exact, free) |
| Google | chars / 4 heuristic | approximate | `model.countTokens` (exact, free) |
| Mistral | `mistral-tokenizer-js` (V1/V2/V3) · chars/4 for Tekken family | exact for SP-family · approximate for Tekken | unsupported (no public token-count endpoint) |
| Cohere | chars / 4 heuristic | approximate | `POST /v1/tokenize` (exact, free, requires `COHERE_API_KEY`) |
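The exactness column reduces to a small rule. A minimal sketch encoding that table, assuming `empirical` means the provider's count API was actually used and that an unknown Mistral tokenizer family is treated as approximate; `isApproximate` is a hypothetical helper, not Tokenometer's API.

```typescript
type Provider = "anthropic" | "openai" | "google" | "mistral" | "cohere";
type MistralFamily = "sentencepiece" | "tekken";

// Hypothetical helper: should this count carry `approximate: true`?
// `empirical` means the provider's exact count API was actually used.
function isApproximate(provider: Provider, empirical: boolean, mistralFamily?: MistralFamily): boolean {
  // OpenAI: o200k_base is exact both offline and under --empirical.
  if (provider === "openai") return false;
  // Mistral: no empirical endpoint, so only the offline tokenizer family
  // matters. ASSUMPTION: an unknown family is treated as approximate.
  if (provider === "mistral") return mistralFamily !== "sentencepiece";
  // Anthropic, Google, Cohere: offline proxies, exact via their count APIs.
  return !empirical;
}

console.log(isApproximate("anthropic", false)); // true (cl100k_base proxy)
console.log(isApproximate("anthropic", true)); // false (messages.countTokens)
```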
Cost = tokens / 1000 × per-1k input rate. Pricing and context windows are sourced from the tokenlens registry, with a small set of local overrides for bleeding-edge models the registry hasn't picked up yet (and the full Cohere catalog, which @tokenlens/models doesn't ship at v1.3.0) — see packages/core/src/rates.ts (RATES_VERSION). Local overrides were last checked against Anthropic and Cohere public pricing on 2026-05-23.
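The cost formula and the "registry plus local overrides" lookup can be sketched together. This assumes an override simply takes precedence over the registry entry for the same model id; `lookupRate`, `inputCostUsd`, the `example-model` id, and both rates are placeholders, not quoted prices or the contents of `rates.ts`. The placeholder $0.015 per 1k happens to be consistent with the `claude-opus-4-7` sample output ($0.0186 for 1,243 tokens).

```typescript
// USD per 1k input tokens, keyed by model id.
type RateTable = { [model: string]: number | undefined };

// Hypothetical lookup: a local override wins; otherwise fall back to the
// registry entry. Unknown models are an error rather than a silent $0.
function lookupRate(model: string, registry: RateTable, overrides: RateTable): number {
  const rate = overrides[model] ?? registry[model];
  if (rate === undefined) throw new Error(`no rate for ${model}`);
  return rate;
}

// Cost = tokens / 1000 × per-1k input rate.
function inputCostUsd(tokens: number, per1kRate: number): number {
  return (tokens / 1000) * per1kRate;
}

// Placeholder ids and rates, not real pricing.
const registry: RateTable = { "example-model": 0.01 };
const overrides: RateTable = { "example-model": 0.015 };
const rate = lookupRate("example-model", registry, overrides);
console.log(inputCostUsd(1243, rate).toFixed(4)); // "0.0186"
```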
The CLI is multi-surface by design:
- `--output table` (default): human-readable per-cell table.
- `--output json`: emits a `TokenometerResult` shape (`{ files: [{ path, results: [...] }] }`); pipe to `jq`.
- `--output sarif`: emits SARIF 2.1.0; drop into GitHub Code Scanning or any SARIF viewer.
- `--by-file`: appends a per-file token + USD summary table for multi-file inputs.
- `--image <path>` (repeatable): adds vision-token cost for Claude / GPT-4o / Gemini.
- `--latency`: measures real generation latency (TTFT + total ms + tokens/sec, p50/p95/mean over n trials, default 3). Implies `--empirical`. Supported on Anthropic, OpenAI, Google, Cohere, and Mistral.
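The `--latency` summary reduces a handful of trials to p50/p95/mean. A minimal sketch, assuming the nearest-rank percentile method and throughput measured as output tokens over total generation time; the README does not specify either choice, and `summarizeLatency`, `percentile`, and `tokensPerSec` are hypothetical names. Note that with the default 3 trials, nearest-rank p95 is simply the slowest trial.

```typescript
// Summary statistics over n latency trials (e.g. TTFT in ms).
interface LatencyStats {
  p50: number;
  p95: number;
  mean: number;
}

// Nearest-rank percentile over an ascending array: the smallest sample with
// at least p% of samples at or below it.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

function summarizeLatency(samples: number[]): LatencyStats {
  if (samples.length === 0) throw new Error("need at least one trial");
  const sorted = [...samples].sort((a, b) => a - b);
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  return { p50: percentile(sorted, 50), p95: percentile(sorted, 95), mean };
}

// ASSUMPTION: throughput = output tokens / total generation time. The README
// does not say whether TTFT is excluded.
function tokensPerSec(outputTokens: number, totalMs: number): number {
  return outputTokens / (totalMs / 1000);
}

console.log(summarizeLatency([420, 380, 510])); // p50 420, p95 510, mean ≈ 436.67
console.log(tokensPerSec(200, 4000)); // 50
```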
```bash
npx tokenometer ./prompt.md --output sarif > tokenometer.sarif
npx tokenometer ./prompts/*.md --by-file --output json | jq '.files[].results | map(.inputCost) | add'
ANTHROPIC_API_KEY=… OPENAI_API_KEY=… npx tokenometer ./prompt.md --latency --model claude-opus-4-7,gpt-4o
```

Full flag reference: packages/cli/README.md.
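The `jq` pipeline sums `inputCost` per file from the `--output json` result. The same aggregation in TypeScript, as a minimal sketch: it types only the fields this README mentions (`files`, `path`, `results`, `inputCost`) and assumes `inputCost` is a USD number; `TokenometerResultSlice` and `costPerFile` are hypothetical names, not the published types.

```typescript
// Minimal slice of the --output json shape: { files: [{ path, results: [...] }] }.
// Only `inputCost` is read, because that is the field the jq example uses.
interface TokenometerResultSlice {
  files: { path: string; results: { inputCost: number }[] }[];
}

// Sum inputCost per file, like `jq '.files[].results | map(.inputCost) | add'`.
// Returns 0 for a file with no results (jq's `add` would give null there).
function costPerFile(json: string): Map<string, number> {
  const parsed = JSON.parse(json) as TokenometerResultSlice;
  const totals = new Map<string, number>();
  for (const file of parsed.files) {
    totals.set(file.path, file.results.reduce((sum, r) => sum + r.inputCost, 0));
  }
  return totals;
}

// Made-up sample input in the documented shape.
const sample = JSON.stringify({
  files: [
    { path: "prompts/a.md", results: [{ inputCost: 0.25 }, { inputCost: 0.5 }] },
    { path: "prompts/b.md", results: [{ inputCost: 0.125 }] },
  ],
});
console.log(costPerFile(sample)); // Map { "prompts/a.md" => 0.75, "prompts/b.md" => 0.125 }
```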
- VS Code / Cursor: `@tokenometer/vscode`. Status bar with live token count + USD cost; settings for model, format, and a warn-above-USD threshold; `Tokenometer: Switch model` and `Tokenometer: Show details` commands.
- Claude Code skill: `@tokenometer/claude-code-skill`. Drop in `~/.claude/skills/tokenometer/SKILL.md` and Claude Code agents will reach for `npx tokenometer …` when you ask them anything cost-shaped.
- Code of Conduct
- Contributing guide
- Security policy — uses GitHub Private Vulnerability Reporting
- Changelog
- Discussions
Tokenometer is part of a focused open-source toolkit for LLM cost, tokenization, routing, and prompt optimization.
- llm-tokens-atlas — open benchmark of LLM tokenization calibration across providers.
- Hugging Face dataset — canonical public dataset behind the tokenization atlas.
- promptc — deterministic compiler for cost-aware prompt optimization.
- routerlab — cost-quality routing for LLM APIs with reproducible Pareto frontiers.
- ast-ai-model-router — AST-aware Claude and Codex model router for token-conscious coding agents.
v2.x — production-ready. Shipped across npm (tokenometer, @tokenometer/core, @tokenometer/mcp), VS Code Marketplace + Open VSX (faraa2m.tokenometer-vscode), the repo-hosted GitHub Action (faraa2m/tokenometer/packages/action@v1), and the live playground at tokenometer.dev. See CHANGELOG.md for release notes and the milestones page for what's next.
MIT